kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.45k stars 633 forks

I don't know what is this issue #1040

Open NguyenDrasp opened 1 year ago

NguyenDrasp commented 1 year ago

in extract_comment_replies
    data = json.loads(response.text[prefix_length:])  # Strip 'for (;;);'
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 30442 (char 30441)

Rickaym commented 1 year ago

This issue comes from line 1139 of facebook_scraper/extractors.py, which passes the response text directly to json.loads. Sometimes the response contains two JSON objects back to back without being wrapped in an array, and json.loads rejects everything after the first object as "Extra data".
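For anyone who wants to see the failure in isolation, here is a minimal reproduction. The body string is a made-up stand-in, not a real Facebook response; only the shape (two objects concatenated with no enclosing array) matters.

```python
import json

# Hypothetical response body: two JSON objects back to back, no enclosing array.
first = '{"payload": {"actions": [1]}}'
second = '{"payload": {"actions": [2]}}'
body = first + second

error_message = None
error_pos = None
try:
    json.loads(body)
except json.JSONDecodeError as exc:
    # The decoder parses the first object successfully, then finds trailing text.
    error_message = exc.msg
    error_pos = exc.pos

print(error_message, error_pos)
```

The reported position is the offset where the first object ends, which matches the large column number in the traceback above: the first object parsed fine and the second one started there.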

I made a hotfix for the issue: when the response contains multiple concatenated JSON objects, it wraps the JSON string in an array so that it parses, then merges the payload.actions lists of the later objects into the first one.

Line 1138
            json_str = response.text[prefix_length:].strip()  # Strip 'for (;;);'

            if "}{" in json_str:
                # Multiple JSON objects can arrive back to back without being
                # wrapped in an array. Heuristic: this also rewrites any '}{'
                # that happens to occur inside a string value.
                json_str = f"[{json_str.replace('}{', '},{')}]"

            data = json.loads(json_str)

            if isinstance(data, list):
                # Merge the actions of every later object into the first one.
                for subdata in data[1:]:
                    data[0]['payload']['actions'].extend(subdata['payload']['actions'])
                data = data[0]
Line 1159
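An alternative that avoids string replacement, offered as a sketch and not as part of the hotfix above: the standard library's json.JSONDecoder.raw_decode returns the decoded value together with the index where it stopped, so concatenated objects can be read one at a time. The helper names decode_concatenated and merge_actions are hypothetical, and the sample body is made up; the payload.actions structure is assumed from the hotfix.

```python
import json


def decode_concatenated(text):
    """Decode one or more JSON values that appear back to back in text."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it here.
        if text[pos].isspace():
            pos += 1
            continue
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


def merge_actions(objects):
    """Fold payload.actions of later objects into the first object."""
    merged = objects[0]
    for extra in objects[1:]:
        merged["payload"]["actions"].extend(extra["payload"]["actions"])
    return merged


# Hypothetical body. The '}{' inside the first string value is left intact,
# which a plain str.replace('}{', '},{') would not guarantee.
body = '{"payload": {"actions": ["a}{b"]}} {"payload": {"actions": ["c"]}}'
data = merge_actions(decode_concatenated(body))
print(data)
```

With a single well-formed object the first helper simply returns a one-element list, so the same code path would cover both the normal and the concatenated case.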

@NguyenDrasp It would be helpful if you renamed this issue to "JSONDecodeError 'Extra data' in extract_comment_replies".