kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

Error While Scraping Comments: json.decoder.JSONDecodeError #1097

Open mscarl opened 7 months ago

mscarl commented 7 months ago

Hi, I'm relatively new to Python and I've been trying to scrape some Facebook comments using your code. When scraping a couple of posts, I get the following error:

json.decoder.JSONDecodeError: Extra data: line 1 column 31109 (char 31108)
Traceback (most recent call last):
  File "C:\Users\Me\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\facebook_scraper\utils.py", line 279, in safe_consume
    for item in generator:
  File "C:\Users\Me\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\facebook_scraper\extractors.py", line 1139, in extract_comment_replies
    data = json.loads(response.text[prefix_length:])  # Strip 'for (;;);'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^

Here's the code I'm using to scrape comments:

from pprint import pprint
from facebook_scraper import *
import logging
import os
import json
import pandas as pd
from tqdm import tqdm
import time

post_ids = ["https://www.facebook.com/EurovisionSongContest/posts/pfbid02uj7TGbujB8KrytL9H1qnTqYGV2xFnDnLzvN4ntGXopcusdzcawPzwT78NUziMQoql"]
cookies = "www.facebook.com_cookies.txt"
set_cookies(cookies)

options = {"comments": True, "progress": True, "allow_extra_requests": True}

def format_comment(c):
    obj = {
        "comment_id": c["comment_id"],
        "comment_text": c["comment_text"]
    }
    return obj

fb_comments = []
post = next(get_posts(post_urls=post_ids, options=options))
for comment in post["comments_full"]:
    fb_comments.append(format_comment(comment))
    for reply in comment["replies"]:
        fb_comments.append(format_comment(reply))
pd.DataFrame(fb_comments).to_csv("Winner_2022.csv", index=False)

Any help would be greatly appreciated.

conventoangelo commented 2 months ago

Just had this error as well. My guess is that it happens once you've scraped too much and Facebook blocks your IP address, possibly returning an error message in a response that is longer than the JSON the parser expects. I haven't printed the raw response in the terminal yet to confirm this; I'm only inferring it from decoder.py. What worked for me was connecting to a VPN server in a different country; simply switching to a different server in my VPN did not work. Hope this helps!
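One way to test that guess is to look at what follows the first JSON value in the response body. "Extra data" is the message json.loads gives when it has parsed one complete JSON value and then finds more non-whitespace text after it. Below is a minimal sketch using only the standard library. The helper name split_first_json and the sample string are made up for illustration; the sample is not a real Facebook response, and the sketch assumes the body starts with the 'for (;;);' prefix mentioned in the traceback.

```python
import json

PREFIX = "for (;;);"  # prefix named in the traceback comment


def split_first_json(text, prefix=PREFIX):
    """Parse the first JSON value in text; return (value, trailing_text).

    Hypothetical helper for debugging, not part of facebook_scraper.
    """
    if text.startswith(prefix):
        text = text[len(prefix):]
    text = text.lstrip()  # raw_decode does not skip leading whitespace
    value, end = json.JSONDecoder().raw_decode(text)
    return value, text[end:]


# Made-up body: one valid JSON object followed by unexpected extra content.
sample = 'for (;;);{"payload": {"ok": true}}{"error": "hypothetical second object"}'

try:
    json.loads(sample[len(PREFIX):])
    loads_error = None
except json.JSONDecodeError as exc:
    loads_error = exc.msg  # same kind of failure as in the report

value, trailing = split_first_json(sample)
print("json.loads error:", loads_error)
print("first value:", value)
print("trailing text:", repr(trailing[:200]))
```

Running the same helper on a saved copy of the real response text would show whether the trailing part is a block or rate-limit message, as guessed above, or something else.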