minimaxir / facebook-page-post-scraper

Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis
2.12k stars 663 forks source link

Error in line 184 and 154 #101

Open mparackal opened 6 years ago

mparackal commented 6 years ago

I am not a programmer but can find my way through codes like a blind man. So, can you help me, please? I get the following error, not sure what I am doing wrong. Also, the current Facebook API version is 2.12. This code is point to version 2.9 (base = "https://graph.facebook.com/v2.9" # line 139).

Thank you & Kind regard

Mathew

Traceback (most recent call last): File "C:\Users\parma73p\Desktop\Python\get_fb_posts_fb_page.py", line 184, in scrapeFacebookPageFeedStatus(page_id, access_token, since_date, until_date) File "C:\Users\parma73p\Desktop\Python\get_fb_posts_fb_page.py", line 154, in scrapeFacebookPageFeedStatus statuses = json.loads(request_until_succeed(url)) File "C:\Program Files\IBM\SPSS\Statistics\24\Python3\lib\json__init.py", line 312, in loads s.class.name__)) TypeError: the JSON object must be str, not 'bytes'

semmet95 commented 6 years ago

Hey there. Can you share the URL you're trying to scrape data from?

mparackal commented 6 years ago

thanks singh-95.

The URL is https://www.facebook.com/NZMSCAlpine/

semmet95 commented 6 years ago

@mparackal the script is successfully scraping data from the page you shared when I test it. Can you please share your code?

trevdog94 commented 6 years ago

I altered the request_until_succeed to look like this and it resolved the issue for me:

def request_until_succeed(url):
    req = Request(url)
    success = False
    while success is False:
        try:
            response = urlopen(req)
            **res = response.read()
            res = res.decode("utf-8")**
            if response.getcode() == 200:
                success = True
        except Exception as e:
            print(e)
            time.sleep(5)

            print("Error for URL {}: {}".format(url, datetime.datetime.now()))
            print("Retrying.")

    return res