minimaxir / facebook-page-post-scraper

Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis

TypeError: the JSON object must be str, not 'bytes' #51

Open paladini opened 7 years ago

paladini commented 7 years ago

I get this error when using the comment scraper on public pages. I've filled in all the variables correctly (app_id, app_secret and page id), and I ran the post scraper beforehand, which finished successfully.

The full error log follows:

$ python3 get_fb_comments_from_fb.py
Scraping <OMMITED> Comments From Posts: 2017-05-21 15:51:37.768667

Traceback (most recent call last):
  File "get_fb_comments_from_fb.py", line 220, in <module>
    scrapeFacebookPageFeedComments(file_id, access_token)
  File "get_fb_comments_from_fb.py", line 147, in scrapeFacebookPageFeedComments
    comments = json.loads(request_until_succeed(url))
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

The page I'm scraping has posts and comments written in Brazilian Portuguese (PT-BR).
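
The traceback above looks unrelated to the page's language: on Python 3.5, `json.loads` accepts only `str`, while `urlopen(...).read()` returns `bytes` (from Python 3.6 onward `json.loads` also accepts `bytes`). A minimal sketch of the mismatch and the explicit decode, where `raw` is a made-up stand-in for the HTTP response body and not real Graph API output:

```python
import json

# Hypothetical stand-in for what urlopen(url).read() returns:
# raw UTF-8 bytes rather than a str.
raw = '{"message": "Olá, tudo bem?"}'.encode("utf-8")

# Decoding explicitly before parsing avoids the TypeError on Python 3.5
# and is harmless on later versions.
comments = json.loads(raw.decode("utf-8"))
print(comments["message"])
```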

paladini commented 7 years ago

If anyone is having the same issue, I've found a fix! Just change the following code in the comments scraper:

def request_until_succeed(url):
    req = Request(url)
    success = False
    while success is False:
        try:
            response = urlopen(req)
            if response.getcode() == 200:
                success = True
        except Exception as e:
            print(e)
            time.sleep(5)

            print("Error for URL {}: {}".format(url, datetime.datetime.now()))
            print("Retrying.")

    return response.read()

to this (I've added .decode('utf-8') to the value being returned):

def request_until_succeed(url):
    req = Request(url)
    success = False
    while success is False:
        try:
            response = urlopen(req)
            if response.getcode() == 200:
                success = True
        except Exception as e:
            print(e)
            time.sleep(5)

            print("Error for URL {}: {}".format(url, datetime.datetime.now()))
            print("Retrying.")

    return response.read().decode('utf-8')

It's working fine here now, but I don't know whether it's reliable for everyone, so I'm not going to submit a pull request with this fix.

minimaxir commented 7 years ago

The script does encoding/decoding shenanigans in order to be compatible with both Python 2 and 3. I will have to check if that solution will work for Python 2.
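
One way to sidestep the version question is to decode only when the payload is actually `bytes`. The sketch below is an assumption about how that could look, not the script's own code; `parse_json_body` is a hypothetical helper name. On Python 2, `bytes` is an alias for `str`, so the same check should decode the response body to `unicode` before parsing.

```python
import json


def parse_json_body(body):
    """Parse a JSON payload that may arrive as bytes or as text.

    Hypothetical helper, not part of the original scraper.
    """
    # On Python 3 this is true only for bytes; on Python 2 it is true
    # for the plain str returned by response.read().
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)


print(parse_json_body(b'{"id": "123_456"}'))
```

With a helper like this, `request_until_succeed` could keep returning `response.read()` unchanged and callers would wrap the result instead of calling `json.loads` directly.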

paladini commented 7 years ago

Thanks for the fast reply, @minimaxir!

Mika15 commented 7 years ago

Guys, again I have an issue with paging. Cannot figure out why it is happening. Can you help me? Thanks!

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
 in ()
    176
    177 if __name__ == '__main__':
--> 178     scrapeFacebookPageFeedStatus(group_id, access_token)

 in scrapeFacebookPageFeedStatus(group_id, access_token)
    160         if 'paging' in statuses:
    161             next_url = statuses['paging']['next']
--> 162             until = re.search('until=([0-9]*?)(&|$)', next_url).group(1)
    163             if until is None:
    164                 return None

AttributeError: 'NoneType' object has no attribute 'group'
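
The `AttributeError` means `re.search` returned `None`, i.e. `next_url` contained no `until=<digits>` parameter, so `.group(1)` was called on `None` and the `if until is None` check on the following line was never reached. Why the URL lacks that parameter is not clear from the thread. A minimal sketch of a guard, using the same pattern; `extract_until` is a hypothetical helper name and the URLs are made-up examples:

```python
import re


def extract_until(next_url):
    """Return the 'until' value from a paging URL, or None if absent.

    Hypothetical helper; the pattern is copied from the traceback.
    """
    match = re.search(r'until=([0-9]*?)(&|$)', next_url)
    # Check the match object itself before calling .group().
    if match is None:
        return None
    return match.group(1)


# Made-up URLs for illustration only.
print(extract_until("https://example.com/feed?limit=100&until=1495400000&x=1"))
print(extract_until("https://example.com/feed?limit=100&after=abc"))
```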

nxy commented 6 years ago

@paladini thanks, that worked for me.