Get comments stopped working (master branch from 5th of Nov)

kevinzg / facebook-scraper

Scrape Facebook public pages without an API key

MIT License

Get comments stopped working (master branch from 5th of Nov) #546

Open ekote opened 3 years ago

ekote commented 3 years ago

Hi,

I want to download comments for some posts (either by post_id or post_url). I was doing:

next(get_posts(
    post_urls=[post["post_id"]],
    cookies=COOCKIE_FIILE,
    options={"comments": True},
))

but that stopped working. I have now installed the library from the latest master branch and get this error:

  File "/mnt/c/dev/fb-crowler/venv/lib/python3.6/site-packages/facebook_scraper/facebook_scraper.py", line 92, in get_posts_by_url
    video_id = parse_qs(urlparse(response.url).query).get("v")[0]
TypeError: 'NoneType' object is not subscriptable

@neon-ninja - has something changed, or am I doing something wrong? Also, what is the most efficient way to download a post together with all of its comments and all of its reactors?

Thanks!

ekote commented 3 years ago

I've added "allow_extra_requests": True to the options. The scraper now starts, but it fails with the same error as above.

neon-ninja commented 3 years ago

What post id are you having this problem with?

ekote commented 3 years ago

@neon-ninja 605668170584371

neon-ninja commented 3 years ago

This is a video post. If you're logged in, https://m.facebook.com/605668170584371 redirects to https://m.facebook.com/watch/?ref=watch_permalink, which is why the post can't be extracted. This looks like a bug in Facebook. The post is accessible via its post URL (https://m.facebook.com/story.php?story_fbid=605667013917820&id=100064352786833&m_entstream_source=timeline), and this commit (https://github.com/kevinzg/facebook-scraper/commit/b90c67a48115821e186da73494900f9c19700e0f) should fix it.
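For illustration, here is a minimal sketch of why the traceback occurs, using only the standard library. The redirect target quoted above has no "v" query parameter, so .get("v") returns None and indexing it raises the TypeError. The helper extract_video_id is hypothetical and is not the scraper's actual code.

```python
from urllib.parse import parse_qs, urlparse

# URL the logged-in session is redirected to, as quoted in this thread
redirected = "https://m.facebook.com/watch/?ref=watch_permalink"

# Only "ref" is present in the query string; there is no "v" key
query = parse_qs(urlparse(redirected).query)


def extract_video_id(url):
    """Hypothetical helper: return the "v" query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None
```

With a guard like this, a missing "v" yields None instead of an exception, and the caller can fall back to another way of finding the post.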

eolocon commented 2 years ago

Hey there! I'm writing to let you know that the fix is not working. Every time I try to scrape a post with a video (for instance, this one: https://facebook.com/enricoletta.it/videos/1064033957754259), this error pops up (the very same one as before!):

line 101, in get_posts_by_url
    video_id = parse_qs(urlparse(response.url).query).get("v")[0]
TypeError: 'NoneType' object is not subscriptable

I'm using the master branch. If you need other details (in case you want to fix this), just ask!

neon-ninja commented 2 years ago

@eolocon try passing the scraper just the video id. Like so: get_posts(post_urls=[1064033957754259])

eolocon commented 2 years ago

Worked like a charm! As a general rule, is it better to scrape using ids instead of a full link?

neon-ninja commented 2 years ago

https://github.com/kevinzg/facebook-scraper/commit/436a99d6eba13c462c09f321e3032ce21deb5396 should make the scraper try to extract the ID from URLs like this. So pprint(next(get_posts(post_urls=["https://facebook.com/enricoletta.it/videos/1064033957754259"]))) should then work too.
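As a rough sketch of the idea (not the code in that commit), the numeric ID could be pulled out of a /videos/<id> URL as shown below. The helper name video_id_from_url and the regular expression are assumptions made for illustration only.

```python
import re
from urllib.parse import urlparse


def video_id_from_url(url):
    """Hypothetical helper: return the numeric ID from a /videos/<id> path, else None."""
    match = re.search(r"/videos/(\d+)", urlparse(url).path)
    return match.group(1) if match else None


url = "https://facebook.com/enricoletta.it/videos/1064033957754259"
video_id = video_id_from_url(url)
```

The extracted ID could then be passed on as get_posts(post_urls=[int(video_id)]), matching the call suggested earlier in this thread.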