kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.44k stars 632 forks

Unable to get all comments from posts #471

Open malcolm1232 opened 3 years ago

malcolm1232 commented 3 years ago

Hi! Good morning from Singapore! I would like to clarify why I am unable to get all the comments for a particular post.

In the post https://www.facebook.com/ChannelNewsAsia/posts/10158518069332934 there are, as of writing, ~250 comments, but when I scrape it using the following code I can only retrieve the top 3. Before last night, it returned the top 8 comments.

My code:

    from facebook_scraper import get_posts

    actual_url_gotGitHub = 'https://www.facebook.com/ChannelNewsAsia/posts/10158518069332934'
    url = 'ChannelNewsAsia/posts/10158518069332934'

    # Scrape the post with comments and reactors enabled, using a cookie file
    for post in get_posts(
        url,
        pages=2,
        options={"comments": True, "reactors": True, "progress": True},
        extra_info=True,
        cookies='mal_cookie.txt',
    ):
        print('Actual num of comments:', post['comments'])
        print('Returned num of comments:', len(post['comments_full']))

        print(post)

neon-ninja commented 3 years ago

Hi! Good afternoon from New Zealand!

If I run your code, with one extra addition (print('Comments + replies:', len(post['comments_full']) + sum(len(c["replies"]) for c in post["comments_full"]))), I get the following:

Actual num of comments: 254
Returned num of comments: 93
Comments + replies: 229

So the comments field is actually the number of comments and replies, whereas len(post["comments_full"]) is just the number of top level comments.
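To make the distinction concrete, here is a minimal sketch of the same count as a small helper. It runs on a made-up post dict rather than a live scrape: the `comments`, `comments_full` and `replies` keys are the ones used in this thread, while `count_comments`, `sample_post` and the comment texts are hypothetical names and data for illustration only.

```python
def count_comments(post):
    """Return (top_level, replies, total) for a scraped post dict."""
    # comments_full holds only top-level comments; each may carry a "replies" list.
    top_level = post.get("comments_full") or []
    replies = sum(len(comment.get("replies") or []) for comment in top_level)
    return len(top_level), replies, len(top_level) + replies


# Hypothetical miniature post, shaped like the fields discussed above.
sample_post = {
    "comments": 4,  # count reported by Facebook (comments + replies)
    "comments_full": [
        {"comment_text": "first", "replies": [{"comment_text": "re: first"}]},
        {"comment_text": "second", "replies": []},
    ],
}

top_level, replies, total = count_comments(sample_post)
print("Actual num of comments:", sample_post["comments"])
print("Returned num of comments:", top_level)
print("Comments + replies:", total)
```

Comparing `total` (not `top_level`) against `post["comments"]` is the like-for-like check, since both include replies.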

Do you get any locale warnings?

malcolm1232 commented 3 years ago

Hi, good afternoon! I get fewer results than you do!

So I figured maybe it's the number of cookies...?

WITH COOKIE:
Actual num of comments: 254
Returned num of comments: 3
Comments + replies: 19

NO COOKIE:
Actual num of comments: 254
Returned num of comments: 3
Comments + replies: 19

####### Locale Warnings
No, I do not get any locale warnings (Jupyter, .ipynb) for this one post.

BUT I get a locale warning if I run it for "ChannelNewsAsia":
for post in get_posts("ChannelNewsAsia",
c:\users\malco\appdata\local\programs\python\python36\lib\site-packages\IPython\core\interactiveshell.py:3325: UserWarning: A low page limit (<=2) might return no results, try increasing the limit
  exec(code_obj, self.user_global_ns, self.user_ns)

######## Other Findings (Reactors : True)
The other thing I found is that if I remove "reactors": True, I get:
Actual num of comments: 254
Returned num of comments: 91
Comments + replies: 227

If I include "reactors": True, I get:
Actual num of comments: 254
Returned num of comments: 3
Comments + replies: 19

neon-ninja commented 3 years ago

Ah, so if reactors is the issue, I wonder if you're running into https://github.com/kevinzg/facebook-scraper/issues/441. Try updating to the latest master.

malcolm1232 commented 3 years ago

I've updated to master. Hmm, for some reason in PyCharm it returns:
Actual num of comments: 254
Returned num of comments: 93

But in the .ipynb notebook it returns:
Actual num of comments: 254
Returned num of comments: 3

Also, with respect to your run:
Actual num of comments: 254
Returned num of comments: 93
Comments + replies: 229

Why do I not get the same number of comments, 229 vs 254? Your reply was: "So the comments field is actually the number of comments and replies, whereas len(post["comments_full"]) is just the number of top level comments." But when I look at the Facebook post there are 254 comments, so why do I get 229 when I scrape them? Sorry for the trouble again... T.T

Also, when running in PyCharm I get:
UserWarning: Locale detected as en_GB - for best results, set to en_US
  warnings.warn(f"Locale detected as {locale} - for best results, set to en_US")

neon-ninja commented 3 years ago

Restart your jupyter kernel after updating the library.
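A quick way to see which build a given environment has installed is to print the package version in both PyCharm and the notebook. This is only a sketch: it uses the standard-library `importlib.metadata` (Python 3.8+, so newer than the Python 3.6 path shown earlier in the thread), `installed_version` is a hypothetical helper name, and a kernel that imported the library before the upgrade may still be running the old code until it is restarted.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Run this in each environment and compare the results.
print("facebook-scraper:", installed_version("facebook-scraper"))
```

If the two environments print different versions, they are using different interpreters or site-packages directories.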

Not all comments can be extracted - some are suppressed for being spam.
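For a sense of scale, the gap in the numbers quoted above can be worked out directly. The sketch below is plain arithmetic on the figures from this thread (254 reported, 229 comments plus replies retrieved); `coverage` is a hypothetical helper name, and nothing here says why any individual comment is missing.

```python
def coverage(reported, retrieved):
    """Return (missing_count, percent_retrieved) for two comment counts."""
    missing = reported - retrieved
    percent = 100.0 * retrieved / reported if reported else 0.0
    return missing, round(percent, 1)


missing, percent = coverage(reported=254, retrieved=229)
print(f"Missing: {missing} comments, retrieved {percent}% of the reported total")
```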

en_GB should work fine too

jkapali commented 2 months ago

I am having issues scraping comments on a Facebook Group's post. The scraper worked fine on a post made in the past, but it shows limitations on a new post: it only gets about 40 comments when there are 170+. The older post has 380+ comments and the scraper was able to retrieve all of them. I follow the given example exactly, with the exception of adding the cookies parameter to get_posts. I updated facebook-scraper, and I'm not sure whether some comments are being suppressed as spam.