Open malcolm1232 opened 3 years ago
Hi! Good afternoon from New Zealand!
If I run your code, with one extra addition (print('Comments + replies:', len(post['comments_full']) + sum(len(c["replies"]) for c in post["comments_full"]))
), I get the following:
Actual num of comments: 254
Returned num of comments: 93
Comments + replies: 229
So the comments field is actually the number of comments and replies, whereas len(post["comments_full"]) is just the number of top-level comments.
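To make the difference between the three numbers concrete, here is a minimal sketch that counts them on a post dict. The sample data is made up for illustration (it is not scraped output), and `count_comments` is a hypothetical helper, not part of facebook-scraper; only the `comments`, `comments_full` and `replies` keys come from this thread.

```python
def count_comments(post):
    """Return (top_level, replies, total) for a scraped post dict."""
    # "comments_full" may be missing or None if comments were not requested.
    comments = post.get("comments_full") or []
    top_level = len(comments)
    replies = sum(len(c.get("replies") or []) for c in comments)
    return top_level, replies, top_level + replies


# Hypothetical sample data, shaped like the fields discussed in this thread.
sample_post = {
    "comments": 5,  # the count Facebook reports (comments + replies)
    "comments_full": [
        {"comment_text": "first", "replies": [{"comment_text": "r1"}, {"comment_text": "r2"}]},
        {"comment_text": "second", "replies": []},
    ],
}

top_level, replies, total = count_comments(sample_post)
print("Actual num of comments:", sample_post["comments"])
print("Returned num of comments:", top_level)
print("Comments + replies:", total)
```

The number to compare against the `comments` field is the last one, not the length of `comments_full`.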
Do you get any locale warnings?
Hi, good afternoon! I get fewer comments back than you do!
So I figured maybe it's related to the cookies...?
WITH COOKIE: Actual num of comments: 254 Returned num of comments: 3 Comments + replies: 19
NO COOKIE: Actual num of comments: 254 Returned num of comments: 3 Comments + replies: 19
Locale warnings: No, I do not get any locale warnings (Jupyter, .ipynb) for this one post.
But I do get a warning if I run it for "ChannelNewsAsia": for post in get_posts("ChannelNewsAsia",
c:\users\malco\appdata\local\programs\python\python36\lib\site-packages\IPython\core\interactiveshell.py:3325: UserWarning: A low page limit (<=2) might return no results, try increasing the limit
exec(code_obj, self.user_global_ns, self.user_ns)
Other findings ("reactors": True): if I remove "reactors": True, I get Actual num of comments: 254 Returned num of comments: 91 Comments + replies: 227
If I include "reactors": True, I get Actual num of comments: 254 Returned num of comments: 3 Comments + replies: 19
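A with/without comparison like this can be scripted instead of edited by hand. The following is only a sketch: `compare_option_sets`, `summarize` and `fake_fetch` are hypothetical names, and `fake_fetch` returns made-up data so the example runs offline. For a real run, `fetch_post` would wrap something like `next(get_posts(url, pages=2, options=options, extra_info=True, cookies='mal_cookie.txt'))` from the code in this issue.

```python
def summarize(post):
    """Return (top_level_comments, comments_plus_replies) for a post dict."""
    comments = post.get("comments_full") or []
    replies = sum(len(c.get("replies") or []) for c in comments)
    return len(comments), len(comments) + replies


def compare_option_sets(fetch_post, option_sets):
    """Fetch the same post once per option set and summarize each result."""
    return {label: summarize(fetch_post(options))
            for label, options in option_sets.items()}


def fake_fetch(options):
    # Stand-in for a real scrape; the counts are invented for illustration.
    n = 1 if options.get("reactors") else 3
    return {"comments_full": [{"replies": [{}]} for _ in range(n)]}


option_sets = {
    "with reactors": {"comments": True, "reactors": True},
    "without reactors": {"comments": True},
}

results = compare_option_sets(fake_fetch, option_sets)
for label, (top_level, total) in results.items():
    print(f"{label}: returned={top_level}, comments+replies={total}")
```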
Ah, so if reactors is the issue, I wonder if you're running into https://github.com/kevinzg/facebook-scraper/issues/441. Try updating to the latest master.
I've updated to master. Hmm, for some reason in PyCharm it returns Actual num of comments: 254 Returned num of comments: 93
But in the .ipynb notebook it returns Actual num of comments: 254 Returned num of comments: 3
Also, with respect to your run: Actual num of comments: 254 Returned num of comments: 93 Comments + replies: 229
Why don't I get the same number of comments (229 vs 254)? Your reply was: "So the comments field is actually the number of comments and replies, whereas len(post["comments_full"]) is just the number of top-level comments." But when I look at the Facebook post, there are 254 comments, so why do I only get 229 when I scrape them? Sorry for the trouble again T.T
Also, when running in PyCharm I get: UserWarning: Locale detected as en_GB - for best results, set to en_US warnings.warn(f"Locale detected as {locale} - for best results, set to en_US")
Restart your jupyter kernel after updating the library.
Not all comments can be extracted - some are suppressed for being spam.
en_GB should work fine too
I am having issues scraping comments on a Facebook Group's post. It has been working fine for a while on an older post, but trying the scraper on a new post shows limitations. The scraper only gets about 40 comments when there are 170+. The older post has 380+ comments and the scraper was able to retrieve all of them. I follow the given example exactly, with the exception of adding the cookies parameter to get_posts. I updated facebook-scraper, and I'm not sure whether some comments are being suppressed as spam.
Hi! Good morning from Singapore! I would just like to clarify why I am unable to get all the comments for a particular post.
In the post: https://www.facebook.com/ChannelNewsAsia/posts/10158518069332934 As of writing, there are ~250 comments, but when I scrape using the following code, I can only retrieve the top 3 comments. Up until last night it was the top 8 comments.
My code:
from facebook_scraper import get_posts

actual_url_gotGitHub = 'https://www.facebook.com/ChannelNewsAsia/posts/10158518069332934'
url = 'ChannelNewsAsia/posts/10158518069332934'
for post in get_posts(url, pages=2, options={"comments": True, "reactors": True, "progress": True}, extra_info=True, cookies='mal_cookie.txt'):
    print(post)