Closed AntonGisin closed 3 years ago
Hi, I think at this sort of scale, you might need to pass in cookies as per the readme. Additionally, when visiting https://m.facebook.com/FoxNews/posts/10160674691536336, I note that the page shows the link "View previous comments" instead of "View more comments" like the scraper expected - https://github.com/kevinzg/facebook-scraper/commit/3ce3db342e75102f761eba8119bf26c71234f4ed should fix this form of pagination.
Thank you very much @neon-ninja! I exported my Facebook cookies to a txt file and passed them to the scraper. It helped for a short time, but after an hour it stopped returning comments again :( I tried refreshing the cookies file, but the result is the same. Could Facebook be blocking this for my IP?
Yes - the scraper should throw an exception in that case. The comment extraction would handle the exception and log it. Try adding these lines:
from facebook_scraper import *
import logging
enable_logging(logging.DEBUG)
and report back any log messages
The log is here:
Parsing page response
Got 4 raw posts from page
Extracting posts from page 512
[10158356852134071] Extract method extract_video didn't return anything
[10158356852134071] Extract method extract_video_thumbnail didn't return anything
[10158356852134071] Extract method extract_video_id didn't return anything
Fetching https://m.facebook.com/businessinsider/posts/10158356852134071
[10158356852134071] Extract method extract_video_meta didn't return anything
[10158356852134071] Extract method extract_factcheck didn't return anything
[10158356852134071] Extract method extract_share_information didn't return anything
No comments found on page
[10158356852134071] Exception while extracting comments: TypeError("'NoneType' object is not iterable")
[10158356819979071] Extract method extract_link didn't return anything
Up to page 512 it collects comments properly, then it stops.
Thank you!
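For anyone reading a long debug log like the one above, a small helper can pull out the IDs of posts whose comment extraction failed. This is only a sketch: `failed_post_ids` is a hypothetical name, not part of facebook_scraper, and it assumes the log lines keep the `[post_id] Exception while extracting comments` shape shown in this thread.

```python
import re

# Assumed line shape, taken from the log in this thread:
# [10158356852134071] Exception while extracting comments: TypeError(...)
FAILED_COMMENTS = re.compile(r"\[(\d+)\] Exception while extracting comments")


def failed_post_ids(log_lines):
    """Return post IDs whose comment extraction raised, in first-seen order."""
    seen = []
    for line in log_lines:
        match = FAILED_COMMENTS.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

The pattern uses `search` rather than `match`, so it still works if a logging formatter adds a timestamp or level prefix to each line.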
from facebook_scraper import get_posts

posts = list(get_posts(
    post_urls=[10158356852134071],
    options={"comments": True},
    timeout=60,
    # cookies="cookies.txt",
))
works fine, so the problem isn't with the post itself, but the volume of scraping you've been doing prior to scraping it. Perhaps at this sort of scale, you should keep records of which posts failed to extract comments, and come back to backfill them later, after whatever temporary block has worn off?
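The record-and-backfill idea could be sketched roughly as below. This is an illustration, not part of facebook_scraper: `needs_backfill`, `backfill_comments` and the `fetch_post` callback are hypothetical names, and the one-hour default wait is a guess, since nothing in this thread says how long a temporary block lasts.

```python
import time


def needs_backfill(post):
    """True when the post reports comments but none were extracted."""
    return bool(post.get("comments")) and not post.get("comments_full")


def backfill_comments(failed_ids, fetch_post, wait_seconds=3600,
                      max_rounds=3, sleep=time.sleep):
    """Retry failed posts after a pause; return (recovered, still_failed).

    fetch_post is a caller-supplied function taking a post ID and
    returning a post dict (or None if the fetch itself failed).
    """
    recovered = {}
    pending = list(failed_ids)
    for _ in range(max_rounds):
        if not pending:
            break
        sleep(wait_seconds)  # assumed cool-down before each retry round
        still_failed = []
        for post_id in pending:
            post = fetch_post(post_id)
            if post is None or needs_backfill(post):
                still_failed.append(post_id)
            else:
                recovered[post_id] = post
        pending = still_failed
    return recovered, pending
```

Here `fetch_post` might wrap the single-post call shown above, for example `lambda pid: next(get_posts(post_urls=[pid], options={"comments": True}), None)`. The IDs to retry can be collected during the main scrape by applying `needs_backfill` to each post as it comes back.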
Hi guys, thank you for the web scraper! It looks really nice, but unfortunately it doesn't show comments. I ran this code:
but all posts look like this:
{'post_id': '10160674691536336', 'text': 'Israeli Prime Minister Benjamin Netanyahu said Sunday that his country is aiming to "degrade" Hamas to prevent future attacks.\n\nFOXNEWS.COM\nNetanyahu says Israel wants to \'degrade\' Hamas\' will, warns campaign will continue', 'post_text': 'Israeli Prime Minister Benjamin Netanyahu said Sunday that his country is aiming to "degrade" Hamas to prevent future attacks.', 'shared_text': "FOXNEWS.COM\nNetanyahu says Israel wants to 'degrade' Hamas' will, warns campaign will continue", 'time': datetime.datetime(2021, 5, 16, 21, 21, 34), 'image': 'https://static.foxnews.com/foxnews.com/content/uploads/2021/05/Netanyahu-Israel-Palestinian-Conflict-AP.jpg', 'image_lowquality': 'https://external.fhel6-1.fna.fbcdn.net/safe_image.php?d=AQGZSq_SQm-5-olG&w=476&h=249&url=https%3A%2F%2Fstatic.foxnews.com%2Ffoxnews.com%2Fcontent%2Fuploads%2F2021%2F05%2FNetanyahu-Israel-Palestinian-Conflict-AP.jpg&cfs=1&jq=75&ext=jpg&ccb=3-5&_nc_hash=AQGFWWTkKlZtr6HJ', 'images': ['https://static.foxnews.com/foxnews.com/content/uploads/2021/05/Netanyahu-Israel-Palestinian-Conflict-AP.jpg'], 'images_description': [], 'images_lowquality': ['https://external.fhel6-1.fna.fbcdn.net/safe_image.php?d=AQGZSq_SQm-5-olG&w=476&h=249&url=https%3A%2F%2Fstatic.foxnews.com%2Ffoxnews.com%2Fcontent%2Fuploads%2F2021%2F05%2FNetanyahu-Israel-Palestinian-Conflict-AP.jpg&cfs=1&jq=75&ext=jpg&ccb=3-5&_nc_hash=AQGFWWTkKlZtr6HJ'], 'images_lowquality_description': [None], 'video': None, 'video_duration_seconds': None, 'video_height': None, 'video_id': None, 'video_quality': None, 'video_size_MB': None, 'video_thumbnail': None, 'video_watches': None, 'video_width': None, 'likes': 890, 'comments': 2917, 'shares': 656, 'post_url': 'https://facebook.com/FoxNews/posts/10160674691536336', 'link': 'https://www.foxnews.com/world/netanyahu-israel-degrade-hamas?cmpid=fb_fnc', 'user_id': '15704546335', 'username': 'Fox News', 'user_url': 'https://facebook.com/FoxNews/?__tn__=C-R', 'is_live': 
False, 'factcheck': None, 'shared_post_id': None, 'shared_time': None, 'shared_user_id': None, 'shared_username': None, 'shared_post_url': None, 'available': True, 'comments_full': None, 'reactors': None, 'w3_fb_url': None}
The post has a lot of comments (2917), but comments_full is None. Could you please help me with this issue? Thank you!