moda20 / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
77 stars 28 forks

Unable to scrape replies to comments #40

Open MameJoe opened 7 months ago

MameJoe commented 7 months ago

When scraping comments, I was unable to get the replies to those comments. In the JSON output, each comment's 'replies' field is empty ('replies': []), even when the comment actually has replies. How do I troubleshoot this?

moda20 commented 6 months ago

@MameJoe Enable logging at DEBUG level, check the logs, and try with a test script.
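A minimal sketch of that setup using only Python's standard logging module. Nothing facebook-scraper-specific is assumed beyond its logger names, which show up as the facebook_scraper.* prefixes in the logs further down this thread.

```python
import logging

# Route all log records to stderr at DEBUG level. basicConfig's default format is
# "LEVEL:logger_name:message", the same shape as the lines quoted in this thread.
# force=True (Python 3.8+) replaces any handlers configured earlier.
logging.basicConfig(level=logging.DEBUG, force=True)

# Optional: silence urllib3's per-request DEBUG lines ("Starting new HTTPS
# connection ...") while facebook_scraper's own loggers stay at DEBUG.
logging.getLogger("urllib3").setLevel(logging.INFO)
```

Put these lines at the top of the test script, before the first get_posts(...) call, and limit the run to one page (pages=1) so the log stays short enough to read.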

Eao-Kind commented 6 months ago

Me too: I can't get comments_full either. Is this an error in my code? Could the author please take a look? Thank you!

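import time

from facebook_scraper import get_posts

cookies = "cookies.txt"  # hypothetical path to your exported Facebook cookies file
listposts = []

for post in get_posts(group="347269265856033",
                      base_url="https://mbasic.facebook.com",
                      pages=3,
                      cookies=cookies,
                      options={
                          "posts_per_page": 20,
                          "comments": 100,   # fetch up to 100 comments per post
                          "progress": True,  # show a progress bar while comments load
                      }):
    post_id = post['post_id']  # get the post ID
    post_text = (post['text'] or '')[:50]  # first 50 characters of the post text
    print(f"Post: {post_text}")
    time.sleep(5)  # pause between posts
    comments = post.get('comments_full') or []  # may be None when nothing was extracted
    for comment in comments:
        comment_text = comment['comment_text']  # get the comment text
        print(f"Comment: {comment_text}")
        for reply in comment['replies']:
            print(' ', reply)
    listposts.append(post)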

print(len(listposts))

Here post['comments_full'] comes back as either [] or None.
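Because comments_full can be None (as reported here) as well as [], iterating over it directly raises a TypeError. A minimal sketch of a defensive walk over comments and their replies, assuming only the dictionary keys visible in the debug output later in this thread (comments_full, comment_text, replies); iter_comments is a hypothetical helper, not part of facebook-scraper.

```python
from typing import Iterator, Optional, Tuple


def iter_comments(post: dict) -> Iterator[Tuple[int, Optional[str]]]:
    """Yield (depth, comment_text) for each comment and reply in a scraped post.

    Hypothetical helper: it only assumes the 'comments_full', 'comment_text'
    and 'replies' keys seen in facebook-scraper's output, and tolerates any of
    them being missing or None. Depth 0 is a top-level comment, 1 is a reply.
    """
    for comment in post.get("comments_full") or []:  # None and [] both mean "nothing"
        yield 0, comment.get("comment_text")
        for reply in comment.get("replies") or []:
            yield 1, reply.get("comment_text")


# Usage with hand-made post dicts (made-up data, not real scraper output):
post_without_comments = {"post_id": "1", "comments_full": None}
post_with_reply = {
    "post_id": "2",
    "comments_full": [
        {"comment_text": "Really cool!", "replies": [{"comment_text": "Agreed"}]},
    ],
}
print(list(iter_comments(post_without_comments)))  # []
print(list(iter_comments(post_with_reply)))  # [(0, 'Really cool!'), (1, 'Agreed')]
```

If every post yields an empty list even though the page shows comments, the problem is in extraction (see the ERROR lines below), not in the loop.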

DEBUG INFO:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): facebook.com:443
DEBUG:urllib3.connectionpool:https://facebook.com:443 "GET /settings HTTP/1.1" 301 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.facebook.com:443
DEBUG:urllib3.connectionpool:https://www.facebook.com:443 "GET /settings HTTP/1.1" 200 None
DEBUG:facebook_scraper.facebook_scraper:Starting to iterate pages
DEBUG:facebook_scraper.page_iterators:Requesting page from: https://m.facebook.com/groups/347269265856033/
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): m.facebook.com:443
DEBUG:urllib3.connectionpool:https://m.facebook.com:443 "GET /groups/347269265856033/ HTTP/1.1" 200 None
DEBUG:facebook_scraper.page_iterators:Parsing page response
WARNING:facebook_scraper.page_iterators:No raw posts (<article> elements) were found in this page.
DEBUG:facebook_scraper.page_iterators:The page url is: https://m.facebook.com/groups/347269265856033/
DEBUG:facebook_scraper.page_iterators:The page content is:
+------------------------------------------------------------
| Pokemon Masters . . .
| 1
| 1 Comment
| Like
| Show more reactions
| Comment
| Share
| Buneary
| Really cool!
| Minajur Rahaman‎Pokemon Masters
| March 10 at 2:52 PM ·
| More options
| Best Starter family!
| 9.4K
| 100 comments35 shares
| Like
| Show more reactions
| Comment
| Share
| Thomas Garcia
| You know, we have the overrated fire starter,(Zard) the… More overrated water starter(Ninja).
| .. is there an overrated Grass starter??
| View TimelineAdd to GroupInvite to Event
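Following the "test script" suggestion, a script can fail loudly when the scraper logs warnings such as the "No raw posts" one above, instead of silently yielding nothing. A minimal sketch using only the standard logging module; WarningCollector is a hypothetical class, the logger names are taken from the log prefixes above, and the warning here is emitted by hand to stand in for a real get_posts(...) run.

```python
import logging


class WarningCollector(logging.Handler):
    """Hypothetical handler that keeps every record at WARNING level or above."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


collector = WarningCollector()
# Attaching to the package logger also catches child loggers such as
# "facebook_scraper.page_iterators" and "facebook_scraper.extractors",
# because their records propagate up to it.
scraper_logger = logging.getLogger("facebook_scraper")
scraper_logger.addHandler(collector)
scraper_logger.setLevel(logging.DEBUG)

# A real script would run a single get_posts(...) call here. For illustration,
# log a DEBUG line (ignored by the collector) and a warning like the one above.
page_logger = logging.getLogger("facebook_scraper.page_iterators")
page_logger.debug("Parsing page response")
page_logger.warning("No raw posts were found in this page.")

problems = [f"{r.name}: {r.getMessage()}" for r in collector.records]
print(problems)  # ['facebook_scraper.page_iterators: No raw posts were found in this page.']
```

A test script can then end with assert not problems, so an empty result caused by a parsing failure is reported rather than mistaken for a post with no comments.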

theFabulousFabius commented 5 months ago

> @MameJoe Enable logging at DEBUG level, check the logs, and try with a test script.

DEBUG:facebook_scraper.facebook_scraper:Requesting page from: https://mbasic.facebook.com/NintendoAmerica/posts/2257188721032235
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): mbasic.facebook.com:443
DEBUG:urllib3.connectionpool:https://mbasic.facebook.com:443 "GET /NintendoAmerica/posts/2257188721032235 HTTP/1.1" 200 None
WARNING:facebook_scraper.extractors:[None] Extract method extract_post_url didn't return anything
DEBUG:tzlocal:Looking up time zone info from registry
DEBUG:facebook_scraper.extractors:images length 0
WARNING:facebook_scraper.extractors:[None] Exception while running extract_photo_link: IndexError('pop from empty list')
WARNING:facebook_scraper.extractors:[None] Exception while running extract_user_id: KeyError('content_owner_id_new')
WARNING:facebook_scraper.extractors:[None] Extract method extract_video didn't return anything
WARNING:facebook_scraper.extractors:[None] Extract method extract_video_thumbnail didn't return anything
WARNING:facebook_scraper.extractors:[None] Extract method extract_video_id didn't return anything
WARNING:facebook_scraper.extractors:[None] Extract method extract_video_meta didn't return anything
WARNING:facebook_scraper.extractors:[None] Exception while running extract_is_live: IndexError('list index out of range')
WARNING:facebook_scraper.extractors:[None] Extract method extract_factcheck didn't return anything
WARNING:facebook_scraper.extractors:[None] Extract method extract_share_information didn't return anything
WARNING:facebook_scraper.extractors:[None] Extract method extract_listing didn't return anything
WARNING:facebook_scraper.extractors:[None] Extract method extract_with didn't return anything
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('el',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('em',) id='comment_replies_more_1:2257188721032235_2257226631028444'>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'en', 'eo', 'ep')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('el',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('em',) id='comment_replies_more_1:2257188721032235_2257908914293549'>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'en', 'eo', 'ep')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('el',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('em',) id='comment_replies_more_1:2257188721032235_2257578297659944'>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'en', 'eo', 'ep')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ed',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ee',)>: 'NoneType' object has no attribute 'text'
ERROR:facebook_scraper.extractors:Unable to parse comment <Element 'div' class=('ef', 'eg', 'dt')>: 'NoneType' object has no attribute 'text'
DEBUG:facebook_scraper.extractors:Fetching up to 100 comments
0%| | 0/100 [00:00<?, ?it/s]
ERROR:facebook_scraper.extractors:'NoneType' object has no attribute 'group'
ERROR:facebook_scraper.extractors:'NoneType' object has no attribute 'group'
ERROR:facebook_scraper.extractors:'NoneType' object has no attribute 'group'
ERROR:facebook_scraper.extractors:'NoneType' object has no attribute 'group'
ERROR:facebook_scraper.extractors:'NoneType' object has no attribute 'group'
ERROR:facebook_scraper.extractors:'NoneType' object has no attribute 'group'
{'comment_id': '2257255941025513', 'comment_url': 'https://facebook.com/2257255941025513', 'commenter_id': None, 'commenter_url': 'https://facebook.com/saul.juarezperez.79?eav=Afa9CvPIfOdAp7YCyhh6fxt3Eilh1MzRTJR8RV6NC6bSKVrJmv4IoeSD4Z8_ErMQmK4&rc=p&__tn__=R&paipv=0', 'commenter_name': 'Saúl Azael Juárez Pérez', 'commenter_meta': None, 'comment_text': "He never use that shield in majora's mask.. u can do better Nintendo", 'comment_time': datetime.datetime(2019, 4, 29, 0, 0), 'comment_image': None, 'comment_reactors': [], 'comment_reactions': None, 'comment_reaction_count': '1', 'replies': []}
{'comment_id': None, 'comment_url': 'https://facebook.com/', 'commenter_id': None, 'commenter_url': 'https://facebook.com/saul.juarezperez.79?eav=Afa9CvPIfOdAp7YCyhh6fxt3Eilh1MzRTJR8RV6NC6bSKVrJmv4IoeSD4Z8_ErMQmK4&rc=p&__tn__=R&paipv=0', 'commenter_name': 'Saúl Azael Juárez Pérez', 'commenter_meta': None, 'comment_text': "Saúl Azael Juárez Pérez\nHe never use that shield in majora's mask.. u can do better Nintendo\n1 · Like · React · Reply · More · Apr 29, 2019", 'comment_time': datetime.datetime(2019, 4, 29, 0, 0), 'comment_image': None, 'comment_reactors': [], 'comment_reactions': None, 'comment_reaction_count': '1', 'replies': []}
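Two things stand out in this output. Some ERROR lines name elements with ids of the form comment_replies_more_1:<number>_<number>, where the first number matches the post ID in the requested URL; these look like "more replies" links that were never followed. And the second printed comment appears to be a mis-parse: comment_id is None and its comment_text is the whole comment block. A minimal post-processing sketch, assuming (not confirmed by the library) that the second number is the parent comment's ID; parse_replies_more_id and drop_unidentified are hypothetical helpers, and dropping every comment without an ID may also discard legitimate comments.

```python
import re
from typing import List, Optional, Tuple

# Matches element ids such as
# 'comment_replies_more_1:2257188721032235_2257226631028444' seen in the log.
# Assumption: the two numbers are <post_id>_<parent_comment_id>.
REPLIES_MORE_ID = re.compile(r"comment_replies_more_\d+:(\d+)_(\d+)")


def parse_replies_more_id(element_id: str) -> Optional[Tuple[str, str]]:
    """Return (post_id, comment_id) from a 'comment_replies_more' id, else None."""
    match = REPLIES_MORE_ID.fullmatch(element_id)
    return (match.group(1), match.group(2)) if match else None


def drop_unidentified(comments: List[dict]) -> List[dict]:
    """Keep only comments that have a comment_id (entries without one look mis-parsed)."""
    return [c for c in comments if c.get("comment_id") is not None]


print(parse_replies_more_id(
    "comment_replies_more_1:2257188721032235_2257226631028444"))
# ('2257188721032235', '2257226631028444')

# Trimmed copies of the two dicts printed above (most keys omitted).
scraped = [
    {"comment_id": "2257255941025513", "replies": []},
    {"comment_id": None, "replies": []},
]
print([c["comment_id"] for c in drop_unidentified(scraped)])  # ['2257255941025513']
```

Collecting these ids from a DEBUG run gives a list of comments whose replies were present on the page but not extracted, which is a concrete thing to attach to a bug report.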