rugantio / fbcrawl

A Facebook crawler
Apache License 2.0

Old posts returning errors #24

Closed: rugantio closed this issue 5 years ago

rugantio commented 5 years ago

Reactions were not implemented on old posts, so parse_post and parse_reactions need to be adjusted accordingly.

1st error: the full text link ("notizia completa" in the Italian interface, i.e. "full story") returns the following error:

[scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/ninuxfirenze?sectionLoadingID=m_timeline_loading_div_1388563199_1357027200_8_timeline_unit%3A1%3A00000000001382986503%3A04611686018427387904%3A09223372036854775798%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001382986503%3A04611686018427387904%3A09223372036854775798%3A04611686018427387904&timeend=1388563199&timestart=1357027200&tm=AQBL9hUDXCpxoiCM&refid=17> (referer: https://mbasic.facebook.com/ninuxfirenze?sectionLoadingID=m_timeline_loading_div_1388563199_1357027200_8_timeline_unit%3A1%3A00000000001384117176%3A04611686018427387904%3A09223372036854775803%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001384117176%3A04611686018427387904%3A09223372036854775803%3A04611686018427387904&timeend=1388563199&timestart=1357027200&tm=AQBL9hUDXCpxoiCM&refid=17)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib/python3.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/lib/python3.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib/python3.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib/python3.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/rugantio/Downloads/fbcrawl/fbcrawl/spiders/fbcrawl.py", line 193, in parse_page
    new_page = response.urljoin(new_page[0])
IndexError: list index out of range
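The IndexError comes from indexing new_page[0] when the XPath matched no link, which happens on these old timeline pages. Below is a minimal sketch of a guard. It assumes parse_page keeps collecting the matched hrefs into a plain list as in the traceback. next_page_url is a hypothetical helper, and urllib.parse.urljoin stands in for response.urljoin, which joins against the response URL. In Scrapy itself, response.xpath(...).get() returns None for an empty match instead of raising.

```python
from typing import List, Optional
from urllib.parse import urljoin


def next_page_url(base_url: str, hrefs: List[str]) -> Optional[str]:
    """Return the absolute URL of the first matched link, or None.

    Hypothetical helper: mirrors `response.urljoin(new_page[0])` from
    parse_page, but does not raise IndexError when the XPath matched nothing.
    """
    if not hrefs:
        # Old timeline pages may not expose the link: stop following here
        # instead of crashing the callback.
        return None
    # Resolve a relative href (e.g. "/ninuxfirenze?...") against the page URL.
    return urljoin(base_url, hrefs[0])


if __name__ == "__main__":
    base = "https://mbasic.facebook.com/ninuxfirenze"
    print(next_page_url(base, []))
    print(next_page_url(base, ["/ninuxfirenze?timestart=0"]))
```

Inside the spider, the call would be next_page_url(response.url, new_page), and a new request to parse_page would be yielded only when the result is not None.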

2nd error: the reactions link is missing (the reactions page is empty), so parse_post fails with:

[scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/ufi/reaction/profile/browser/?ft_ent_identifier=717978068231023&refid=17&_ft_=top_level_post_id.717978478230982%3Atl_objid.717978478230982%3Apage_id.717952914900205%3Aphoto_attachments_list.%5B717978291564334%2C717978641564299%2C717978478230982%5D%3Aphoto_id.717978291564334%3Astory_location.4%3Astory_attachment_style.new_album%3Apage_insights.%7B%22717952914900205%22%3A%7B%22role%22%3A1%2C%22page_id%22%3A717952914900205%2C%22post_context%22%3A%7B%22story_fbid%22%3A%5B717978754897621%2C717978068231023%5D%2C%22publish_time%22%3A1382985680%2C%22story_name%22%3A%22EntPhotoNodeBasedEdgeStory%22%2C%22object_fbtype%22%3A22%7D%2C%22actor_id%22%3A717952914900205%2C%22psn%22%3A%22EntPhotoNodeBasedEdgeStory%22%2C%22sl%22%3A4%2C%22dm%22%3A%7B%22isShare%22%3A0%2C%22originalPostOwnerID%22%3A0%7D%2C%22targets%22%3A%5B%7B%22page_id%22%3A717952914900205%2C%22actor_id%22%3A717952914900205%2C%22role%22%3A1%2C%22post_id%22%3A717978754897621%2C%22share_id%22%3A0%7D%2C%7B%22page_id%22%3A717952914900205%2C%22actor_id%22%3A717952914900205%2C%22role%22%3A1%2C%22post_id%22%3A717978068231023%2C%22share_id%22%3A0%7D%5D%7D%7D%3Athid.717952914900205%3A306061129499414%3A43%3A0%3A1556693999%3A-3054257213565457848&__tn__=%2AW-R#footer_action_list> (referer: https://mbasic.facebook.com/ninuxfirenze?sectionLoadingID=m_timeline_loading_div_1556693999_0_36_timeline_unit%3A1%3A00000000001383574409%3A04611686018427387904%3A09223372036854775776%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001383574409%3A04611686018427387904%3A09223372036854775776%3A04611686018427387904&timeend=1556693999&timestart=0&tm=AQBTYgKZm-RBkzwc&refid=17)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib/python3.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/lib/python3.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib/python3.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib/python3.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/rugantio/Downloads/fbcrawl/fbcrawl/spiders/fbcrawl.py", line 218, in parse_post
    reactions = response.urljoin(reactions[0].extract())
  File "/usr/lib/python3.7/site-packages/parsel/selector.py", line 61, in __getitem__
    o = super(SelectorList, self).__getitem__(pos)
IndexError: list index out of range
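Here the failure is parsel's SelectorList.__getitem__ raising on an empty match, because old posts have no reactions link to follow. One way to adjust parse_post is to emit the post item directly when there is no link, and only hand off to parse_reactions when there is one. The sketch below models that decision with the standard library only. route_post and the ("request", url) / ("item", item) return convention are hypothetical, and urljoin again stands in for response.urljoin.

```python
from typing import Dict, List, Tuple
from urllib.parse import urljoin


def route_post(base_url: str, item: Dict[str, object],
               reaction_hrefs: List[str]) -> Tuple[str, object]:
    """Decide what parse_post should emit for one post.

    Hypothetical helper: returns ("request", url) when a reactions link
    exists, so the spider can follow it to parse_reactions; otherwise
    returns ("item", item) so old posts without reactions are still saved
    (without reaction counts) instead of raising IndexError.
    """
    if not reaction_hrefs:
        # Old post: no reactions page to visit, emit the item unchanged.
        return ("item", item)
    # Newer post: follow the first reactions link found.
    return ("request", urljoin(base_url, reaction_hrefs[0]))


if __name__ == "__main__":
    base = "https://mbasic.facebook.com/story.php"
    print(route_post(base, {"text": "old post"}, []))
    print(route_post(base, {}, ["/ufi/reaction/profile/browser/?ft_ent_identifier=1"]))
```

Keeping the decision in a small function makes the old-post path testable without running the crawler; parse_reactions would still need its own handling for pages that list no reactions.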