kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

I can only download posts from months 3, 6, 9 and 12. #517

Open jorgeortizfuentes opened 3 years ago

jorgeortizfuentes commented 3 years ago

I am trying to download posts from public Facebook pages. However, facebook-scraper only downloads posts from months that are multiples of 3 (March, June, September, December). I have tried different accounts and different pages, and it keeps happening.

neon-ninja commented 3 years ago

Can you give an example of a page that has this problem? What version of facebook-scraper are you using? Have you tried the latest master? Are you using cookies?

jorgeortizfuentes commented 3 years ago

I am using version 0.2.47, with cookies. I am trying to download posts from Chilean media pages such as "laterceracom" or "RadioBioBio".

I am using this code.

neon-ninja commented 3 years ago

The gap appears within a single page of results, and it occurs regardless of posts_per_page. Here's an example page:

https://m.facebook.com/page_content_list_view/more/?page_id=10383924671&start_cursor={"timeline_cursor":"AQHRtuKtkayH4asezqm92G3TOdDFkIAbdh4ddfe_DztmWmpfXCejB5e2bTpGhTCJikAvfrzoAgQ6Neg_2tlP2z0aSELMD9O6oX5WLyy9p9iX09ofzQoENHQzDSrC15D-FzMo","timeline_section_cursor":null,"has_next_page":true}&num_to_fetch=4&surface_type=posts_tab

If you look for abbr elements in this page, you can see the jump from 20 September at 14:18 to 28 June at 18:20. There exist RadioBioBio posts within that interval (https://m.facebook.com/RadioBioBio/photos/a.241426294671/10160542770204672/?type=3, https://m.facebook.com/RadioBioBio/posts/10160510474104672). This looks like a bug in Facebook to me. I don't see what the scraper can do to solve this.
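For anyone who wants to repeat the abbr check on a saved copy of such a page, here is a minimal sketch using only the Python standard library. It assumes the page HTML is already available as a string; the `AbbrCollector` class, the `abbr_labels` helper and the sample markup are hypothetical illustrations, not part of facebook-scraper, and the real response from Facebook may be wrapped or structured differently.

```python
from html.parser import HTMLParser


class AbbrCollector(HTMLParser):
    """Collect the text content of every <abbr> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # greater than 0 while inside an <abbr>
        self._buffer = []  # text pieces of the <abbr> currently being read
        self.labels = []   # finished labels, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "abbr":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "abbr" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.labels.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def abbr_labels(html):
    """Return the timestamp labels found in <abbr> elements, in page order."""
    parser = AbbrCollector()
    parser.feed(html)
    parser.close()
    return parser.labels


# Made-up fragment for illustration only; real Facebook markup is more complex.
sample = (
    "<article><abbr>20 September at 14:18</abbr></article>"
    "<article><abbr>28 June at 18:20</abbr></article>"
)
print(abbr_labels(sample))  # ['20 September at 14:18', '28 June at 18:20']
```

Reading the labels in order makes a jump such as the one from 20 September to 28 June easy to spot by eye.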

Side note: here's my code for investigating this:

from facebook_scraper import enable_logging, get_posts, set_cookies
import logging
import sys
from pprint import pprint

enable_logging(logging.DEBUG)
set_cookies("cookies.json")

# Walk the timeline and stop at the first pair of consecutive posts
# that are more than 15 days apart.
last_post = None
for post in get_posts("RadioBioBio", pages=None, timeout=600, options={"allow_extra_requests": False, "posts_per_page": 200}):
    if last_post:
        diff = last_post["time"] - post["time"]
        if abs(diff.days) > 15:
            print("Large time gap!")
            print(diff)
            pprint(last_post)
            pprint(post)
            sys.exit(1)
    last_post = post

culiver commented 2 years ago

> I am trying to download posts from public Facebook pages. However, facebook-scraper only downloads posts from months that are multiples of 3 (March, June, September, December). I have tried different accounts and different pages, and it keeps happening.

Hi @jorgeortizfuentes, I ran into a similar problem. Did you find a solution?

jorgeortizfuentes commented 1 year ago

@culiver I could not solve it :(