moda20 / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
62 stars, 24 forks

get_posts returning only 5 posts #2

Closed: basheerpaliyathu closed 7 months ago

basheerpaliyathu commented 7 months ago

posts = get_posts(current_page, start_url="https://mbasic.facebook.com/" + current_page + "?v=timeline", pages=10, cookies="cookies.txt", options={"posts_per_page": 10, "allow_extra_requests": True, "comments": True})

I am using this code to scrape posts, but it returns only 5 posts. They are the last 5 posts.

The log says:
Looking for next page URL
Page parser did not find next page URL

moda20 commented 7 months ago

@basheerpaliyathu I updated the master branch with a quick fix for this, but I am not sure it will fully work for you. Please try it and tell me whether it resolves the problem. Also, this fix only applies to scraping pages, not groups.

basheerpaliyathu commented 7 months ago

After fetching 5 posts, my account got temporarily blocked.

facebook_scraper.exceptions.TemporarilyBanned: You’re Temporarily Blocked

moda20 commented 7 months ago

@basheerpaliyathu That's Facebook policy. You must have used the scraper a lot, for example by trying to parse 100 posts at once. Wait a day, then space out your requests: maybe fetch 5 posts a day instead of everything at once.
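One way to space out requests, as suggested above, is to wrap the post iterator in a small throttling generator. This is only a sketch: `throttled` and its parameters are hypothetical helpers, not part of facebook-scraper, and it assumes `get_posts` fetches lazily as you iterate (so pausing between items also pauses between requests). What delay keeps Facebook from blocking an account is not known; the value is yours to tune.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items one at a time, pausing before each further item is pulled.

    `sleep` is injectable so the pause can be replaced in tests.
    """
    for item in items:
        yield item
        # Execution resumes here only when the consumer asks for another item,
        # so the pause happens before the underlying iterator is advanced.
        # Note that this also pauses once after the final item.
        sleep(delay_seconds)


# Usage (hypothetical; needs the scraper installed and network access):
# from facebook_scraper import get_posts
# for post in throttled(get_posts("some_page", pages=1), delay_seconds=60):
#     print(post)
```

Because the wrapper accepts any iterable, it can be tried out on a plain list before pointing it at real requests.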

basheerpaliyathu commented 7 months ago

Actually, my account is not getting blocked. I can browse Facebook in a web browser and can also fetch posts using this scraper, but it throws the exception right after the 5th post.

Before showing the "Temporarily Blocked" message, it shows a 500 Internal Server Error.

Caught exception, retry number 1. Sleeping for 2s
Requesting page from: https://m.facebook.com/profile/timeline/stream/?cursor=<encrypted_key>&start_time=-9223372036854775808&profile_id=100064322996629&replace_id=u_0_1_I%2B&refid=17&paipv=0&eav=<encrypted_key>&start_time=-9223372036854775808&profile_id=100064322996629&replace_id=u_0_1_I%2B&refid=17&paipv=0&eav=<encrypted_key>
Exception: HTTPError('500 Server Error: Internal Server Error for url: https://www.facebook.com/profile/timeline/stream/?cursor=<encrypted_key>&start_time=-9223372036854775808&profile_id=100064322996629&replace_id=u_0_1_I%2B&refid=17&paipv=0&eav=<encrypted_key>')
Traceback (most recent call last):
  File "/Users/admin/Documents/facebook-scraper/facebook_scraper/facebook_scraper.py", line 881, in get
    response.raise_for_status()
  File "/Users/admin/Documents/facebook-scraper/venv/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://www.facebook.com/profile/timeline/stream/?cursor=<encrypted_key>&start_time=-9223372036854775808&profile_id=100064322996629&replace_id=u_0_1_I%2B&refid=17&paipv=0&eav=<encrypted_key>&_rdr
Caught exception, retry number 2. Sleeping for 4s

moda20 commented 7 months ago

@basheerpaliyathu I see, you are missing the base_url argument. I haven't added that to the README yet. In short, to force the parser to get the next page from mbasic, you need to tell it to do so explicitly. Add this where you add start_url: base_url="https://mbasic.facebook.com",
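Putting the original call and the suggested `base_url` argument together might look like the sketch below. `build_get_posts_kwargs` is a hypothetical helper introduced only to keep the arguments in one place. The `base_url` parameter is taken from the comment above and applies to this fork; it is not confirmed against the README. All other values are copied from the original call in this issue.

```python
from typing import Any, Dict

MBASIC_URL = "https://mbasic.facebook.com"


def build_get_posts_kwargs(page: str, cookies_file: str = "cookies.txt") -> Dict[str, Any]:
    """Collect the keyword arguments from this issue, plus base_url."""
    return {
        # Start from the mbasic timeline of the page, as in the original call.
        "start_url": f"{MBASIC_URL}/{page}?v=timeline",
        # Per the maintainer's comment: makes next-page lookups stay on mbasic.
        "base_url": MBASIC_URL,
        "pages": 10,
        "cookies": cookies_file,
        "options": {
            "posts_per_page": 10,
            "allow_extra_requests": True,
            "comments": True,
        },
    }


# Usage (requires this fork installed, a valid cookies file, and network access):
# from facebook_scraper import get_posts
# current_page = "some_page"  # hypothetical page name
# for post in get_posts(current_page, **build_get_posts_kwargs(current_page)):
#     print(post)
```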

basheerpaliyathu commented 7 months ago

Awesome, that worked.