shaikhsajid1111 / facebook_page_scraper

Scrapes Facebook's pages front end with no limitations & provides a feature to turn data into structured JSON or CSV
https://pypi.org/project/facebook-page-scraper/
MIT License

posts_count bigger than 19 results in only 19 scraped posts #8

Open verscph opened 3 years ago

verscph commented 3 years ago

Hi,

When I want to scrape the last 100 posts on a Facebook page:

facebook_ai = Facebook_scraper("facebookai",100,"chrome")
json_data = facebook_ai.scrap_to_json()
print(json_data)

Only 19 posts are scraped. I tried other pages too, with the same result.
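
For reference, this is roughly how I'm counting the posts that come back (a quick sketch; as far as I can tell scrap_to_json returns a JSON string keyed by post id):

import json

data = json.loads(json_data)  # parse the JSON string returned by scrap_to_json()
print(len(data))              # prints 19, not the requested 100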

Any ideas what goes wrong?

shaikhsajid1111 commented 3 years ago

Facebook might be blocking your IP

verscph commented 3 years ago

I can launch several Facebook_scraper calls (in a loop), and all of them return 19 posts. Since every call gives a result, I assume my IP isn't blocked?

shaikhsajid1111 commented 3 years ago

Can you tell me which Facebook page you're trying?

verscph commented 3 years ago

to_scrape = ["MediaMarktBE","darty","CoolblueBelgie","Amazon","curryspcworld","unieuro","elcorteingles","elkjop"]

Tried "facebookai" too.

shaikhsajid1111 commented 3 years ago

Okay, I'll check and see for sure

shaikhsajid1111 commented 3 years ago

Did you get any error on the console? Is it closing automatically, without any error, after scraping 19 posts?

verscph commented 3 years ago

No errors are shown.

verscph commented 3 years ago

@shaikhsajid1111 any updates on this?

shaikhsajid1111 commented 3 years ago

Hello, apologies for the late reply. I picked CoolblueBelgie from the list and tried it with 100 posts, and it did work (result here). Then I tried the entire list with 40 posts each, and that worked as well.

People have had similar problems before. For most of them, the cause was the language setting: they were using Facebook in their native language, whereas this web crawler only works with English (UK). After changing to English (UK) it was resolved.

The maximum I went up to was 125 posts; after that, FB redirects you to the login page, so within 100 it should work.
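
If you want to stay on the safe side, you can cap the requested count yourself, something like this (a rough sketch using the same constructor arguments as above):

from facebook_page_scraper import Facebook_scraper

requested = 200
posts_count = min(requested, 100)  # stay below the ~125-post point where FB redirects to the login page

scraper = Facebook_scraper("CoolblueBelgie", posts_count, "chrome")
json_data = scraper.scrap_to_json()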

verscph commented 3 years ago

Thanks for your feedback; I changed my language settings to en_uk. scrap_to_json is now working as expected, though scrap_to_csv still limits the number of posts. I'll continue with the JSON output, but you might want to check whether scrap_to_csv is working fine on your side.
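
This is roughly what I compared (a sketch; the scrap_to_csv arguments are what I took from the README, a filename plus an output directory):

from facebook_page_scraper import Facebook_scraper

# JSON output: returns the full post count after the language change
scraper_json = Facebook_scraper("facebookai", 100, "chrome")
json_data = scraper_json.scrap_to_json()

# CSV output: still capped at 19 posts for me
scraper_csv = Facebook_scraper("facebookai", 100, "chrome")  # fresh instance, same settings
scraper_csv.scrap_to_csv("facebookai_posts", ".")            # filename (no extension) and output directory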

Thanks for your help!

shaikhsajid1111 commented 3 years ago

Great, thanks. I will check for sure.