moda20 / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

Only get few posts #10

Closed Joshuahuang55 closed 6 months ago

Joshuahuang55 commented 6 months ago

Hi, thanks for providing the new way to get the posts.

A couple of problems showed up:

  1. If I set the group parameter in the get_posts function, it raises an error.

"""
File ~\anaconda3\Lib\site-packages\facebook_scraper\extractors.py:108 in __init__
    self.scraper = kwargs['scraper']

KeyError: 'scraper'
"""

  2. The problem above was solved after I removed the group parameter and just passed the group ID. But even when I set the pages parameter in the get_posts function to 20, I still only get 8 posts. It seems that mbasic.facebook.com only displays 8 posts per page. To see more, I have to click the "see more posts" button.

Is there any other way to fix this problem? Thanks a lot.

Here is the code that only returns 8 posts for a group:

from facebook_scraper import get_posts

for post in get_posts(
    '817620721658179',
    base_url="https://mbasic.facebook.com/groups",
    start_url="https://mbasic.facebook.com/groups/817620721658179?v=timeline",
    pages=50,
    cookies="www.facebook.com_cookies.txt",
):
    print(post['text'][:50])

moda20 commented 6 months ago

@Joshuahuang55 Thanks, I have updated my repo (this one) to fix the KeyError you mentioned. I also found an issue with the page cursor using an old regex. It should get all the pages now.

Cartasiane commented 6 months ago

I'm still getting only 8 posts on a private group after running pip install --force-reinstall --no-deps git+https://github.com/moda20/facebook-scraper.git@master

Joshuahuang55 commented 6 months ago

@moda20 Thanks!! I can get all pages now. Much appreciated!!

@Cartasiane You have to join the private group and adjust your code.

Then you need to pass the "group" parameter to tell the code that you want to scrape a group. That way you can get more than 8 posts.

from facebook_scraper import get_posts

for post in get_posts(
    group='817620721658179',
    base_url="https://mbasic.facebook.com/groups",
    start_url="https://mbasic.facebook.com/groups/817620721658179?v=timeline",
    pages=50,
    cookies="www.facebook.com_cookies.txt",
):
    print(post['text'][:50])

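
To check whether more than 8 posts really come back, it can help to collect the results and count them instead of only printing. The following is a minimal sketch, not part of facebook-scraper: `collect_posts` and `preview` are hypothetical helper names, and it assumes each post is a dict with `post_id` and `text` keys, where `text` may be missing or None.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List, Optional


def collect_posts(
    posts: Iterable[Dict[str, Any]], limit: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Materialize up to `limit` posts, skipping repeated post_id values.

    Hypothetical helper: `posts` can be any iterable of dicts, for example
    the generator returned by get_posts(...).
    """
    seen = set()
    unique = []
    for post in islice(posts, limit):
        post_id = post.get("post_id")  # assumed key; may be absent
        if post_id is not None and post_id in seen:
            continue  # same post yielded twice across pages
        seen.add(post_id)
        unique.append(post)
    return unique


def preview(post: Dict[str, Any], width: int = 50) -> str:
    """Return the first `width` characters of the text, or '' if there is none."""
    text = post.get("text") or ""
    return text[:width]
```

Passing the `get_posts(group=..., ...)` call above to `collect_posts` and taking `len()` of the result should give a number greater than 8 if pagination is working; `preview(post)` avoids an error on posts that have no text.
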
Cartasiane commented 6 months ago

Ok, this works for me, but even when I'm not in verbose mode the script prints "using extra page processor" for every page... It might be nice to remove this!

moda20 commented 6 months ago

@Cartasiane Yes, I have updated the repo to remove that extra print.

moda20 commented 6 months ago

@Cartasiane @Joshuahuang55 I am closing this issue since it's resolved.