shaikhsajid1111 / facebook_page_scraper

Scrapes the front end of Facebook pages with no limitations and provides a feature to turn the data into structured JSON or CSV
https://pypi.org/project/facebook-page-scraper/
MIT License
209 stars, 62 forks

Adding functionality for scraping FB groups #107

Closed lrudolph333 closed 3 months ago

lrudolph333 commented 3 months ago

Added additional parameters to scraper.py to allow for effective scraping of groups.

Pretty much all the functionality for scraping pages stays the same, but some additions or alterations were made for scraping groups, as they behave slightly differently.

Note that not every field that can be scraped from pages can be scraped from groups. The 'name', 'post_url', 'content', and 'images' fields have been tested with groups; the others may or may not work. 'posted_on' does not work yet.
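As a rough illustration of the note above, a caller could keep only the fields reported as tested with groups. This is a minimal sketch, not code from this PR: it assumes each scraped post is a flat dict keyed by field name, and `keep_group_tested_fields` is a hypothetical helper name.

```python
# Fields reported in this PR as tested with groups.
GROUP_TESTED_FIELDS = ("name", "post_url", "content", "images")


def keep_group_tested_fields(post):
    """Return a copy of `post` with only the group-tested fields.

    Assumption: `post` is a flat dict keyed by field name.
    Hypothetical helper, not part of facebook_page_scraper.
    """
    return {key: post[key] for key in GROUP_TESTED_FIELDS if key in post}


# Made-up example post; 'posted_on' is dropped because it does not work yet.
example_post = {
    "name": "Some Group",
    "content": "Hello",
    "images": [],
    "posted_on": None,
}
filtered = keep_group_tested_fields(example_post)
```

Filtering on the caller's side keeps untested fields out of downstream JSON or CSV without changing the scraper itself.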

lrudolph333 commented 3 months ago

I think a warning is fair. Some groups require login to view content, and low-volume scraping will usually stay under Facebook's radar. Facebook also gives ample warning before enforcing a permanent ban; you usually get a couple of strikes. Future contributors can add an option for scraping only public groups that skips login.
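One way the warning could look, as a minimal sketch under stated assumptions: `warn_before_login` is a hypothetical helper (not the `__login` method from this PR), and treating a missing username as "public-only, no login" is an assumption about how such an option might behave.

```python
import warnings


def warn_before_login(username):
    """Warn about account risk before a login step.

    Hypothetical helper for illustration only. Returns False when no
    username is given (assumed to mean public-only scraping, no login),
    otherwise emits a UserWarning and returns True.
    """
    if not username:
        return False  # no credentials: skip login, nothing to warn about
    warnings.warn(
        "Logging in to scrape groups is done at your own risk; "
        "Facebook may restrict or permanently ban the account.",
        UserWarning,
        stacklevel=2,
    )
    return True
```

Using the standard `warnings` module lets users silence or escalate the message with the usual warning filters instead of a hard-coded print.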

On Wed, Mar 27, 2024 at 3:36 AM Sajid Shaikh @.***> wrote:

@.**** commented on this pull request.

In facebook_page_scraper/element_finder.py https://github.com/shaikhsajid1111/facebook_page_scraper/pull/107#discussion_r1540848252 :

@@ -409,3 +472,40 @@ def __accept_cookies(driver):
         except Exception as ex:
             logger.exception("Error at accept_cookies: {}".format(ex))
             sys.exit(1)
+
+    @staticmethod
+    def __login(driver, username, password):

Maybe we can add a warning that users do this at their own risk and can get a permanent ban.
