kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.27k stars 611 forks

[Question] A library to search for facebook groups with given keywords #428

Open sla-te opened 2 years ago

sla-te commented 2 years ago

2 years ago this was possible with the Facebook Graph API, which is sadly deprecated now. Is there a library you know of that is capable of doing this?

neon-ninja commented 2 years ago

Why not use Facebook's web interface to do this search? For example:

https://www.facebook.com/search/groups/?q=games or https://m.facebook.com/search/groups/?q=games&source=filter&isTrending=0&tsid=0.44263932103561143

sla-te commented 2 years ago

Yeah, sounds good. Your lib would be a good base to start from, as we need to be logged in to search for groups, afaik. For starters it would be sufficient to provide keywords and get back the group URLs. If you create the request that returns the HTML which includes the URLs, I could write the part that scrapes them, with BeautifulSoup for instance.

neon-ninja commented 2 years ago

I meant, why not do it manually in your browser? Do you have hundreds of search terms or something?

sla-te commented 2 years ago

Yeah, I have 400 keywords that I need to find the Facebook groups for. :)
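
With that many keywords, the search URLs could be generated rather than typed. Here is a minimal sketch, assuming the www.facebook.com/search/groups/?q=... pattern from the comment above still works. The function name build_group_search_url and the sample keyword list are made up for illustration, and fetching the pages would still need a logged-in session.

```python
from urllib.parse import urlencode

# URL pattern taken from the example earlier in this thread.
SEARCH_BASE = "https://www.facebook.com/search/groups/"


def build_group_search_url(keyword: str) -> str:
    """Return the group-search URL for one keyword, with the query URL-encoded."""
    return f"{SEARCH_BASE}?{urlencode({'q': keyword})}"


# Placeholder list; the real one would hold the ~400 keywords.
keywords = ["games", "coffee beans"]
urls = [build_group_search_url(k) for k in keywords]
for url in urls:
    print(url)
```

urlencode takes care of spaces and non-ASCII characters, which a plain f-string would pass through unescaped.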

neon-ninja commented 2 years ago

Related issue: https://github.com/kevinzg/facebook-scraper/issues/419

bipsen commented 2 years ago

I believe this does what is requested. It adds a method get_groups_by_search which searches for groups, finds their id, and yields the result of get_group_info with that group_id.

from urllib.parse import quote_plus

from facebook_scraper import FacebookScraper, utils, get_group_info
from facebook_scraper.constants import FB_MOBILE_BASE_URL

# Placeholders: replace with real credentials before running.
EMAIL = "you@example.com"
PWD = "your-password"

class FacebookScraper(FacebookScraper):
    def get_groups_by_search(self, word: str, **kwargs):

        """Searches Facebook groups and yields the group info for each
        result on the first page"""

        # URL-encode the keyword so spaces and non-ASCII characters survive.
        group_search_url = utils.urljoin(FB_MOBILE_BASE_URL, f"search/groups/?q={quote_plus(word)}")
        r = self.get(group_search_url)
        for group_element in r.html.find('div[role="button"]'):
            button_id = group_element.attrs.get("id")
            if button_id is None:
                continue  # a button without an id cannot be matched to a group
            group_id = find_group_id(button_id, r.text)
            yield get_group_info(group_id)

def find_group_id(button_id, raw_html):

    """Each group button has an id, which appears later in the script
    tag followed by the group id."""

    # Start at the last occurrence of the button id, then read the
    # value that follows the next "result_id:" up to the next comma.
    s = raw_html[raw_html.rfind(button_id) :]
    group_id = s[s.find("result_id:") :].split(",")[0].split(":")[1]
    return int(group_id)

scraper = FacebookScraper()
scraper.login(email=EMAIL, password=PWD)

for group_info in scraper.get_groups_by_search("coffee"):
    print(group_info)

Result:

{'id': '1996185023800606', 'name': 'Coffee lovers', 'type': 'Public group', 'members': 14299}
{'id': '2204925119', 'name': 'COFFEE COFFEE COFFEE!!!', 'type': 'Public group', 'members': 340455}
{'id': '755007758392142', 'name': 'LATTE ART', 'type': 'Public group', 'members': 46079}
{'id': '534483107108037', 'name': 'BARISTA COMMUNITY', 'type': 'Public group', 'members': 169960}
{'id': '721633338172381', 'name': 'Funny Coffee Memes', 'type': 'Public group', 'members': 219281}
{'id': '587751572609633', 'name': 'Coffee ☕ & Rain 🌧', 'type': 'Public group', 'members': 116986}
{'id': '823558245059998', 'name': '林芊妤 Coffee 粉絲群組', 'type': 'Public group', 'members': 7932}
{'id': '1574636316089193', 'name': 'I Love Coffee', 'type': 'Public group', 'members': 208646}
{'id': '120661273275592', 'name': 'Coffee & Cake Lovers 💏 ☕🍰', 'type': 'Public group', 'members': 40836}
{'id': '359032028835121', 'name': 'Coffee ☕❤', 'type': 'Public group', 'members': 21074}
{'id': '364701647546998', 'name': 'COFFEE BEANS MARKET', 'type': 'Public group', 'members': 56691}
{'id': '746157059433578', 'name': 'Coffee Everyday', 'type': 'Public group', 'members': 6113}
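
To make the lookup in find_group_id easier to follow, here is a small standalone sketch of the same idea, written with a regular expression instead of chained split calls. The sample_html string is made up and only imitates the pattern described in the docstring (a button id that reappears in a script tag, followed by result_id); real Facebook markup may differ and may change at any time.

```python
import re


def find_group_id(button_id: str, raw_html: str) -> int:
    """Return the first result_id that follows the last occurrence of button_id."""
    start = raw_html.rfind(button_id)
    if start == -1:
        raise ValueError(f"button id {button_id!r} not found")
    match = re.search(r"result_id:(\d+)", raw_html[start:])
    if match is None:
        raise ValueError(f"no result_id after {button_id!r}")
    return int(match.group(1))


# Hypothetical markup, invented for this example.
sample_html = (
    '<div role="button" id="u_0_a"></div>'
    '<script>{id:"u_0_a",result_id:2204925119,other:1}</script>'
)
print(find_group_id("u_0_a", sample_html))
```

Unlike the slicing version, this variant raises a ValueError with a readable message when the button id or the result_id is missing, instead of failing with an IndexError.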
neon-ninja commented 2 years ago

Great - could you please submit a pull request?

TranHuuHieu15 commented 3 months ago

I believe this does what is requested. It adds a method get_groups_by_search which searches for groups, finds their id, and yields the result of get_group_info with that group_id.

Hey, please help me, I can't run this code.

TranHuuHieu15 commented 3 months ago

This is the error:

D:\MyJob\Python\PyCharm\ai-report.venv\lib\site-packages\facebook_scraper\facebook_scraper.py:855: UserWarning: Facebook language detected as vi_VN - for best results, set to en_US
  warnings.warn(
Traceback (most recent call last):
  File "D:\MyJob\Python\PyCharm\ai-report\demo.py", line 32, in <module>
    scraper.login(email=EMAIL, password=PWD)
  File "D:\MyJob\Python\PyCharm\ai-report.venv\lib\site-packages\facebook_scraper\facebook_scraper.py", line 998, in login
    f.write(response.text)
  File "D:\Important\IT\python2\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ead' in position 50: character maps to <undefined>
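
Judging from the traceback, the failure seems to happen when login writes the response text to a file opened with the Windows default encoding (cp1252), which has no mapping for the Vietnamese character '\u1ead'. The sketch below only reproduces that mechanism in isolation; it does not touch facebook-scraper itself, and the file name page.html is made up.

```python
import os
import tempfile

text = "\u1ead"  # the character named in the traceback

# cp1252 cannot represent U+1EAD, so encoding fails as in the traceback.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# With an explicit UTF-8 encoding the same text is written without error,
# whatever the platform default is.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(cp1252_ok, round_trip == text)
```

One possible workaround, untested here against facebook-scraper, is to run the script in Python's UTF-8 mode (set the environment variable PYTHONUTF8=1, available since Python 3.7), which makes open() default to UTF-8. Switching the Facebook account language to en_US, as the warning suggests, may also avoid the non-ASCII text.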