kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.44k stars, 633 forks

Unable to get any posts via search #910

Status: Open. richylyq opened this issue 2 years ago.

richylyq commented 2 years ago

When I use the code below to fetch posts via search, nothing is returned because no raw posts are found.

Python version: 3.9.13
facebook-scraper version: 0.2.59

Sample Code:

```python
# Trying to get posts by search
import logging
import warnings

import pandas as pd
from facebook_scraper import get_posts_by_search, enable_logging

warnings.filterwarnings(action='once')
enable_logging(logging.DEBUG)

# Collect each scraped post (a dict) and build the DataFrame once at the end
scraped_posts = []

for post in get_posts_by_search("sergio perez", cookies='cookies.txt', extra_info=True, pages=5, options={"comments": True}):
    scraped_posts.append(post)
    # f-string avoids a TypeError if post_id is None
    print(f"{post['post_id']} get")

# One row per post; DataFrame.append was removed in pandas 2.0
post_df_full_bysearch = pd.DataFrame(scraped_posts)
```

Output:

```
Starting to iterate pages
Requesting page from: https://m.facebook.com/search/posts?q=sergio perez&filters=eyJyZWNlbnRfcG9zdHM6MCI6IntcIm5hbWVcIjpcInJlY2VudF9wb3N0c1wiLFwiYXJnc1wiOlwiXCJ9In0%3D
Parsing page response
No raw posts (<article> elements) were found in this page.
The page url is: https://m.facebook.com/search/posts?q=sergio%20perez&filters=eyJyZWNlbnRfcG9zdHM6MCI6IntcIm5hbWVcIjpcInJlY2VudF9wb3N0c1wiLFwiYXJnc1wiOlwiXCJ9In0%3D
The page content is:
+------------------------------------------------------------
| sergio perez - Facebook Search
```

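The `filters` value in the logged request URL is not opaque: once percent-decoded (`%3D` is `=`), it is base64-encoded JSON. A minimal standard-library sketch can decode it to confirm what the search request actually asks for. The helper name `decode_search_filters` is hypothetical and is not part of facebook-scraper.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit


def decode_search_filters(url: str) -> dict:
    """Decode the base64-encoded JSON in a search URL's `filters` parameter.

    Hypothetical helper for inspecting debug logs; not a facebook-scraper API.
    """
    # parse_qs percent-decodes values, so "%3D" becomes "=" (base64 padding)
    encoded = parse_qs(urlsplit(url).query)["filters"][0]
    outer = json.loads(base64.b64decode(encoded))
    # Each filter value is itself a JSON document stored as a string
    return {key: json.loads(value) for key, value in outer.items()}


url = ("https://m.facebook.com/search/posts?q=sergio%20perez&filters="
       "eyJyZWNlbnRfcG9zdHM6MCI6IntcIm5hbWVcIjpcInJlY2VudF9wb3N0c1wiLFwiYXJnc1wiOlwiXCJ9In0%3D")
print(decode_search_filters(url))
# {'recent_posts:0': {'name': 'recent_posts', 'args': ''}}
```

For the URL in the log, the request appears to select a "recent posts" search filter, so the query itself looks well-formed; the failure is in finding post elements in the returned page.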
FlorenciaBCabrera commented 1 year ago

Hello @richylyq, were you able to solve it? I have the same problem.

Mahmoud-Emarah-Syenah commented 1 year ago

Has anybody solved this? I have the same issue.

chienyutseng commented 1 year ago

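After testing, the workaround that currently works is to edit the `get_page` function in `page_iterators.py` and change

`return self._get_page('article[data-ft*="top_level_post_id"]', 'article')`

to

`return self._get_page('[data-ft*="top_level_post_id"]', 'top_level_post_id')`

This removes the `article` tag restriction from the CSS selector, so elements other than `<article>` whose `data-ft` attribute contains `top_level_post_id` are also matched.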
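The difference between the two selectors is easy to see on a small sample. `article[data-ft*="top_level_post_id"]` only matches `<article>` elements whose `data-ft` attribute contains that substring, while `[data-ft*="top_level_post_id"]` matches elements of any tag. The sketch below approximates both selectors with Python's standard-library `html.parser`; it is not the selector engine facebook-scraper uses, and the markup is hypothetical. It assumes, without confirmation, that Facebook's mobile search results now wrap posts in `<div>` rather than `<article>` elements.

```python
from html.parser import HTMLParser


class DataFtCollector(HTMLParser):
    """Record tag names of elements whose data-ft attribute contains `needle`.

    tag_filter=None approximates [data-ft*="needle"] (any tag);
    tag_filter="article" approximates article[data-ft*="needle"].
    """

    def __init__(self, needle, tag_filter=None):
        super().__init__()
        self.needle = needle
        self.tag_filter = tag_filter
        self.matches = []

    def handle_starttag(self, tag, attrs):
        data_ft = dict(attrs).get("data-ft")
        if data_ft is None or self.needle not in data_ft:
            return
        if self.tag_filter is None or tag == self.tag_filter:
            self.matches.append(tag)


def find_data_ft_elements(html, needle, tag_filter=None):
    parser = DataFtCollector(needle, tag_filter)
    parser.feed(html)
    parser.close()
    return parser.matches


# Hypothetical markup: posts wrapped in <div> instead of <article>
sample = """
<div data-ft='{"top_level_post_id":"111"}'>first post</div>
<div data-ft='{"top_level_post_id":"222"}'>second post</div>
<div data-ft='{"page_id":"333"}'>not a post</div>
"""

print(find_data_ft_elements(sample, "top_level_post_id", tag_filter="article"))  # original selector
print(find_data_ft_elements(sample, "top_level_post_id"))                        # patched selector
```

On this sample the original selector finds nothing, which mirrors the "No raw posts (<article> elements)" log line, while the patched selector finds both posts and still skips the element without `top_level_post_id`.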