shaikhsajid1111 / facebook_page_scraper

Scrapes facebook's pages front end with no limitations & provides a feature to turn data into structured JSON or CSV
MIT License
222 stars 67 forks

CRITICAL - No posts were found! #91

Open talatoncu opened 8 months ago

talatoncu commented 8 months ago

I used the example you have given.

# import Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

# instantiate the Facebook_scraper class
page_name = "##MYNAME##"
posts_count = 10
browser = "firefox"
proxy = ""  # if proxy requires authentication then user:password@IP:PORT
timeout = 600  # 600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

json_data = meta_ai.scrap_to_json()
print(json_data)

The following messages appear and I get no posts.

2024-01-04 09:53:29,565 - facebook_page_scraper.driver_initialization - INFO - Using:
[WDM] - There is no [win64] geckodriver for browser in cache
[WDM] - Getting latest mozilla release info for v0.34.0
[WDM] - Trying to download new driver from
[WDM] - Driver has been saved in cache [C:\Users\Talat Oncu.wdm\drivers\geckodriver\win64\v0.34.0]
2024-01-04 09:54:31,409 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!
Exit code: 1

Then I tried it with NintendoAmerica:

# import Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

# instantiate the Facebook_scraper class
page_name = "NintendoAmerica"
posts_count = 10
browser = "firefox"
proxy = ""  # if proxy requires authentication then user:password@IP:PORT
timeout = 600  # 600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

json_data = meta_ai.scrap_to_json()
print(json_data)

The program prints this message

2024-01-04 10:11:18,586 - facebook_page_scraper.driver_initialization - INFO - Using:
[WDM] - Driver [C:\Users\Talat Oncu.wdm\drivers\geckodriver\win64\v0.34.0\geckodriver.exe] found in cache

and then waits indefinitely.

gayathriravipati commented 8 months ago

I have the same issue. I checked what was happening by setting headless to False.

I see that the browser doesn't log in, and the following result appears in the terminal:

2024-01-04 16:02:11,918 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!

Can anyone help figure out what can be done about this? Thank you!

ExpiredMeteor6 commented 8 months ago

Hi all, I have the same issue when running on Ubuntu but not on Windows 11! Instead of the usual login widget with the X in the top-right corner, we get a separate page which requires a login before redirecting to the desired page.

If there were a way to log in once, the webdriver would remember that and we would not get this issue; unfortunately, nothing I tried worked. I have managed to solve the issue by coding my own Facebook scraper using a Chrome driver that can use a specific user data profile, but I would prefer to use this project if we can get a patch, as it is less for me to maintain :D


ExpiredMeteor6 commented 8 months ago

Following on from this, I tried using a UK proxy, which worked and produced the desired outcome.

GazTrab commented 8 months ago

> Following on from this, I tried using a UK proxy, which worked and produced the desired outcome.

Could you tell a noob like me how to set the proxy to UK?

ExpiredMeteor6 commented 8 months ago

> > Following on from this, I tried using a UK proxy, which worked and produced the desired outcome.
>
> Could you tell a noob like me how to set the proxy to UK?

proxy = 'exampleproxy:exampleport'
Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)
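For anyone unsure about the format: the `proxy` argument is a plain string, either `IP:PORT` or `user:password@IP:PORT` when the proxy needs authentication (as the comment in the example at the top of this thread says). Whether it is a "UK" proxy depends only on where the proxy server itself is located. A small helper for building the string could look like the sketch below; `build_proxy` and the host/port values are made up for illustration and are not part of facebook_page_scraper.

```python
from typing import Optional


def build_proxy(host: str, port: int,
                user: Optional[str] = None,
                password: Optional[str] = None) -> str:
    """Return a proxy string in the IP:PORT or user:password@IP:PORT form."""
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    address = f"{host}:{port}"
    # Credentials are only prepended when both parts are supplied.
    if user and password:
        return f"{user}:{password}@{address}"
    return address


# Placeholder values; substitute the details of a real proxy located in the UK.
proxy = build_proxy("203.0.113.10", 8080)
print(proxy)
```

The resulting string is what gets passed as `proxy=proxy` in the call above.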

shaikhsajid1111 commented 8 months ago

@ExpiredMeteor6 Yes, using a Chrome profile that is already logged in will unblock you. Unfortunately, I cannot make that feature part of this project, as the project claims to scrape only publicly available data.

testproto commented 8 months ago

@shaikhsajid1111 Is there any exception that I can import in my code to handle this error?

[WDM] - Driver [C:\Users\manrkaur.wdm\drivers\geckodriver\win64\v0.34.0\geckodriver.exe] found in cache
2024-02-02 16:10:25,737 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!

import json

from facebook_page_scraper import Facebook_scraper


def scrape_facebook_data(page_names, posts_count=10, browser="firefox", proxy=None, timeout=600, headless=True):
    """
    Scrapes Facebook data for the given page names.

    Parameters:
    - page_names: List of Facebook page names
    - posts_count: Number of posts to scrape per page
    - browser: Browser to use (e.g., "firefox")
    - proxy: Proxy information (e.g., "IP:PORT" or None)
    - timeout: Timeout in seconds
    - headless: Whether to run the browser in headless mode

    Returns:
    - A dictionary containing the scraped data for each page
    """
    scraped_data = {}

    for page_name in page_names:
        # Instantiate the Facebook_scraper class
        meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

        # Scraping data and converting it to JSON
        json_data_str = meta_ai.scrap_to_json()

        # Parse the JSON string into a dictionary
        json_data = json.loads(json_data_str)

        # Create an array to store post information
        posts_array = []

        # Iterate through each post and append to the array
        for post_id, post_data in json_data.items():
            time = post_data.get('posted_on', "")
            content = post_data.get("content", "")
            reaction_count = post_data.get('reaction_count', "")
            comments = post_data.get('comments', "")

            # Add a condition to check if content is not empty before appending
            if content:
                # Append post information to the array
                posts_array.append({
                    # "Post ID": post_id,
                    "Content": content,
                    "Posted on": time,
                })

        # Store the array for the current page in the result dictionary
        scraped_data[page_name] = posts_array

    return scraped_data

shaikhsajid1111 commented 7 months ago

@testproto There isn't any custom exception thrown when no posts are found. You can write a wrapper function around this with try/except, if I'm understanding your requirement properly.
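A minimal sketch of such a wrapper is below. One thing to watch for: the first log in this thread ends with `Exit code: 1`, which suggests (an assumption, not confirmed in this thread) that the library stops via `sys.exit()` when no posts are found. `sys.exit()` raises `SystemExit`, which derives from `BaseException` and is therefore not caught by `except Exception`. `fake_scrape` is a hypothetical stand-in for `Facebook_scraper(...).scrap_to_json()` so the snippet runs without a browser, and `safe_scrape` is an illustrative name, not a library function.

```python
import json
import sys
from typing import Callable, Optional


def fake_scrape(page_name: str) -> str:
    """Hypothetical stand-in for Facebook_scraper(...).scrap_to_json()."""
    if page_name == "private_page":
        # Assumed behaviour: the process is asked to exit when no posts are found.
        sys.exit(1)
    return json.dumps({"1": {"content": f"hello from {page_name}"}})


def safe_scrape(scrape: Callable[[str], str], page_name: str) -> Optional[dict]:
    """Run scrape(page_name) and return the parsed JSON, or None on any failure."""
    try:
        return json.loads(scrape(page_name))
    except SystemExit as exc:
        # Not a subclass of Exception, so it needs its own clause.
        print(f"{page_name}: scraper tried to exit with code {exc.code}")
    except Exception as exc:
        print(f"{page_name}: scraping failed: {exc}")
    return None


results = {}
for name in ["private_page", "public_page"]:
    data = safe_scrape(fake_scrape, name)
    if data is not None:
        results[name] = data

print(results)
```

With the real library, the function passed in would build a `Facebook_scraper` for the page and return the result of `scrap_to_json()`.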

testproto commented 7 months ago

> @testproto There isn't any custom exception thrown when no posts are found. You can write a wrapper function around this with try/except, if I'm understanding your requirement properly.

It throws an error when a page is private, so how can I handle that scenario? Could you please help me with that, @shaikhsajid1111?

testproto commented 7 months ago

> @testproto There isn't any custom exception thrown when no posts are found. You can write a wrapper function around this with try/except, if I'm understanding your requirement properly.

from facebook_page_scraper import Facebook_scraper
from facebook_page_scraper.driver_utilities import Utilities  # Importing the Utilities class from your module
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import json
import re
import requests
import logging

logging.basicConfig(level=logging.INFO)  # Set the logging level to INFO or higher

def extract_facebook_page_name(url):
    """
    Extracts Facebook page name from a given URL.

    Parameters:
    - url: URL of the website

    Returns:
    - Facebook page name if found, otherwise None
    """
    try:
        # Make a direct request and check the response status
        response = requests.get(url)
        if response.status_code == 200:
            page_source = response.text
        else:
            # Use Selenium to get the page source if direct request fails
            chrome_options = Options()
            driver = webdriver.Chrome(options=chrome_options)
            driver.get(url)
            page_source = driver.page_source
            driver.quit()
    except Exception as e:
        print(f"Error: {e}")
        return None

    # Use BeautifulSoup to parse HTML and find Facebook page link
    soup = BeautifulSoup(page_source, 'html.parser')
    facebook_link = soup.find('a', href=re.compile(r'facebook\.com', re.IGNORECASE))

    if facebook_link:
        # Extract page name from the Facebook link
        match = re.search(r'facebook\.com/([^/?]+)', facebook_link['href'])
        if match:
            # Check if the page is private
            if "page doesn't exist" in page_source or "The link you followed may be broken, or the page may have been removed" in page_source:
                print(f"The Facebook page at {url} is either private or does not exist.")
                return None
            return match.group(1)

    return None

def scrape_facebook_data(page_names, posts_count=10, browser="firefox", proxy=None, timeout=600, headless=True):
    """
    Scrapes Facebook data for the given page names.

    Parameters:
    - page_names: List of Facebook page names
    - posts_count: Number of posts to scrape per page
    - browser: Browser to use (e.g., "firefox")
    - proxy: Proxy information (e.g., "IP:PORT" or None)
    - timeout: Timeout in seconds
    - headless: Whether to run the browser in headless mode

    Returns:
    - A dictionary containing the scraped data for each page, or None if no posts are found
    """
    scraped_data = {}

    for page_name in page_names:
        # Instantiate the Facebook_scraper class
        meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

        try:
            # Scraping data and converting it to JSON
            json_data_str = meta_ai.scrap_to_json()

            # Parse the JSON string into a dictionary
            json_data = json.loads(json_data_str)

            # Create an array to store post information
            posts_array = []

            # Iterate through each post and append to the array
            for post_id, post_data in json_data.items():
                time = post_data.get('posted_on', "")
                content = post_data.get("content", "")
                reaction_count = post_data.get('reaction_count', "")
                comments = post_data.get('comments', "")

                # Add a condition to check if content is not empty before appending
                if content:
                    # Append post information to the array
                    posts_array.append({
                        # "Post ID": post_id,
                        "Content": content,
                        "Posted on": time,
                    })

            # Store the array for the current page in the result dictionary
            scraped_data[page_name] = posts_array

        except Exception as e:
            # Log the error as critical
            print(f"Error scraping data for page '{page_name}': {e}")
            continue  # Continue to the next page if an error occurs

    # Check if any data was scraped
    if not scraped_data:
        print("No posts were found for any of the provided pages.")
        return None

    return scraped_data

def getSocialMedia(urls, posts_count=10, browser="firefox", proxy=None, timeout=600, headless=True):
    """
    Scrapes Facebook data for the given URLs.

    Parameters:
    - urls: List of website URLs
    - posts_count: Number of posts to scrape per page
    - browser: Browser to use (e.g., "firefox")
    - proxy: Proxy information (e.g., "IP:PORT" or None)
    - timeout: Timeout in seconds
    - headless: Whether to run the browser in headless mode

    Returns:
    - A dictionary containing the scraped data for each page
    """
    page_names = []

    for url in urls:
        # Extract Facebook page name from the URL
        page_name = extract_facebook_page_name(url)

        if page_name:
            page_names.append(page_name)

    # Call the function to scrape Facebook data using extracted page names
    result = scrape_facebook_data(page_names, posts_count, browser, proxy, timeout, headless)

    return result

# Set up logging configuration

# Example usage:
if __name__ == "__main__":
    # List of website URLs
    urls = ['', '']

    # Common configuration for scraping
    posts_count = 10
    browser = "firefox"
    proxy = "IP:PORT"  # if proxy requires authentication then user:password@IP:PORT
    timeout = 600  # 600 seconds
    headless = True

    # Dictionary to store scraped data
    result = {}

    for url in urls:
        # Extract Facebook page name from the URL
        page_name = extract_facebook_page_name(url)

        if page_name:
            try:
                # Call the function to scrape Facebook data for the current URL
                page_data = scrape_facebook_data([page_name], posts_count, browser, proxy, timeout, headless)

                if page_data:
                    # Add the scraped data to the result dictionary
                    result.update(page_data)
                else:
                    print(f"No posts found for URL: {url}")
                    continue  # Continue to the next URL if no posts are found

            except Exception as e:
                print(f"Error scraping data for URL '{url}': {e}")
                continue  # Continue to the next URL if an error occurs
        else:
            print(f"No Facebook page found for URL: {url}")
            continue  # Continue to the next URL if no Facebook page is found

    # Check if result is empty and set it to None if it is
    if not result:
        print("No Facebook data found for the provided URLs.")
        result = None

    # Print the result
    print(json.dumps(result, indent=2))

**See, I am using a try/except block, but this code exits after checking testmatick and does not go on to the next URL; it exits with "CRITICAL - No posts were found!".**
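One possible way around a scraper that ends the whole program (or hangs, as in the first report) is to run each page in its own child process with a time limit, so that one page exiting or stalling cannot stop the loop over URLs. This is only a sketch under assumptions: `run_isolated` is a hypothetical helper, and the child code strings below are placeholders standing in for a small script that imports `facebook_page_scraper` and prints the JSON for one page.

```python
import subprocess
import sys
from typing import Optional


def run_isolated(child_code: str, timeout: float) -> Optional[str]:
    """Run Python code in a child process; return its stdout, or None on failure."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", child_code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child once the timeout has passed.
        return None
    if completed.returncode != 0:
        # Covers sys.exit(1) and uncaught exceptions in the child.
        return None
    return completed.stdout


# Placeholder children: one exits with code 1, one hangs, one succeeds.
jobs = {
    "exits": "import sys; sys.exit(1)",
    "hangs": "import time; time.sleep(60)",
    "works": "print('{\"1\": {\"content\": \"ok\"}}')",
}

outputs = {name: run_isolated(code, timeout=3) for name, code in jobs.items()}
print(outputs)
```

The parent keeps going whatever a child does, at the cost of starting a new interpreter (and, with the real scraper, a new browser) for every page.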