kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

TemporarilyBanned exception not being caught #385

Open TowardMyth opened 3 years ago

TowardMyth commented 3 years ago

I am scraping some FB pages.

FB temporarily bans you if you scrape too fast, and facebook-scraper raises a TemporarilyBanned exception when that happens, per here.

However, for some reason I'm unable to catch the TemporarilyBanned exception. The code below keeps executing - it never reaches the except block - even after the TemporarilyBanned exception is raised.

The code below is inspired by @neon-ninja's examples here.

How can I catch this exception so that my scraper can wait ~30+ minutes before scraping again? Thanks!

from facebook_scraper import *
import json, youtube_dl, time, facebook_scraper, logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

file_handler = logging.FileHandler('fb_debug.txt')
file_handler.setLevel(logging.DEBUG)
logger.addHandler(file_handler)
file_handler.setFormatter(formatter)

stream_handler = logging.StreamHandler()
logger.addHandler(stream_handler)
stream_handler.setFormatter(formatter)

# =======================
# Change variables here
user = 'Nintendo'
counter = 1
start_url = ''

options_dict = {
  "posts_per_page": 200
}

# =======================
# Scrape Facebook for posts
temporary_banned_count = 0

while True:
  try:

    for post in get_posts(user, pages=None, cookies='cookies.json', extra_info=True, youtube_dl=True, options=options_dict, start_url=start_url):
      counter += 1
      logger.info(f'Pulling post #{counter}...')

      try:
        logger.info(f'Post #{counter} date: {post["time"].strftime("%Y-%m-%d %H:%M")}')

      except AttributeError as e:
        logger.info(f'Post #{counter} does not have a date!')

      # Write as json object to .txt
      with open('fb_post.txt', 'a') as f:
        f.write(json.dumps(post, indent=4, sort_keys=True, default=str))
        f.write('\n')
        temporary_banned_count = 0
    logger.info("Done scraping all posts")
    break

  except exceptions.TemporarilyBanned as e:
    temporary_banned_count += 1
    sleep_secs = 600 * temporary_banned_count
    logger.info(f"Temporarily banned, sleeping for {sleep_secs / 60} m ({sleep_secs} secs)")
    time.sleep(sleep_secs)
neon-ninja commented 3 years ago

How do you know you're TemporarilyBanned if the code continues executing?

TowardMyth commented 3 years ago

@neon-ninja The facebook_scraper logger outputs something like this:

2021-07-07 02:23:31,878 - facebook_scraper.extractors - ERROR - You’re Temporarily Blocked.

I also added a print statement immediately before this line; that print statement does execute and its output appears in the console.

neon-ninja commented 3 years ago

So one of the extract functions handles the error, but you're still able to fetch additional posts despite that? An individual extract function is considered non-critical, so it handles exceptions gracefully (logging them rather than re-raising). It's only if pagination throws an exception that it gets raised to your code.
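
To make the distinction concrete, here is a minimal sketch (not facebook-scraper's actual code; the function and exception names are made up for illustration) of the behaviour described: a non-critical extract step logs and swallows its error so the post is still yielded, while an exception raised during pagination would propagate to the calling code.

import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("sketch")

class TemporarilyBanned(Exception):
    """Stand-in for facebook_scraper.exceptions.TemporarilyBanned."""

def extract_high_quality_images(raw_post):
    # Hypothetical non-critical extract step that hits the ban page
    raise TemporarilyBanned("You're Temporarily Blocked.")

def get_next_page(page_number):
    # Hypothetical pagination step; an exception raised here would NOT be
    # swallowed and would reach the caller's try/except
    return [{"post_id": page_number * 10 + i} for i in range(2)]

def generate_posts(pages):
    for page_number in range(pages):
        for raw_post in get_next_page(page_number):
            post = dict(raw_post)
            try:
                post["images"] = extract_high_quality_images(raw_post)
            except Exception as e:
                # Non-critical step: the error is only logged and the post is
                # still yielded, just without the high-quality images
                logger.error(e)
            yield post

for post in generate_posts(pages=2):
    print(post)  # posts keep coming even though every extract step "failed"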

TowardMyth commented 3 years ago

@neon-ninja Is there any way to raise an exception even from individual extract functions? Or, put another way, is there a way to make sure that whenever I run into a TemporarilyBanned error (whether it comes from an individual extract function or from pagination), my script pauses for 10+ mins before resuming?

neon-ninja commented 3 years ago

Sure - try this https://github.com/kevinzg/facebook-scraper/commit/53c89a195e510874f7418171e8f423a6afa7b958

TowardMyth commented 3 years ago

@neon-ninja Thanks, your commit worked! A few more small questions:

  1. My use case: I'm trying to collect ALL the FB posts for a particular user, for archival purposes, so it's not acceptable to skip any posts. That is why I want the TemporarilyBanned exception to be raised regardless of whether it comes from an individual extract function or from pagination. Are there any other exceptions/conditions/etc. in your library that could inadvertently skip some posts or prevent me from collecting all of them?
  2. Do you have any tips for avoiding a temporary ban by FB, e.g. adding some time.sleep() calls in between requests?
neon-ninja commented 3 years ago
  1. Note that individual extract functions only extract parts of posts - for example, extracting high-quality images from an image post. The post would still be returned, just potentially missing the high-quality images. I'm not aware of any current bugs that would cause a post to be skipped.
  2. In general, the fewer requests you make, or the slower you make them, the less likely you are to be temporarily banned (see the sketch below).
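
As an illustration of point 2, here is a hedged sketch of slowing the scrape down by sleeping between yielded posts; the 5-second pause is an arbitrary value chosen for illustration, not a number recommended by the library.

import time
from facebook_scraper import get_posts

# Sleeping between yielded posts spreads out the underlying page requests and
# lowers the overall request rate. The 5-second pause is arbitrary.
for post in get_posts('Nintendo', pages=None, cookies='cookies.json',
                      options={"posts_per_page": 200}):
    print(post.get("post_id"), post.get("time"))  # replace with your own handling
    time.sleep(5)
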
TowardMyth commented 3 years ago

Have you been able to extract all the posts from a page with O(1k), O(10k), O(100k) posts before?

neon-ninja commented 3 years ago

Yes, I've done several CSV exports on the order of 1-5k posts per account. In https://github.com/kevinzg/facebook-scraper/issues/285, I extracted 14,201 posts in 910 seconds.
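
For reference, a minimal sketch of such a CSV export using Python's csv module; the selected columns are just an arbitrary subset of the keys present in each post dict.

import csv
from facebook_scraper import get_posts

# Illustrative export: the field list below is an arbitrary subset of the
# keys available on each post dict.
fields = ["post_id", "time", "text", "likes", "comments", "shares", "post_url"]

with open("posts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for post in get_posts('Nintendo', pages=None, cookies='cookies.json'):
        writer.writerow({k: post.get(k) for k in fields})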

woodayssolutions commented 1 year ago

> How do you know you're TemporarilyBanned if the code continues executing?

Hi @neon-ninja,

I'm trying to scrape user posts for around 10k users, but Facebook is temporarily blocking my account. Can you please suggest the ideal way to handle this?