kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License
2.28k stars 613 forks

[Help] Proxy and cursor #617

Open JJery-web opened 2 years ago

JJery-web commented 2 years ago

This project is amazing, but I have two questions. (I am a beginner at coding.)

  1. How can I add a proxy in the code (for the request function)? And does it help make scraping more stable?

  2. How can I improve the code so that it continues scraping after I am temporarily blocked? (I searched the existing issues, but as a beginner I still don't know how to improve the code.) Do I just need to add the log and the cursor? I don't know how to show the "log".

So sorry.


My code:

for post in get_posts(account=link, pages=None, timeout=120, cookies="mycookies.json", options={"allow_extra_requests": False, "reactions": False, "posts_per_page": 200}):

neon-ninja commented 2 years ago
  1. You can use the set_proxy function. As for stability, probably not: if you use publicly available proxies, Facebook is more likely to require you to log in.
  2. See https://github.com/kevinzg/facebook-scraper/issues/310#issuecomment-852652846. You can use the enable_logging function to show the log.
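
A minimal sketch of the "log and cursor" idea, written against a generic fetch function so it runs without network access. The names scrape_with_resume, fetch_posts, and the fake data are hypothetical and only for illustration. With facebook_scraper, the adapter would presumably wrap get_posts using its start_url and request_url_callback arguments, as hinted in the comments, but that wiring is untested here.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("resume_sketch")


def scrape_with_resume(fetch_posts, ban_error, wait_seconds=600, max_bans=5, sleep=time.sleep):
    """Collect posts, resuming from the last requested page URL after a ban.

    fetch_posts(start_url, on_url) is a hypothetical adapter: it yields posts
    and calls on_url(url) every time it requests a page.
    """
    cursor = None  # last pagination URL that was requested
    posts = []
    bans = 0

    def remember(url):
        nonlocal cursor
        cursor = url
        logger.info("Requesting page: %s", url)

    while True:
        try:
            for post in fetch_posts(cursor, remember):
                posts.append(post)
            return posts
        except ban_error:
            bans += 1
            if bans > max_bans:
                raise
            logger.warning("Temporarily banned; waiting %s s, then resuming at %s", wait_seconds, cursor)
            sleep(wait_seconds)


# Possible wiring for facebook_scraper (assumption, not run here):
#   from facebook_scraper import get_posts, enable_logging, exceptions
#   enable_logging()
#   def fetch(start_url, on_url):
#       return get_posts("some_page", cookies="mycookies.json", pages=None,
#                        start_url=start_url, request_url_callback=on_url)
#   posts = scrape_with_resume(fetch, exceptions.TemporarilyBanned)

# Simulated run with fake pages: the second page is "banned" once.
class FakeBan(Exception):
    pass


FAKE_PAGES = {"p1": (["a", "b"], "p2"), "p2": (["c"], "p3"), "p3": (["d"], None)}
state = {"banned": False}


def fake_fetch(start_url, on_url):
    url = start_url or "p1"
    while url is not None:
        on_url(url)
        if url == "p2" and not state["banned"]:
            state["banned"] = True
            raise FakeBan()
        items, url = FAKE_PAGES[url]
        yield from items


waits = []
result = scrape_with_resume(fake_fetch, FakeBan, wait_seconds=1, sleep=waits.append)
```

Note that the page stored in the cursor is requested again after the wait, so if a ban hits in the middle of a page, some posts from that page may be collected twice and would need de-duplicating.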
johnoliver12 commented 2 years ago

Hi @neon-ninja, is there any way to avoid being temporarily blocked? I went through the related issues #621, #310 and #577, but couldn't find a solution. I tried using multiple cookies, expecting that once one account is blocked I could switch to another, but that didn't work, and I don't know why. How can I change my code to avoid being blocked? I need to scrape the content of the given IDs.

Any possible solutions?

neon-ninja commented 2 years ago

@johnoliver12 if you're getting temporary bans, you're making too many requests, too fast. Try making fewer requests, or making them more slowly.
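
As a rough illustration of "fewer and slower", here is a small throttle that pauses before each request and doubles the pause whenever a ban is reported. The Throttle class and its default values are hypothetical, not part of facebook_scraper, and the numbers are guesses rather than known-safe limits.

```python
import random
import time


class Throttle:
    """Space out calls and slow down further each time a ban is reported."""

    def __init__(self, min_delay=5.0, max_delay=600.0, jitter=0.5,
                 sleep=time.sleep, rng=random.random):
        self.delay = min_delay
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.jitter = jitter
        self._sleep = sleep
        self._rng = rng

    def wait(self):
        """Sleep for the current delay plus up to jitter * delay of random extra time."""
        pause = self.delay * (1 + self.jitter * self._rng())
        self._sleep(pause)
        return pause

    def report_ban(self):
        """Double the delay, capped at max_delay, after a temporary ban."""
        self.delay = min(self.delay * 2, self.max_delay)

    def report_success(self):
        """Ease back toward min_delay after a successful request."""
        self.delay = max(self.delay * 0.9, self.min_delay)
```

The intended use is to call wait() before every request, report_ban() when a temporary ban exception is caught, and report_success() otherwise.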

johnoliver12 commented 2 years ago

No other solutions, like changing the IP address or switching accounts? I am adding delays too, but it's not working.

Why are proxies used?

neon-ninja commented 2 years ago

In my experience, changing IP addresses is unlikely to help. Switching accounts is more likely to help. I wouldn't recommend using proxies.

johnoliver12 commented 2 years ago

Yes, I agree with your point, but when one account is temporarily blocked, the scraper doesn't work with the other accounts either.

After calling set_cookies(next(cookie_file)), it again raises the exception that the account is temporarily blocked.

How can I make the scraper keep working with the other accounts after one account is blocked? Please suggest any possible way to avoid being blocked.

Thanks and regards,

neon-ninja commented 2 years ago

Post your code please

johnoliver12 commented 2 years ago

Here is the code


import json
import os
import random
import time
from itertools import cycle

import requests
from facebook_scraper import get_profile, set_cookies

ini_prof = os.getcwd()

# input.txt holds one user ID per line
with open(os.path.join(ini_prof, 'input.txt')) as f:
    user_ids = [line.strip() for line in f if line.strip()]

# Cookie files for three accounts, rotated whenever a request fails
loc = os.path.join(ini_prof, 'cookies')
cookie_file = cycle([
    os.path.join(loc, 'cookies1.txt'),
    os.path.join(loc, 'cookies2.txt'),
    os.path.join(loc, 'cookies3.txt'),
])
out_dir = os.path.join(ini_prof, 'searched_less')
os.makedirs(out_dir, exist_ok=True)
max_attempts = 3  # one try per account for each user ID

set_cookies(next(cookie_file))

for user_id in user_ids:
    for attempt in range(max_attempts):
        try:
            time.sleep(random.randint(3, 8))
            print("getting profile", user_id)
            profile = get_profile(user_id, friends=True)
        except requests.exceptions.ConnectionError:
            print("Connection Error")
        except Exception as e:
            # e.g. a temporary ban: switch to the next account and retry
            print("Exception Name:", type(e).__name__)
            set_cookies(next(cookie_file))
        else:
            filen = os.path.join(out_dir, str(user_id) + '.json')
            with open(filen, 'w') as file_object:
                json.dump(profile, file_object)
            break  # stop retrying once the profile is saved

print("Done..")

I am reading the user IDs from a txt file, and the cookies are stored in my cookies directory. The scraper gets blocked after a random amount of scraping (sometimes after about 10 accounts, sometimes after 20). And when the next cookies are set, it shows the temporarily blocked exception for all accounts.
neon-ninja commented 2 years ago

Extracting friends en masse results in temporary bans really quickly. How many friends are you trying to extract? It may be better to use the get_friends function, as that returns a generator that you can consume at your desired pace. See https://github.com/kevinzg/facebook-scraper/issues/382 and https://github.com/kevinzg/facebook-scraper/issues/390
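
To illustrate consuming a generator at your own pace, here is a small wrapper that pulls one item at a time and pauses between pulls. The paced helper is hypothetical and works on any iterable. Passing it the result of get_friends is an assumption shown only in the comment, and since the library may fetch friends a page at a time, the pauses would only roughly correspond to requests.

```python
import itertools
import random
import time


def paced(items, min_delay=3.0, max_delay=8.0, limit=None,
          sleep=time.sleep, rng=random.uniform):
    """Yield items lazily, pausing a random time before pulling the next one.

    limit caps how many items are pulled in total (None means no cap).
    """
    for item in itertools.islice(items, limit):
        yield item
        # Runs when the consumer asks for the next item, before it is pulled
        sleep(rng(min_delay, max_delay))


# Possible use with facebook_scraper (assumption, not run here):
#   from facebook_scraper import get_friends
#   for friend in paced(get_friends(user_id), limit=100):
#       print(friend)
```

Because nothing is pulled from the source until the consumer asks for it, stopping early (or setting limit) also caps how many requests are made.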

johnoliver12 commented 2 years ago

Thank you so much @neon-ninja for your kind response. I understand the point about get_friends, but two things still confuse me:

  1. How does Facebook block us: on the basis of IP address, or on the basis of the requests sent from a cookie profile?
  2. a. If blocking is based on too many requests, why does a block occur on a new account that has not been used before?
    b. If it is based on IP address, why shouldn't we use proxies, as you suggested?

Again, thank you so much for your precious time.

neon-ninja commented 2 years ago
  1. Sort of both, but mostly the latter
  2. See https://github.com/kevinzg/facebook-scraper/issues/409#issuecomment-907639417