kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

Use with Proxies? #538

Status: Open. brandon-scott-pritchard opened this issue 2 years ago.

brandon-scott-pritchard commented 2 years ago

This program works great, but I'm trying to cycle through proxies when sending the requests, and I can't get the program to play nice with the proxy cycle.

Is there a built-in way to cycle through proxies?

neon-ninja commented 2 years ago

No, but it shouldn't be too hard to do. Catch the TemporarilyBanned exception and call set_proxy in response.
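A minimal sketch of that suggestion, assuming you supply the scraping call, the proxy setter and the exception types yourself. The callables are injected so the helper runs without facebook-scraper installed. `fetch_with_proxy_rotation` and its parameters are hypothetical names, not part of the library, and the proxy strings are whatever format your `set_proxy` expects.

```python
import itertools
from typing import Callable, Iterable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_proxy_rotation(
    fetch: Callable[[], T],
    set_proxy: Callable[[str], None],
    proxies: Iterable[str],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
) -> T:
    """Run fetch(); whenever it raises one of `retry_on`, switch to the next proxy and retry.

    Proxies are used in round-robin order, starting with the first one.
    """
    pool = list(proxies)
    if not pool:
        raise ValueError("need at least one proxy")
    rotation = itertools.cycle(pool)
    last_error = None
    for _ in range(max_attempts):
        set_proxy(next(rotation))  # point the scraper at the next proxy
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc  # e.g. a temporary ban: remember it, rotate, try again
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

With facebook-scraper, `fetch` would be something like `lambda: list(get_posts("bpattorney", pages=3))`. The `list()` matters because `get_posts` is a generator, so errors only surface while iterating. `set_proxy` would be the library's own `set_proxy`, and `retry_on=(TemporarilyBanned,)` would come from `facebook_scraper.exceptions`. This variant also sets a proxy before the first attempt; drop that call if the first request should go out directly.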

brandon-scott-pritchard commented 2 years ago

I'm not actually getting a TemporarilyBanned exception; I'm getting the following:

facebook_scraper.exceptions.UnexpectedResponse: Your request couldn't be processed

I can't figure out what's causing it, so I thought I'd try a proxy to see if that helps.

neon-ninja commented 2 years ago

What code are you using that generates that exception?

brandon-scott-pritchard commented 2 years ago

```python
import csv
from pprint import pprint

from facebook_scraper import get_page_info, get_posts
# from proxies import getProxy  # poster's own proxy helper (not shown); unused below

businesses = [
    'bpattorney', 'UtahDivorceAttorneyDavidPedrazas',
    'Bobby-Dale-Barina-Attorney-at-Law-271503059621431', 'FamilyLawRights',
    'diyfamilylaw', 'salcidolaw', 'GilstrapLawoffice', 'OneilWysocki',
    'LatinaLawyer', 'putmanlaw', 'JerseyCityDivorceLawyer',
    'leannetownsendlife', 'caflawgroup',
]

# Output columns: (key in each row dict, header written to the file)
COLUMNS = [
    ('username', 'Username'), ('postText', 'Post Text'), ('likes', 'Likes'),
    ('reactions', 'Other Reactions'), ('comments', 'Comments'), ('shares', 'Shares'),
    ('image', 'Image'), ('numberImages', '# of Images'), ('video', 'Video'),
    ('liveVideo', 'Live Video?'), ('postURL', 'Post URL'),
    ('pageFollowers', 'Page Followers'), ('pageName', 'Page Name'),
]

allPosts = []

for business in businesses:
    # Request the page info once and reuse it (it was previously fetched twice per page)
    businessInfo = get_page_info(business)
    pprint(businessInfo)
    try:
        for post in get_posts(business, pages=3):
            allPosts.append({
                'username': post['username'],
                'postText': post['post_text'],
                'likes': post['likes'],
                'reactions': post['reactions'],
                'comments': post['comments'],
                'shares': post['shares'],
                'image': post['image'],
                'numberImages': len(post['images']),
                'video': post['video'],
                'liveVideo': post['is_live'],
                'postURL': post['post_url'],
                'pageFollowers': businessInfo['followers'],
                'pageName': businessInfo['name'],
            })
        print(business + ' page is complete')
    except Exception as exc:  # report the actual error instead of hiding it
        print(business + ' page scrape failed: ' + repr(exc))
        row = {key: 'scrape failed' for key, _ in COLUMNS}
        row['pageName'] = business
        allPosts.append(row)

# Tab-separated output. csv.writer converts non-string values (ints, bools, dicts)
# with str() and writes None as an empty field, so no manual str() calls are needed.
with open('postBreakdown.csv', mode='w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    writer.writerow([header for _, header in COLUMNS])
    for row in allPosts:
        writer.writerow([row[key] for key, _ in COLUMNS])
print('File write complete')
```

neon-ninja commented 2 years ago

Try updating to the latest master.

brandon-scott-pritchard commented 2 years ago

I updated yesterday; no change. Updated just now, also no change.

neon-ninja commented 2 years ago

I've just pushed a new version, v0.2.49, to PyPI. Try updating to it.
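When an update seems to change nothing, it is worth confirming that the interpreter running the script actually sees the new release, since having several Python environments installed is a common pitfall. A small standard-library sketch follows; it assumes the distribution is named `facebook-scraper` (as in `pip install facebook-scraper`) and that versions are plain dotted numbers. Both helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "facebook-scraper") -> Optional[str]:
    """Return the version of `dist` visible to this interpreter, or None if it isn't installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def at_least(installed: str, required: str) -> bool:
    """Compare plain dotted release numbers such as '0.2.49' (no pre-release suffixes).

    Comparing integer tuples matters: compared as strings, '0.2.100' < '0.2.49'.
    """
    def as_tuple(v: str) -> tuple:
        return tuple(int(part) for part in v.split("."))

    return as_tuple(installed) >= as_tuple(required)
```

For example, `v = installed_version()` followed by `print(v, v is not None and at_least(v, "0.2.49"))`, run with the same interpreter as the scraper.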

brandon-scott-pritchard commented 2 years ago

No fix. However, I think the problem is that FB only allows so many page views before requiring a login, so swapping IPs after each business page (or after a certain number of page loads on a single business page) should take care of it.
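The proactive rotation described here (a new IP after every page, or after every N page loads) can be sketched as a generator that hands out proxies in round-robin order and keeps each one for a fixed number of uses. `rotating_proxies` and its `every` parameter are hypothetical names chosen for illustration.

```python
import itertools
from typing import Iterator, Sequence


def rotating_proxies(proxies: Sequence[str], every: int) -> Iterator[str]:
    """Yield the proxy for each successive unit of work, moving on after `every` uses.

    `every=1` switches proxy on every call; larger values keep one proxy
    for several page loads before rotating. Arguments are validated on the
    first next() call, because this is a generator.
    """
    if not proxies:
        raise ValueError("need at least one proxy")
    if every < 1:
        raise ValueError("every must be >= 1")
    for proxy in itertools.cycle(proxies):
        for _ in range(every):
            yield proxy
```

To switch IPs once per business page, one option is to create `rot = rotating_proxies(my_proxies, every=1)` before the loop (`my_proxies` being your own hypothetical list) and call `set_proxy(next(rot))` at the top of each `for business in businesses:` iteration.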

neon-ninja commented 2 years ago

You might think that, but with public proxies you're typically sharing an IP with tens of thousands of other people who use it for all sorts of nefarious purposes, which looks very suspicious to Facebook. It might be better to just pass cookies.

brandon-scott-pritchard commented 2 years ago

OK, I'll do that. Do I literally just pass in `cookies`?

neon-ninja commented 2 years ago

Yes
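As far as the facebook-scraper README describes it (worth double-checking against your installed version), the `cookies` argument to `get_posts` takes either the path to a cookies file exported from a browser that is logged in to Facebook or a dict of cookie values. The sketch below uses only the standard library to read a Netscape-format `cookies.txt` export and keep the facebook.com cookies, which makes it easy to inspect what will be sent. The helper name `load_facebook_cookies` and the file name `cookies.txt` are hypothetical.

```python
from http.cookiejar import MozillaCookieJar
from typing import Dict


def load_facebook_cookies(path: str) -> Dict[str, str]:
    """Read a Netscape-format cookies.txt export and keep only facebook.com cookies."""
    jar = MozillaCookieJar()
    # ignore_discard=True keeps session cookies (empty expiry column);
    # cookies that have already expired are still dropped.
    jar.load(path, ignore_discard=True)
    cookies = {}
    for c in jar:
        domain = c.domain.lstrip(".")
        if domain == "facebook.com" or domain.endswith(".facebook.com"):
            cookies[c.name] = c.value
    return cookies
```

Under that assumption, the scraping loop would call `get_posts(business, pages=3, cookies=load_facebook_cookies("cookies.txt"))`, or simply pass the file path as `cookies="cookies.txt"`.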