Open brandon-scott-pritchard opened 2 years ago
No, but it shouldn't be too hard to do. Catch the TemporarilyBanned exception and call set_proxy
in response.
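A minimal sketch of that suggestion, assuming `set_proxy` and `TemporarilyBanned` behave as described in the reply above. The helper name `fetch_with_proxy_rotation`, its parameters, and the proxy URLs are hypothetical and not part of facebook_scraper. The scraper functions are passed in as arguments so the retry logic can be read (and tried out) on its own:

```python
from itertools import cycle


def fetch_with_proxy_rotation(fetch, set_proxy, banned_exc, proxies, max_attempts=3):
    """Call fetch(); whenever banned_exc is raised, switch to the next proxy and retry.

    fetch      -- zero-argument callable returning an iterable of posts
    set_proxy  -- callable that takes a proxy URL and applies it to the scraper
    banned_exc -- exception class (or tuple of classes) that signals a ban
    proxies    -- non-empty list of proxy URLs, reused round-robin
    """
    proxy_pool = cycle(proxies)
    for _ in range(max_attempts):
        try:
            # list() forces a lazy generator to run inside the try block,
            # so errors raised mid-iteration are caught here as well.
            return list(fetch())
        except banned_exc:
            set_proxy(next(proxy_pool))
    raise RuntimeError(f"still failing after {max_attempts} attempts")


# Assumed wiring for facebook_scraper (untested sketch, placeholder proxy URLs):
# from facebook_scraper import get_posts, set_proxy
# from facebook_scraper.exceptions import TemporarilyBanned
# posts = fetch_with_proxy_rotation(
#     lambda: get_posts("bpattorney", pages=3),
#     set_proxy,
#     TemporarilyBanned,
#     ["http://proxy-a.example:8080", "http://proxy-b.example:8080"],
# )
```

Each retry starts the page over, so posts collected before the failure are discarded. `banned_exc` can also be a tuple if more than one exception type should trigger a proxy switch.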
I'm not actually getting a TemporarilyBanned exception; I'm getting the following:
facebook_scraper.exceptions.UnexpectedResponse: Your request couldn't be processed
I can't figure out what it is, so I figured I'd try a proxy to see if that could possibly help it.
What code are you using that generates that exception?
import os
from facebook_scraper import *
from pprint import pprint
from proxies import getProxy

businesses = ['bpattorney', 'UtahDivorceAttorneyDavidPedrazas', 'Bobby-Dale-Barina-Attorney-at-Law-271503059621431', 'FamilyLawRights', 'diyfamilylaw', 'salcidolaw', 'GilstrapLawoffice', 'OneilWysocki', 'LatinaLawyer', 'putmanlaw', 'JerseyCityDivorceLawyer', 'leannetownsendlife', 'caflawgroup']

allPosts = {}

for business in businesses:
    pprint(get_page_info(business))
    businessInfo = get_page_info(business)
    try:
        for post in get_posts(business, pages=3):
            postInfo = post
            postInfoSorted = {'username':postInfo['username'],'postText':postInfo['post_text'],'likes':str(postInfo['likes']),'reactions':str(postInfo['reactions']),'comments':str(postInfo['comments']),'shares':str(postInfo['shares']),'image':str(postInfo['image']),'numberImages':str(len(postInfo['images'])),'video':postInfo['video'],'liveVideo':postInfo['is_live'],'postURL':postInfo['post_url'],'pageFollowers':str(businessInfo['followers']),'pageName':businessInfo['name']}
            allPosts[len(allPosts)] = postInfoSorted
        print(business+" page is complete")
    except:
        print(business+' page scrape failed.')
        postInfoSorted = {'username':'scrape failed','postText':'scrape failed','likes':'scrape failed','reactions':'scrape failed','comments':'scrape failed','shares':'scrape failed','image':'scrape failed','numberImages':'scrape failed','video':'scrape failed','liveVideo':'scrape failed','postURL':'scrape failed','pageFollowers':'scrape failed','pageName':business}
        allPosts[len(allPosts)] = postInfoSorted
        continue

with open('postBreakdown.csv', mode='w') as outfile:
    outfile.write('Username'+'\t'+'Post Text'+'\t'+'Likes'+'\t'+'Other Reactions'+'\t'+'Comments'+'\t'+'Shares'+'\t'+'Image'+'\t'+'# of Images'+'\t'+'Video'+'\t'+'Live Video?'+'\t'+'Post URL'+'\t'+'Page Followers'+'\t'+'Page Name'+'\n')
    for item in allPosts:
        outfile.write(allPosts[item]['username']+'\t'+allPosts[item]['postText']+'\t'+allPosts[item]['likes']+'\t'+allPosts[item]['reactions']+'\t'+allPosts[item]['comments']+'\t'+allPosts[item]['shares']+'\t'+allPosts[item]['image']+'\t'+allPosts[item]['numberImages']+'\t'+allPosts[item]['video']+'\t'+allPosts[item]['liveVideo']+'\t'+allPosts[item]['postURL']+'\t'+allPosts[item]['pageFollowers']+'\t'+allPosts[item]['pageName']+'\n')
print('File write complete')
Try updating to the latest master.
I updated yesterday; no change. Updated just now, also no change.
I've just pushed a new version to PyPI, v0.2.49. Try updating to that.
No fix. However, I think the problem is that FB only allows so many page views before requiring a login, so swapping IPs after each business page, or after a certain number of page loads on a specific business page, should take care of it.
You might think that, but typically when you use public proxies, you're using an IP shared by tens of thousands of other people, for all sorts of nefarious purposes, which look really suspicious to Facebook. It might be better to just pass cookies.
Ok, I'll do that. I literally just pass in "cookies"?
Yes
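For illustration, a rough sketch of passing cookies. It assumes `get_posts` accepts a `cookies` keyword argument that can be a path to a cookies file exported from a logged-in browser session; check the facebook_scraper README for the formats your version supports. The wrapper `scrape_with_cookies` and the file name `cookies.txt` are hypothetical:

```python
from pathlib import Path


def scrape_with_cookies(get_posts, page, cookies_path, pages=3):
    """Fetch posts for one page, passing a cookies file to the scraper.

    get_posts    -- the scraper function (e.g. facebook_scraper.get_posts)
    cookies_path -- path to an exported cookies file (assumed to be supported)
    """
    path = Path(cookies_path)
    if not path.is_file():
        # Fail early with a clear message instead of a confusing scrape error.
        raise FileNotFoundError(f"cookies file not found: {path}")
    # list() consumes the generator so any scrape error surfaces here.
    return list(get_posts(page, pages=pages, cookies=str(path)))


# Assumed usage (untested sketch):
# from facebook_scraper import get_posts
# posts = scrape_with_cookies(get_posts, "bpattorney", "cookies.txt")
```

The cookies file stands in for a login, so treat it like a password and keep it out of version control.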
This program works great, but I'm trying to cycle through proxies when sending the requests, and I can't get the program to play nice with the proxy cycle.
Is there a way built in to cycle through proxies?