Closed: goktugerce closed this issue 7 years ago
Same problem here. I discovered that when I make more than one request to the protected site, the response doesn't contain the Incapsula token, just a blank page like:
In [5]: response.content
Out[5]: '<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe src="/_Incapsula_Resource?CWUDNSAI=9&xinfo=8-334279410-0 0NNN RT(1477053474013 6) q(0 -1 -1 -1) r(0 -1) B12(4,315,0) U19&incident_id=133001920800788328-1938568076429954056&edet=12&cinfo=04000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 133001920800788328-1938568076429954056</iframe></body></html>'
But if I use a proxy, it sends me another token and incapsula_cracker can unlock the site.
I suggest raising an exception for this case.
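A minimal sketch of what such a check could look like, assuming the block page always carries the "Incapsula incident ID" text seen in the output above. The names IncapsulaBlocked and raise_if_blocked are made up for illustration and are not part of incapsula-cracker:

```python
import re

# Matches the incident ID printed on the block page shown above.
INCIDENT_RE = re.compile(r"Incapsula incident ID: ([\d-]+)")


class IncapsulaBlocked(Exception):
    """Hypothetical exception for a response that is the block page."""


def raise_if_blocked(html):
    """Return html unchanged, or raise if it looks like the block page."""
    match = INCIDENT_RE.search(html)
    if match:
        raise IncapsulaBlocked("blocked, incident ID " + match.group(1))
    return html
```

With requests on Python 3 you would pass response.text rather than response.content, since the pattern here is a str pattern.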
Same problem. In my particular case I don't have any problems in my local tests, but on my production site I get this error. Any suggestions?
This version is now outdated. Please see here for a version which works with py2.7 and py3. I have tested the new version with whoscored.com and it works.
Thank you
Do you have an Incapsula cracker for Java? Thank you.
Sorry mate. It's been years since I've messed with Java. If you know of a good HTML parser and http/s requests library which can store session data, I would be glad to muck about making one.
I'm using jsoup for web scraping in Java, but I couldn't get past Incapsula. I don't know if it is possible to make the cracker with jsoup.
It doesn't look like it's possible simply with jsoup. I'll see what I can do (probably not soon though) using a combination of jsoup and Apache HttpClient. Incapsula just uses a simple cookie to "verify" that you're using a browser, so if you can set that cookie before every request, it will get you through most of the checks.
Be aware, though, that even if you get past the simple check, if you're scraping too fast they will simply serve a reCAPTCHA, and there's no easy way to get around that as far as I know.
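One way to avoid scraping too fast is to pause between requests. This is only a sketch: the 2 to 3 second delay is a guess, not a known Incapsula threshold, and fetch stands in for whatever function you already use to download a page.

```python
import random
import time


def polite_delay(base=2.0, jitter=1.0):
    """Return a randomized delay in seconds (the values are assumptions)."""
    return base + random.uniform(0, jitter)


def fetch_all(urls, fetch, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause is needed before the very first request.
            sleep(polite_delay())
        results.append(fetch(url))
    return results
```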
A few tips though:
Assumptions:
incap_sess_id. I can't quite remember the full name.
___utmvc and send that out with your requests. I have not actually tried this, but if I recall correctly there is no request-unique data, so you should be able to reuse the same cookie value. Again, the above tips are based on assumptions that I've made and on my experiences in the past, and may not hold true for current or future versions of Incapsula.
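A rough sketch of reusing a saved cookie value, using only the standard library. It rests on the same untested assumption as the tip, namely that the ___utmvc value is not tied to a single request. The function name, the example URL, and the User-Agent string are placeholders:

```python
from urllib.request import Request

COOKIE_NAME = "___utmvc"  # cookie name mentioned in the tips


def build_request(url, cookie_value, user_agent="Mozilla/5.0"):
    """Build a Request that carries a previously saved cookie value."""
    headers = {
        # Assumption: the saved value can be sent back verbatim.
        "Cookie": "{}={}".format(COOKIE_NAME, cookie_value),
        "User-Agent": user_agent,
    }
    return Request(url, headers=headers)
```

Pass the result to urllib.request.urlopen to send it. With requests, the equivalent would be calling session.cookies.set(COOKIE_NAME, cookie_value) once on a Session.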
Hope these tips help in the meantime!
Thank you for these tips, man, appreciated. I think using persistent cookies will work if we are somehow able to use them consistently, because websites like whoscored.com are fooled by it sometimes, once I participated in answering the question, and sometimes they aren't. I'm not really sure why; Incapsula seems not so stable after all. So if there is a way to crack it by using persistent cookies, that'll work I guess.
I am trying to scrape this link as an example, but I'm getting errors. I installed the hotfix branch and tried the requests solution written in the wiki, but I'm getting these:
In the "from incapsula import crack" one:
If I use IncapSession, I get this:
  File "incapsula/requests_.py", line 65, in crack
    r = sess.get('{scheme}://{host}/_IncapsulaResource?{url_params}'.format(scheme=scheme, host=host, url_params=url_params), headers={'Referer': response.url})
  File "incapsula/requests_.py", line 113, in get
    return crack(self, r)
  ...
RuntimeError: maximum recursion depth exceeded in cmp
If I install the module via
pip install incapsula-cracker
and try the first solution, I get this as response.text, which is not what I should get.