kami4ka closed this issue 2 years ago
I think this would make a lot of sense. Using residential proxies costs 250 credits per call, while datacenter proxies cost only 10 with ScrapingAnt. That gives you just above 1 call per day in the free plan and even with the 100k credit plan you end up with less than 14 calls per day. That makes using residential proxies unfeasible IMHO. I ended up downgrading to version 5.5.0 and adding some retries, which works fine.
@denisalevi could you, please, create a PR to this repo?
Hi @kami4ka, I'm not a JS developer and it's just some hacky lines added to an older version of the repo. I'll try to clean it up or make it available somehow soon!
But from a quick look at the current repo version, it looks like it is all there in requestDriver.js. I didn't fully follow the logic from that quick look, though. Could it be that something is missing there? An option to set MAX_RETRIES_SCRAPING_ANT and skip residential proxies entirely should be possible?
@orangecoding Any thoughts? :)
I'm going to give it a shot in a couple of days, currently I have some private responsibilities I have to deal with. Once this is sorted, I'm coming back to this :)
@kami4ka If I understand you correctly, you suggest making the use of residential proxies an option and letting the user decide whether they want a faster and more stable solution or a cheaper one?
I do like this approach tbh
@orangecoding Yup. Exactly. So retry mechanism would remain the same, only the proxy type is changeable
I would maybe add an option to set the number of retries when using residential proxies? I think with 3 retries (the current setting?) it fails quite often. I use 8 retries and I think it always succeeds with those. I can check the logs again next week; it's running on a Pi that I don't have access to right now.
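A configurable retry count could look roughly like this. This is only a sketch: `scrapeWithRetries`, `fetchPage`, and `maxRetries` are illustrative names, not the actual requestDriver.js API.

```javascript
// Sketch of a retry loop with a configurable retry count.
// All names here are illustrative, not Fredy's actual code.
async function scrapeWithRetries(fetchPage, url, maxRetries = 3) {
  let lastStatus = null;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const { status, body } = await fetchPage(url);
    // ScrapingAnt only bills 200 responses, so retrying on
    // non-200 responses costs no extra credits.
    if (status === 200) return body;
    lastStatus = status;
  }
  throw new Error(
    `Scraping failed after ${maxRetries} retries (last status: ${lastStatus})`
  );
}
```

Since non-200 responses are free, a fairly high default (e.g. 8-10) would be cheap to allow.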
Yeah, I guess it could also be a proxy-type-independent option.
@kami4ka what was the comment about retries again? I remember you once told me that if the return value is != 200, i.e. no success, the customer is not charged. Is this still true? If so, I don't know if it makes sense to make the number of retries configurable rather than just setting it to, let's say, 10. Of course only if this doesn't cost 250 credits per retry ;)
I have just added an option to configure the proxies (still to be finished) and once again noticed... I am no designer at all.. 👯
Yup. That's true. Every non-200 response from ScrapingAnt is not billable.
10.000 free API credits :-)
Ahh right. Thanks.
@denisalevi @kami4ka I have added all necessary changes, would you mind taking a look and do a quick review? https://github.com/orangecoding/fredy/pull/59/files
It's pretty straight forward:
Great, thanks a lot! I'll try it out tonight :)
Is your feature request related to a problem? Please describe. Datacenter scraping for Immobilienscout24 is also successful but may require more retries and be a bit slower, while residential is faster and more expensive.
Describe the solution you'd like Allow 2 strategies for Immobilienscout24 scraping:
1) Datacenter-only: retry N times with datacenter proxies (note: also retry when the status code is 404, a known behavior for this specific proxy pool)
2) Residential-included: try with datacenter first (better with retries), then switch to residential
So it would be possible to decide whether to use residential proxies or not, but the retry would always apply.
Additional context The datacenter-only approach would always return a successful result eventually, but it might take some time, while the residential-included approach would be faster and more expensive. This is a result of ScrapingAnt's custom proxy pool feature applied to Fredy.
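The two strategies above could be sketched roughly like this. All names (`scrapeImmoscout`, `fetchPage`, `proxyType`, the option keys) are illustrative assumptions, not the actual Fredy or ScrapingAnt API:

```javascript
// Sketch of the two proposed strategies. fetchPage is an
// illustrative stand-in for the actual ScrapingAnt request.
async function scrapeImmoscout(fetchPage, url, { useResidential = false, retries = 3 } = {}) {
  // Strategy 1: try the cheap datacenter pool first, retrying on any
  // non-200 (including 404, a known quirk of this specific proxy pool).
  for (let i = 0; i < retries; i++) {
    const res = await fetchPage(url, { proxyType: 'datacenter' });
    if (res.status === 200) return res.body;
  }
  // Strategy 2 (opt-in): fall back to the more expensive residential pool.
  if (useResidential) {
    const res = await fetchPage(url, { proxyType: 'residential' });
    if (res.status === 200) return res.body;
  }
  throw new Error('Immobilienscout24 scraping failed');
}
```

With `useResidential: false` this is the datacenter-only strategy; with `true`, residential proxies are only hit after the cheap retries are exhausted, matching the cost trade-off described above.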