STRATZ-Ken opened this issue 3 years ago
Maybe you can check the request headers sent to Squid, both with Titanium and without it, to see if there are any headers that would indicate the original IP.
I did not see any headers being added when checking manually, and I didn't see any headers when testing with a client, or even in the code.
So I dug a bit more. It seems like web hosts will limit the client once they see Selenium Chrome launched with --proxy-server=127.0.0.1:8000. It really has little to do with Titanium.
So I guess I'll revamp my question: is it possible to route traffic to Titanium without using this Chrome flag?
So if it's a request header added by Selenium Chrome, you can remove that header inside the request handler on the Titanium proxy.
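Something along these lines inside the BeforeRequest handler should do it (just a sketch; "X-Suspect-Header" is a placeholder for whatever header you actually find, and the exact property path can differ between Titanium.Web.Proxy versions):

```csharp
using System.Threading.Tasks;
using Titanium.Web.Proxy.EventArguments;

public static class HeaderStripper
{
    // Attach once during proxy setup:
    // proxyServer.BeforeRequest += HeaderStripper.OnBeforeRequest;
    public static Task OnBeforeRequest(object sender, SessionEventArgs e)
    {
        // Strip the suspect header before the request is forwarded upstream.
        // "X-Suspect-Header" is only a placeholder name.
        e.HttpClient.Request.Headers.RemoveHeader("X-Suspect-Header");

        return Task.CompletedTask;
    }
}
```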
I don't see it being a header. Systems like Cloudflare are able to detect specific settings by running scripts in your browser; that is why they say "Wait 5 seconds" while they run their detection. My guess is some hosts do this as well and rate limit a specific proxy setting after so many requests.
Is there any way to capture the HTTP traffic in my app and force it through Titanium?
I am a little unclear on what you meant by "they see Selenium Chrome with --proxy-server=127.0.0.1:8000". How do they see that? I assumed that information was added as a header to all requests by the Selenium driver.
Instead of telling Selenium Chrome to use the Titanium proxy, maybe you can set Titanium as the system proxy. Then all requests from Chrome will go through the proxy automatically, without the --proxy-server flag.
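Titanium can register itself as the Windows system proxy when it starts; roughly like this (a sketch assuming a recent Titanium.Web.Proxy version, listening on port 8000 to match your setup):

```csharp
using System;
using System.Net;
using Titanium.Web.Proxy;
using Titanium.Web.Proxy.Models;

class Program
{
    static void Main()
    {
        var proxyServer = new ProxyServer();

        // Listen on 127.0.0.1:8000 with SSL decryption enabled.
        var endPoint = new ExplicitProxyEndPoint(IPAddress.Loopback, 8000, true);
        proxyServer.AddEndPoint(endPoint);
        proxyServer.Start();

        // Register as the Windows system proxy so Chrome picks it up
        // without any --proxy-server argument.
        proxyServer.SetAsSystemHttpProxy(endPoint);
        proxyServer.SetAsSystemHttpsProxy(endPoint);

        Console.ReadLine();

        proxyServer.Stop();
    }
}
```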
Hello all, first off great product. Well documented, well laid out. Just works.
I am trying to figure out an ongoing issue which I cannot track down. This may not be a Titanium Web Proxy issue at all, but maybe I can be pointed in the right direction, as I am running out of ideas.
My project is based on web scraping with Selenium C#. To avoid detection, I have a lot of IPv4 addresses which I rotate at the firewall level. (If you're interested in NAT outbound rules, check out pfSense.) Basically, every outgoing connection socket gets put on a random IP address.
I wrote a basic script in C# that calls https://api.ipify.org?format=json to verify this. I can see that my IP is changing on every request. Cool!
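The check is nothing fancy, roughly something like this (a sketch, not my exact code):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class IpCheck
{
    static async Task Main()
    {
        for (var i = 0; i < 5; i++)
        {
            // A fresh client per request forces a new outbound socket,
            // so each call should report a different rotated IP.
            using var client = new HttpClient();
            var json = await client.GetStringAsync("https://api.ipify.org?format=json");
            Console.WriteLine(json); // e.g. {"ip":"203.0.113.10"}
        }
    }
}
```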
Now, when I start scraping, obviously there are basic rate limits imposed based on IP. My program starts Chrome for scraping with the proxy set to 127.0.0.1:8000, which points directly at Titanium Web Proxy. I then tested my IP again with ipify, and it worked. However, after 4 hours I got rate limited by the site.
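The Chrome side is started roughly like this (again a sketch of the setup, not my exact code):

```csharp
using OpenQA.Selenium.Chrome;

class ScraperLauncher
{
    static void Main()
    {
        var options = new ChromeOptions();

        // Route all of Chrome's traffic through Titanium Web Proxy.
        options.AddArgument("--proxy-server=http://127.0.0.1:8000");

        using var driver = new ChromeDriver(options);
        driver.Navigate().GoToUrl("https://api.ipify.org?format=json");
    }
}
```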
So I attempted to open up Chrome manually and go to the site, and it worked. Turned the program back on, still rate limited. Interesting.
My next attempt was to set the proxy directly at the Windows 10 level and apply it machine-wide. Visited the website, worked. Went back to my application with Titanium Web Proxy, no go.
It is almost like something is happening with Titanium Web Proxy that the other side can "see" and is blocking me on. If I use the same app but proxy through my OS, I have no issue. The second I bring in Titanium Web Proxy and funnel traffic through that, I am getting rate limited.
I then verified this theory last night: I kept Titanium off and ran my application for 14 hours with no issues.
I think it's also important that I explain how my network is designed:
Program > Titanium Web Proxy > Squid Proxy > Firewall (Random IP) > Internet ---- This does not work
Program > Squid Proxy > Firewall (Random IP) > Internet ---- This does work
I had to set it up this way because I use Titanium's upstream HTTP proxy feature to rotate proxies randomly. Some of my application's users use 3rd-party proxies where they get 100 logins for 100 IP addresses, and your library is very easily configurable for this.
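For reference, the upstream side is wired up roughly like this (a simplified sketch; the host, port, and credentials are placeholders, and the random selection logic is omitted):

```csharp
using Titanium.Web.Proxy;
using Titanium.Web.Proxy.Models;

class UpstreamSetup
{
    static void Configure(ProxyServer proxyServer)
    {
        // Placeholder host/port/credentials; in the real app a proxy is
        // picked from the user's list of 3rd-party logins.
        var upstream = new ExternalProxy
        {
            HostName = "127.0.0.1",
            Port = 3128,
            UserName = "user1",
            Password = "pass1"
        };

        proxyServer.UpStreamHttpProxy = upstream;
        proxyServer.UpStreamHttpsProxy = upstream;
    }
}
```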
Any thoughts on what the issue might be with Titanium in this scenario?