pgaref / HTTP_Request_Randomizer

Proxying Python Requests
http://pgaref.com/blog/python-proxy/
MIT License

How to use the same proxy for multiple URL requests? #57

Closed windowshopr closed 4 years ago

windowshopr commented 4 years ago

Not really an issue, but I would love some input, as I can't seem to figure out how to make this work.

My code (pseudo) looks something like this:

    from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

    req_proxy = RequestProxy()

    url_list = ["http://www.example1.com", "http://www.example2.com",
                "http://www.example3.com", "http://www.example4.com"]

    for url in url_list:
        while True:
            request = req_proxy.generate_proxied_request(url)
            if request is not None:
                # (THE REST OF MY CODE IS HERE ONCE WE GET A GOOD RESPONDING PROXY)
                break  # Break out of the while loop once we get a good response
            continue  # No response from this proxy; loop around and try another

I'm wondering if there's a way to use the same proxy for, say, 2 items in my url_list before requesting another one. In my main application I have a long list of URLs and would like to reuse a good responding proxy across multiple URLs before fetching a new proxy. How could I go about structuring this? Thanks a lot! (Or if there's documentation that I missed, point me in the right direction. Thanks!)

pgaref commented 4 years ago

Hello @windowshopr

There is actually a sustain flag in the RequestProxy constructor that reuses the latest proxy as long as it does not time out. Just use RequestProxy(sustain=True) to test it.
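For reference, a minimal sketch of how that slots into your loop (the import path follows the project README; the URLs are placeholders):

    from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

    # sustain=True keeps reusing the current working proxy until it times out,
    # instead of picking a fresh proxy for every single request
    req_proxy = RequestProxy(sustain=True)

    for url in ["http://www.example1.com", "http://www.example2.com"]:
        request = req_proxy.generate_proxied_request(url)
        if request is not None:
            print(request.status_code)  # same proxy reused across these URLs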

Cheers

windowshopr commented 4 years ago

Right on! I will work that in to my code. Thanks a lot!

windowshopr commented 4 years ago

@pgaref Thanks for the help; adding sustain=True worked. However, I'm running into an issue where the script works for about 5 or 6 proxies and then gets stuck:

2020-01-07 20:24:25,155 root DEBUG Using proxy: 45.76.43.163:8080 | PremProxy

So it's trying to use the above proxy, but it doesn't seem to move on when it can't get a response from it. When I hit CTRL + C to stop the script, I get a traceback that looks like:

Traceback (most recent call last):
  File "C:\Users\...\Python36\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\...\Python36\lib\site-packages\urllib3\contrib\pyopenssl.py", line 280, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "C:\Users\...\Python36\lib\site-packages\OpenSSL\SSL.py", line 1814, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "C:\Users\...\Python36\lib\site-packages\OpenSSL\SSL.py", line 1614, in _raise_ssl_error
    raise WantReadError()
OpenSSL.SSL.WantReadError

Any ideas on that? It only seemed to do this after I added the sustain flag. Thanks!

pgaref commented 4 years ago

Hey @windowshopr

The buffering response issue seems to be just a misleading traceback introduced by Python 3 and not the real problem (check link) -- it looks more like an SSL issue with that particular proxy. I would expect the fix to be as simple as handling the error raised by the HTTP request -- happy to help if you narrow it down.
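Something along these lines, as a rough sketch -- note the exception types caught here are an assumption based on your traceback, not ones the library is guaranteed to raise:

    import requests
    from OpenSSL import SSL
    from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

    req_proxy = RequestProxy(sustain=True)
    url_list = ["http://www.example1.com", "http://www.example2.com"]

    for url in url_list:
        while True:
            request = None
            try:
                request = req_proxy.generate_proxied_request(url)
            except (requests.exceptions.RequestException, SSL.Error):
                pass  # SSL/connection trouble with this proxy; loop and retry
            if request is not None:
                break  # good response; move on to the next URL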

Cheers

windowshopr commented 4 years ago

@pgaref

Thanks a lot. You're right, I think it might just be something weird happening on my end. It doesn't seem to do it every time, so I'll play with some exception handling, or maybe implement a quick retry/timeout loop to force it to move on for now.
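Something like this sketch is what I have in mind (reusing req_proxy and url_list from my snippet above) -- the retry cap is arbitrary, and the req_timeout keyword is an assumption about generate_proxied_request that should be checked against the installed version:

    MAX_ATTEMPTS = 5  # arbitrary retry cap, not a library default

    for url in url_list:
        for attempt in range(MAX_ATTEMPTS):
            try:
                # req_timeout (seconds) is assumed here -- verify in your version
                request = req_proxy.generate_proxied_request(url, req_timeout=10)
            except Exception:
                continue  # this proxy misbehaved; retry with another
            if request is not None:
                break  # good response; move on to the next URL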

Thanks!