redcap-tools / PyCap

REDCap in Python
http://redcap-tools.github.io/PyCap/
MIT License

Support Setup of Proxy #249

Closed ugGit closed 1 year ago

ugGit commented 1 year ago

Hi there,

First of all, thanks for maintaining this interface! It's working smoothly so far, but I'm hitting a limitation since I need to access the REDCap instance from behind a proxy. Thus, I'd like to suggest a feature for defining proxies through which the API calls should be sent.

Goal

Define a proxy through which all requests to the API should go.

Suggested Approach

Pass in a proxy parameter during the initialization of a redcap object (e.g., a Project) and store the information for all subsequent requests.

Conducted Code Research

After briefly scrolling through the code, I expect that the required changes should be minimal. This is mainly based on the fact that the Session.request() method already expects a parameter called proxies. However, I'm not quite sure what would be the best way to pass down this information.
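To back this up, a quick network-free check (nothing PyCap-specific, just inspecting the requests API):

```python
import inspect
import requests

# Confirming the observation above: requests.Session.request already
# accepts a `proxies` keyword, so a proxy option set on a Project would
# only need to be threaded down to this call.
has_proxies = "proxies" in inspect.signature(requests.Session.request).parameters
print(has_proxies)  # → True
```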

I'm open to discussing this request further, and given sufficient guidance I'd be happy to implement it myself as well :)

pwildenhain commented 1 year ago

Hey there :wave: thanks for submitting this issue!

Questions like this make me wonder if we should just add a **request_kwargs attribute to the Base object (which Project inherits from). This would allow users to pass through whatever they want in the request, in addition to what PyCap does for them by default.

They wouldn't be able to override the url, data, file, or verify parameter -- but there are other options such as allow_redirects, timeout, proxies, etc. that users might want to tinker with (as in your use case)

It would be great if you could tackle this. I see the basic steps as:

  1. Add a **request_kwargs attribute to Base
  2. Pass those through to the Base._call_api method
  3. Lastly pass through to the actual API call in the _RCRequest.execute method
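In pseudocode-ish Python, with simplified stand-ins for PyCap's Base and _RCRequest classes (the real internals differ, and the URLs/tokens below are placeholders), the flow I'm imagining looks like:

```python
import requests

class RCRequest:
    def __init__(self, url, payload):
        self.url = url
        self.payload = payload

    def execute(self, **request_kwargs):
        # Step 3: forward the extra kwargs into the actual API call
        return requests.post(self.url, data=self.payload, **request_kwargs)

class Base:
    def __init__(self, url, token, **request_kwargs):
        # Step 1: store whatever extra request kwargs the user passed
        self.url = url
        self.token = token
        self.request_kwargs = request_kwargs

    def _call_api(self, payload):
        # Step 2: hand the stored kwargs down toward the request
        return RCRequest(self.url, payload).execute(**self.request_kwargs)

project = Base(
    "https://redcap.example.org/api/",  # placeholder URL
    "a-token",
    timeout=30,
    proxies={"https": "http://proxy.example.com:8080"},
)
print(project.request_kwargs)
```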

I would test what happens if the user tries to pass something like verify through **request_kwargs, which we already map directly into the API call. If it does throw an error, then we should probably check the request_kwargs to ensure they don't contain any of the arguments that we're hard-coding (url, data, file, and verify) and raise a ValueError telling the user they can't use those.
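Roughly what I have in mind for that guard (function and set names are illustrative; note that the requests keyword for file uploads is actually spelled `files`):

```python
# Kwargs that PyCap sets itself and that users must not override
RESERVED_KWARGS = {"url", "data", "files", "verify"}

def validate_request_kwargs(request_kwargs):
    # Reject any user-supplied kwarg that collides with a hard-coded one
    overlap = RESERVED_KWARGS & set(request_kwargs)
    if overlap:
        raise ValueError(
            f"These kwargs are set by PyCap and can't be overridden: {sorted(overlap)}"
        )
    return request_kwargs
```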

How does that sound? Want to take a crack at it? We can worry about what tests to add to the test suite after the work is complete.

In the meantime, it looks like you can use environment variables to make requests from behind a proxy, so that you could do what you need to do until this change is merged.
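Concretely, the standard proxy environment variables are honored by requests (which PyCap uses under the hood), so something like this works today without any code change (proxy URL is a placeholder):

```python
import os
import requests

# Set the standard proxy environment variables before making any calls
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"

# requests resolves proxies from the environment automatically
env_proxies = requests.utils.getproxies()
print(env_proxies.get("https"))  # → http://proxy.example.com:8080
```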

ugGit commented 1 year ago

Alright. I made the changes and added validation for all kwargs that might interfere on their way down to the actual request execution call. If you think this is useful, we can continue working on this feature, but again, I'd need some instructions for adding the tests.

However, I noticed that I'd even have to use a ProxyManager, as I need to define proxy_headers as well :( On first inspection, I don't see an easy way to make the Session use such a ProxyManager. If you have a free minute, I'd love to hear your opinion on this issue.

pwildenhain commented 1 year ago

Thanks for the contribution! Code is beautiful 🤩 Let's get it working for you

Using just the requests package to make the API call to REDCap, does this method work for you?

If it works, then I think we can also add an http_adapter argument to allow folks to pass their own adapters, which we can mount in the session that PyCap creates

https://stackoverflow.com/a/58440941/11381493

Additional documentation: https://requests.readthedocs.io/en/latest/user/advanced/#transport-adapters
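To make the http_adapter idea concrete, a rough sketch (the parameter name, helper function, and URL here are just the proposal, not released PyCap API):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(url, http_adapter=None):
    # Hypothetical sketch: PyCap creates the session and mounts the
    # user-supplied adapter on the project URL prefix
    session = requests.Session()
    if http_adapter is not None:
        session.mount(url, http_adapter)
    return session

# The caller builds an adapter (here with a retry policy) and passes it in
adapter = HTTPAdapter(max_retries=Retry(total=3, backoff_factor=0.5))
session = make_session("https://redcap.example.org/api/", http_adapter=adapter)
```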

ugGit commented 1 year ago

Thanks for the hints!

I guess it works using the requests package, or rather urllib3, from which the HTTPAdapter originates anyway according to the link you posted about transport adapters. This assumption is based on the fact that I received an example for performing HTTP requests using urllib3 and ProxyManager. The example looks like this:

from urllib3._collections import HTTPHeaderDict
from urllib3 import ProxyManager, make_headers

## Proxy credentials and server URL (placeholders)
user_name = "user"
password = "secret"
proxyURL = "http://a-real-proxy-com:8080"

## Creating headers
headers = HTTPHeaderDict()
proxyHeaders = make_headers(
    proxy_basic_auth=user_name + ':' + password
    )

# There is no real method here - this was done by trial & error, as so many
# things had to be guessed due to missing documentation
headers.add('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36')  # telling the proxy it is Chrome
headers.add("Accept", "application/json")
headers.add("Accept-Encoding", "gzip, deflate, br")
headers.add("Connection", "keep-alive")
headers.add("proxy-agent", "Fortinet-Proxy/1.0")
headers.add("authorization", user_name + ':' + password)

## Putting everything together in the proxy manager
p = ProxyManager(
    proxyURL,
    headers=headers,
    proxy_headers=proxyHeaders
    )

p.request('GET', 'https://www.google.com')  # works!

So far, I've tried to set up a trivial example based on the Stack Overflow post you suggested, but I don't even manage to confirm that the function proxy_headers() is called. Here's the corresponding code:

import requests

proxyURL = "http://a-real-proxy.com:8080"
proxyHeaders = {proxyURL: {}}

class ProxyHeaderAwareHTTPAdapter(requests.adapters.HTTPAdapter):
    def proxy_headers(self, proxy):
        print('called proxy_headers()')
        if proxy in proxyHeaders:
            print(f'found the proxy: {proxy}')
            return proxyHeaders[proxy]
        else:
            return None

s = requests.Session()
s.mount('http://', ProxyHeaderAwareHTTPAdapter())
s.mount('https://', ProxyHeaderAwareHTTPAdapter())

s.get("https://google.com/", headers={
        "proxy": proxyURL
    })

However, I have only very basic experience with proxies and also lack exact documentation about the configuration of the one I have to use. So I'll try to gather some information and run further tests over the next days.
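Edit: after digging through the requests source a bit, I think the reason proxy_headers() never fires above is that it is only consulted when the request is actually routed through a proxy (via the proxies= argument or proxy environment variables); a regular header named "proxy" never triggers it. A minimal network-free check (my reading of the internals, so take it with a grain of salt):

```python
import requests
from requests.adapters import HTTPAdapter

calls = []

class TracingAdapter(HTTPAdapter):
    def proxy_headers(self, proxy):
        # Record that requests consulted us for this proxy
        calls.append(proxy)
        return super().proxy_headers(proxy)

adapter = TracingAdapter()
# Building the proxy manager for a proxy URL is what invokes the
# proxy_headers() hook - no network traffic is needed to see it fire
adapter.proxy_manager_for("http://a-real-proxy.com:8080")
print(calls)  # → ['http://a-real-proxy.com:8080']
```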

pwildenhain commented 1 year ago

Got it -- it's not immediately obvious to me what could be going on with your code either 🤷🏻 I'm also a total noob when it comes to proxies

One tangentially related thought I had is that when you get this working, we should only s.mount the url for the REDCap project, not all http(s):// domains. Seems like the safer bet.
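i.e. something like this, where only the REDCap URL prefix gets the custom adapter (URL is a placeholder):

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
custom = HTTPAdapter()
# Mount only on the project's URL prefix, not the whole https:// scheme
s.mount("https://redcap.example.org/", custom)

# REDCap traffic resolves to the custom adapter...
print(s.get_adapter("https://redcap.example.org/api/") is custom)  # → True
# ...while every other host keeps the session's default adapter
print(s.get_adapter("https://example.com/") is custom)  # → False
```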

Happy debugging! 🔎 🐛

ugGit commented 1 year ago

Well, well, well. A solid 2 months later, I have the pleasure of reaching the conclusion that your initially suggested changes, already implemented in the PR, are sufficient!

After a slow back and forth with my IT department, I received the information about the required proxy configuration. Luckily, I can set it all using the kwargs attribute that is now added to the request. Thus, no more changes are required to support my use case.

@pwildenhain, if you are fine with the changes, they could be merged!

ugGit commented 1 year ago

Thanks for your help @pwildenhain !

pwildenhain commented 1 year ago

Of course, and thank you for your contribution, I'll make sure you're credited in the next release