googleapis / google-api-python-client

🐍 The official Python client library for Google's discovery based APIs.
https://googleapis.github.io/google-api-python-client/docs/
Apache License 2.0
7.64k stars 2.39k forks source link

batch requests never use mTLS URLs #2440

Open jay0lee opened 1 month ago

jay0lee commented 1 month ago

At: https://github.com/googleapis/google-api-python-client/blob/main/googleapiclient/discovery.py#L1495-L1497

we build batch_uri based off the hard-coded rootUrl discovery doc parameter. We do this even if mTLS endpoint was previously configured on build() thus using batch methods never actually uses the mTLS endpoint.

jay0lee commented 1 month ago

Can someone help me understand why all of the mTLS cert logic in build_from_document() is within the:

if http is None:

logic right here? https://github.com/jay0lee/google-api-python-client/blob/375e4397b1d4894dc9f97c2e6b7113c8e00222ab/googleapiclient/discovery.py#L592

It means that if you pass an http object into build() / build_from_document() mTLS will never be enabled. I don't think we document that anywhere. At first glance I can't see any issues pulling all of that logic starting at:

        # Obtain client cert and create mTLS http channel if cert exists.
        client_cert_to_use = None
        use_client_cert = os.getenv(GOOGLE_API_USE_CLIENT_CERTIFICATE, "false")
...

out of the http is None statement.

jay0lee commented 1 month ago

Here's a simple script that demonstrates this issue. The script should list the latest versions of Chrome across OS platforms. It makes one non-batch call to the API to get a list of platforms and then it makes a batch call retrieving the latest release of Chrome per-platform.

I've included a sample client key and certificate that are necessary for mTLS, just put them in the same folder as the script. You need to remove the .txt suffix from each file, it was needed to make GitHub happy. simple-test.py.txt client1.pem.txt client1.key.txt

You should notice that with the current googleapiclient, the first call to retrieve platforms will use mTLS while the batch call will use the non-mTLS endpoint.

Also, if you uncomment the http=httpc, line, you'll notice that when we pass our own http object to build() mTLS isn't used at all.

vchudnov-g commented 1 month ago

Could you comment on the impact of this deficiency? Is it blocking your work. That information can help us prioritize looking into this.

jay0lee commented 1 month ago

Yes, I'm working on adding mTLS client certificates and Chrome Enterprise Premium / Context Aware Access rules to restrict Google service access to a select list of client certificates, thus the need for mTLS to function properly. My app is already using the batch endpoint.

https://cloud.google.com/policy-intelligence/docs/role-recommendations-overview

vchudnov-g commented 1 month ago

@clundin25 @westarle @parthea Thoughts on this approach?

clundin25 commented 1 month ago

There is active work for mTLS but I don't believe anyone is looking at the discovery SDK.

When using a custom HTTP object, could you directly load the mTLS credentials? It seems that the intent for the check is to avoid making any assumptions about the transport that was passed in.

jay0lee commented 1 month ago

@clundin25 that makes sense. Do you think it'd make sense to keep all the client cert selection code which leads up to and includes:

http_channel.add_certificate(key_path, cert_path, "", passphrase)

within the if http == None block but pull the root URL selection code out of there so it still applies when a custom http object is passed in?

clundin25 commented 1 month ago

That's a good question. I am going to reach out to my colleagues and see if they remember the context for this flow.

IIRC it is safe to use mTLS endpoints without actually performing mTLS in the handshake so I don't think such a change would necessarily break any code.

As a workaround, would setting the api_endpoint client option be a reasonable interim solution? https://github.com/googleapis/google-api-python-client/blob/0cb72664d833c4c8e05021a12659f14bc916fea5/googleapiclient/discovery.py#L691C54-L691C66.

jay0lee commented 1 month ago

Coming back to this after a couple weeks off. On further consideration I do think if a custom http object is being passed to build_from_document() we shouldn't be trying to set the client certificate on it. As you said, the user can do that themselves when building the custom object.

I do think it's worth us performing the mtlsRootUrl logic even when a custom http object is passed in though as that is specific to our discovery document and requiring the caller to know the mTLS URL ahead of time defeats the purpose of discovery...

To answer your question, attempting to send mTLS client certificates to a web server that does not support/request mTLS generally has no impact and the web request shouldn't fail due to that setting. Attempting to contact a mTLS web server such as www.mtls.googleapis.com WITHOUT sending mTLS client certificate will fail. Generally though anyone using the library who has setup these environmental parameters that the code depends on is clearly indicating they want mTLS to work so I think there's generally low risk of breaking anything with a change here.

I've updated my pull request here: https://github.com/googleapis/google-api-python-client/pull/2442

so that only the mTLS URL logic now gets executed if a custom http object is passed to build_from_document(), we won't do anything with client certificate files. I'm also adding an is_mtls bool to the Resource object and using that to correctly set the batch URL (the OP issue).

Can someone review this change?