snowflakedb / snowflake-connector-python

Snowflake Connector for Python
https://pypi.python.org/pypi/snowflake-connector-python/
Apache License 2.0

SNOW-974917: How can we use SF_AUTH_SOCKET_ADDR ? #1811

Open RobbertDM opened 1 year ago

RobbertDM commented 1 year ago

What is the current behavior?

The bigger context is that I am using gitpod, a cloud IDE, and I want to use externalbrowser authentication with Snowflake. This opens a SAML flow that always redirects to localhost, but I want it to redirect to my gitpod instance's URL, where the server is actually listening.

For the port, I can use the SF_AUTH_SOCKET_PORT variable, and that works brilliantly.

However, for the hostname, if I try to set something like export SF_AUTH_SOCKET_ADDR='myworkspace.gitpod.io', it fails with: [Errno 99] Cannot assign requested address

I guess socket.bind doesn't like us passing actual domain names.

https://github.com/snowflakedb/snowflake-connector-python/blob/9b6b0a63887e472e7c05365e5db896bf4d818db1/src/snowflake/connector/auth/webbrowser.py#L119-L123

So I wonder, how should we use this environment variable then? Is there any way to change the redirect URL to some public URL like myworkspace.gitpod.io?

What is the desired behavior?

How would this improve snowflake-connector-python?

Cloud IDE users would be able to use externalbrowser authentication.

References and other background

https://github.com/snowflakedb/snowflake-connector-python/blob/9b6b0a63887e472e7c05365e5db896bf4d818db1/src/snowflake/connector/auth/webbrowser.py#L119-L123

https://github.com/snowflakedb/snowflake-connector-python/blob/9b6b0a63887e472e7c05365e5db896bf4d818db1/src/snowflake/connector/auth/webbrowser.py#L401

sfc-gh-sfan commented 1 year ago

Thanks for reporting. Creating a new env var sounds like a good solution. I'm not sure if we have the bandwidth to prioritize this, but we would be open to reviewing PRs. Just in case I missed something, @sfc-gh-mkeller : WDYT? Is there something that I might be missing?

RobbertDM commented 1 year ago

I would gladly open a PR, but I'm afraid this requires a change on the Snowflake side too. In the lines below, a POST request is made: https://github.com/snowflakedb/snowflake-connector-python/blob/9b6b0a63887e472e7c05365e5db896bf4d818db1/src/snowflake/connector/auth/webbrowser.py#L388-L411

That is where body["data"]["BROWSER_MODE_REDIRECT_PORT"] = str(callback_port) is specified. If the server on the Snowflake side does not accept a BROWSER_MODE_REDIRECT_HOST or something similar, then I don't think any PR could enable setting the redirect URL.
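To make the limitation concrete, here is a minimal sketch of the relevant part of that request body. Only BROWSER_MODE_REDIRECT_PORT comes from the connector code linked above; the function name build_sso_request_data is made up for illustration, and BROWSER_MODE_REDIRECT_HOST is a hypothetical field that, as far as this thread can tell, the server does not accept.

```python
from typing import Optional


def build_sso_request_data(callback_port: int, redirect_host: Optional[str] = None) -> dict:
    """Simplified stand-in for the body posted when asking Snowflake for the SSO URL."""
    # This field is what the connector sets today (see the linked lines).
    data = {"BROWSER_MODE_REDIRECT_PORT": str(callback_port)}
    if redirect_host is not None:
        # HYPOTHETICAL field: only useful if the server were changed to honor it.
        data["BROWSER_MODE_REDIRECT_HOST"] = redirect_host
    return {"data": data}


# What is sent today: a port, but nothing that identifies a host.
current_body = build_sso_request_data(8080)
# What a server-side change might allow (assumption, not an existing API).
wished_body = build_sso_request_data(8080, "myworkspace.gitpod.io")
```

With only a port in the request, the server has nothing to build the redirect URL from except a default host, which matches the localhost redirect described above.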

sfc-gh-aling commented 8 months ago

hey @RobbertDM, trying to provide a potential workaround here: is it possible to get the IP of the domain and then pass it to SF_AUTH_SOCKET_ADDR to get your case to work?

the IP could be retrieved via Python socket module:

import socket
ip_address = socket.gethostbyname('<domain_name>')

RobbertDM commented 8 months ago

Hey @sfc-gh-aling ,

No, I'm really convinced there is no simple workaround except for modifying the server on the Snowflake side to accept a parameter like BROWSER_MODE_REDIRECT_HOST.

Even if we could make the socket bind to whatever IP or host, the server would still return a redirect URL to localhost, because it does not know what else to redirect to. This SF_AUTH_SOCKET_ADDR is never propagated to Snowflake.


With 2 extra lines we can replicate what the snowflake connector will try to do:

import socket
ip_address = socket.gethostbyname('redacted-customersal-tnzk49f3eqt.ws.redacted.gitpod.cloud')
socket_connection = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
socket_connection.bind((ip_address, 0))

which will give you OSError: [Errno 99] Cannot assign requested address, because the public IP address returned by gethostbyname is not available on any of my interfaces.

What we could do instead is

import socket
socket_connection = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
socket_connection.bind((socket.gethostname(), 0))

where socket.gethostname() returns redacted-customersal-tnzk49f3eqt, which is known locally and resolves to 127.0.0.1.

Setting SF_AUTH_SOCKET_ADDR=redacted-customersal-tnzk49f3eqt therefore also does not give any errors, but, as expected, the redirect URL still redirects to localhost.
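The two experiments above can be folded into a small check for whether a given value is even usable as a local bind address. This is only a sketch: can_bind is a helper name invented here, not part of the connector, and it says nothing about where Snowflake will redirect the browser.

```python
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to (address, port) on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 lets the OS pick any free port, as in the snippets above.
            sock.bind((address, port))
        except OSError:
            # Covers Errno 99 as well as name resolution failures.
            return False
    return True


loopback_ok = can_bind("127.0.0.1")
```

An address that is not assigned to any local interface, such as the public IP of the workspace, is expected to fail this check in the same way as the Errno 99 case above.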

sfc-gh-aling commented 4 months ago

hi @RobbertDM , sorry for the delayed response.

I think I get what you mean by "This SF_AUTH_SOCKET_ADDR is never propagated to Snowflake." I re-read the logic in webbrowser.py. Let me try explaining what happens during the webbrowser authentication preparation, and then see if we can come up with a solution for it.

what happens during webbrowser authentication preparation

  1. the client first starts a local socket, socket_connection, bound to SF_AUTH_SOCKET_ADDR and SF_AUTH_SOCKET_PORT; this socket is used to receive the SAML token (code here).
  2. the client sends a request to get the SSO URL from Snowflake via the _get_sso_url method (code here).
  3. the client opens the returned SSO URL in the browser and then blocks until data arrives on the socket_connection created in step 1 (code here).
  4. once the SAML response is received on the socket_connection, the client extracts the token from the response.
  5. the client closes the socket and proceeds with the authentication against Snowflake using the token (code here).
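Steps 1, 3, 4 and 5 can be sketched locally without Snowflake. The example below is a simplified simulation, not the connector's code: it assumes the token arrives as a token query parameter on an HTTP GET to the local socket, and a helper thread stands in for the browser being redirected. All function names are invented for illustration.

```python
import socket
import threading
from urllib.parse import parse_qs, urlsplit


def wait_for_token(listener: socket.socket) -> str:
    """Block until one HTTP request arrives, then return its `token` query parameter."""
    client, _ = listener.accept()
    with client:
        # First line looks like: GET /?token=... HTTP/1.1
        request_line = client.recv(16384).decode("utf-8").split("\r\n")[0]
        target = request_line.split(" ")[1]
        token = parse_qs(urlsplit(target).query)["token"][0]
        client.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    return token


def simulate_browser_redirect(port: int, token: str) -> None:
    """Stand-in for the browser following the final redirect to localhost."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(f"GET /?token={token} HTTP/1.1\r\nHost: localhost\r\n\r\n".encode())
        conn.recv(1024)


def demo() -> str:
    # Step 1: start the local socket (port 0 = let the OS choose).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    # Step 3 (simulated): something connects back to the local port.
    browser = threading.Thread(target=simulate_browser_redirect, args=(port, "example-token"))
    browser.start()
    try:
        # Steps 3 and 4: block until data arrives, then extract the token.
        return wait_for_token(listener)
    finally:
        # Step 5: close the socket.
        browser.join()
        listener.close()


received = demo()
```

The sketch works only because the simulated browser can reach 127.0.0.1 on the same machine as the listener. In a cloud IDE the browser runs elsewhere, which is why the redirect target matters.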

may I ask what the role of gitpod is here: is it an identity provider? which step(s) do you want to redirect, and from where to where? I assume you want to redirect the step 4 SAML response to your gitpod instance? If so, there is probably a way to inherit from the AuthByWebBrowser class and customize the logic to perform the redirect. if it's an identity provider, there are some docs about setting up identity providers which might help: https://docs.snowflake.com/en/user-guide/admin-security-fed-auth-overview

I also apologize that I have limited knowledge of how SAML is implemented in the backend (e.g., I'm not sure how SF_AUTH_SOCKET_PORT gets used to send the SAML response to the client); there could be things I overlooked that need coordination from the server side.

RobbertDM commented 4 months ago

Hello @sfc-gh-aling , indeed, I think the issue is in your step 2: _get_sso_url gets an SSO URL returned by a Snowflake server. In that request, you can pass a BROWSER_MODE_REDIRECT_PORT, but not a BROWSER_MODE_REDIRECT_HOST. That's why I'm afraid it will not work by only implementing client-side changes. When you're in the SSO flow, you are following redirect URLs from Snowflake itself, not from localhost.

To answer your questions:

sfc-gh-aling commented 4 months ago

thanks for the information. I'm wondering whether the following customization of AuthByWebBrowser could be a potential solution here, to manually forward the token to the gitpod instance:

import socket

from snowflake.connector import SnowflakeConnection
from snowflake.connector.auth.webbrowser import AuthByWebBrowser


class CustomizedAuthByWebBrowser(AuthByWebBrowser):
    def _process_receive_saml_token(self, conn: SnowflakeConnection, data: list[str], socket_client: socket.socket) -> None:
        # let the base class parse the response and store the token
        super()._process_receive_saml_token(conn, data, socket_client)
        token = self._token
        # send token to the gitpod instance manually via an http request
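The forwarding step left as a comment above could look roughly like the sketch below. Everything specific here is an assumption: the endpoint URL, the /saml-token path, and the JSON payload shape are hypothetical, and something on the gitpod instance would have to be listening for it. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_forward_request(token: str, endpoint: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the token to a hypothetical endpoint."""
    payload = json.dumps({"token": token}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical workspace URL and path, for illustration only.
request = build_forward_request("example-token", "https://myworkspace.gitpod.io/saml-token")
# Sending would be: urllib.request.urlopen(request, timeout=10)
```

Note that this only helps if the machine running the connector is the one that receives the token in the first place, which is the part that does not happen in the cloud IDE case.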

another question I have in mind: suppose Snowflake supported such a BROWSER_MODE_REDIRECT_HOST parameter, what would the SAML flow look like then? currently the connector sends a request, gets back a URL, opens it, and listens on a local port waiting for the response. but if the response is redirected to gitpod, how should the connector proceed? sorry, I'm not an expert on SAML and am trying to learn this.

I'm also trying to get the server team engaged on this to see if they can provide some guidance.