Closed phil-hawkins closed 7 months ago
@phil-hawkins The connector already supports external browser. Does it solve your problem? See this API doc. https://docs.snowflake.com/en/user-guide/python-connector-api
Is the goal of that to be able to connect to Snowflake using EXTERNALBROWSER authentication from a Docker container? From the description, it seems you are able to do it already even if it doesn't work all the time, would it be possible to know more about it please @phil-hawkins ?
Edit: never mind I managed to do it like that https://stackoverflow.com/questions/67325900/is-it-possible-to-use-externalbrowser-authenticator-inside-docker-container-fo
@phil-hawkins would you still be interested in creating a PR for this? Our team is happy to review it
@sfc-gh-aalam I've got a couple fixes for this issue and am preparing a PR. More info today or tomorrow
When running in a containerized environment, redirects issued by the host browser to the localhost:{port}
callback URL:
I've tried many approaches to make this handling more robust and combining a few together allows the code to work very reliably for my team. These issues were seen on macOS running a devcontainer
orchestrated via VS Code and also on windows running WSL.
SO_REUSE_ADDR
and SO_REUSEPORT
flagsI tried @phil-hawkins of implementing SO_REUSE_ADDR
and I also experimented with the similar SO_REUSE_PORT
flag. I may have seen an improvement when I did so, but it did not solve the issue for me and I continued to pursue other fixes.
Recommendation: Do not proceed with this fix
However if someone (@phil-hawkins?) can confirm this fixes any issues for them, I would be happy to finish coding it.
GET /favicon
One of the failures that would lead to an IDP auth failed message was due to the browsers occasionally sending GET /favicon
requests from the Your identity was confirmed
html page. I was able to prevent this behavior by adding this to the HTML body:
<link rel="icon" href="data:,">
I followed the approached outlined [here].(https://webdesign.tutsplus.com/prevent-automatic-favicon-requests--cms-34762t)
A favicon request could break the callback handling in a non-containerized environment as well, so this seems like a good idea for all cases.
Recommendation: Proceed with this fix
I experimented with spawning a thread to start the socket.accept()
and socket.recv()
logic before we send the auth request to the IDP. I even played with artificially sleeping before sending the IDP request. All of this was to no avail. It does not seem that the race condition can be fixed by allowing docker more time to map ports.
Recommendation: Do not proceed with this fix
recv
I noticed that frequently the result of the socket.recv()
was being returned as zero bytes. Using wireshark on the HOST the GET /?token
request seems perfectly fine from the HOST browser to the HOST port. I'm not sure why this occurs, but something in the way the container proxying for port-forwarding is frequently results in this behavior. If you simply do another socket.accept()
and then a socket.recv
in this scenario the second socket.recv call succeeds with the GET /?token
results. This zero-length byte array behavior was observed on both WSL and macos vs code devcontainers. The behavior was the same regardless of whether a port was statically mapped and reused or not. My proposed fix for this issue is that in the event of a zero length byte array, simply retry the socket.accept()
and socket.recv()
and limit retries to 5 attempts.
Recommendation: Proceed with this fix
socket.recv()
(WSL fix)In WSL an additional fix was needed. For some reason the call to socket.recv()
very frequently blocks and never returns.
Investigating the flags available on socket.recv()
revealed the MSG_DONTWAIT flag. If we pass this flag and the system determines it would block, the python socket library raises a BlockingIOError. Combined with the retry approach outlined above, this allowed WSL to consistently and reliably receive the callback. The idea is to catch the error and log, but let the retry logic attempt again since the bytearray is still set to zero bytes and the retry count has not been exceeded.
Recommendation: Proceed with this fix
I proposed the following fixes:
<link rel="icon" href="data:,">
)recv
by using MSG_DONTWAIT
flag and retrying on error@sfc-gh-aalam I can start working on this PR, but if you have any concerns or want to discuss please let me know.
@podung Thanks for such a detailed explanation. I don't have any concerns. Please proceed with the PR.
@sfc-gh-aalam
Couple updates:
I went ahead and added the original request by @phil-hawkins to optionally set the socket flag instructing the kernel to reuse ip address/ports. This may be useful for some and I did see a few times where my container took too long to dynamically open the port. Note: I chose to use the SO_REUSEPORT
flag instead of SO_REUSEADDR
because that seemed more appropriate based on my reading.
Although the MSG_DONTWAIT
flag seems to make WSL environments more stable, I was seeing it sometimes cause my other containerized environments (ubuntu on macos) fail more frequently. I would like to make this an opt-in feature as well.
Other than updating the Release Notes with the changes and outlining the environment variables and settings there, is there a place we keep canonical documentation that I should be updating?
Here's the branch I'm working on if you're interested in seeing what I have so far: https://github.com/snowflakedb/snowflake-connector-python/compare/main...podung:snowflake-connector-python:robust_containerized_externalbrowser
@sfc-gh-aalam PR submitted. I'm happy to change anything as you or any of the maintainers see fit, just let me know.
What is the current behaviour?
EXTERNALBROWSER authentication is the best option in the case of SSO with MFA. However, when performed inside a local container such as a VSCode docker devcontainer, there is a race condition between the auth callback to the localhost and starting the receiver app on the localhost and auto-forwarding the randomly assigned port. This can lead to an intermittent failure to connect.
One way to address this is to assign a fixed port number with the SF_AUTH_SOCKET_PORT environment variable and statically forward this port. This removes the delay in detecting and autoforwarding a randomly assigned port. However a new problem arises: the port remains in a TIME_WAIT state for a few minutes after it is used and subsequent connections fail with
OSError: [Errno 98] Address already in use
during this period.What is the desired behaviour?
EXTERNALBROWSER auth works every time from inside a local container.
An option is to set up a static port forwarding rule as above and set the
SO_REUSEADDR
state of the port, perhaps determined by the value of new environment variable such asSF_AUTH_SOCKET_REUSEADDR
.How would this improve
snowflake-connector-python
?EXTERNALBROWSER auth would work more seamlessly from local containers.
References, Other Background
socket(7) — Linux manual page SF_AUTH_SOCKET_PORT code
I'm up for doing a PR to address this.