zenml-io / mlstacks

A series of Terraform based recipes to provision popular MLOps stacks on the cloud.
https://mlstacks.zenml.io/
Apache License 2.0
247 stars 32 forks source link

Failed connections to analytics API block CLI indefinitely #130

Closed AdrianoKF closed 7 months ago

AdrianoKF commented 8 months ago

I have encountered a problem in the mlstacks CLI, which will be blocked from terminating indefinitely in case the Segment analytics API is unreachable (full disclosure: I run PiHole on my local network, which blackholes DNS requests to the api.segment.io API server - however, the same behavior would occur with "real" network connectivity issues). The hang occurs after the main operations have concluded successfully (in my case, deploying a stack).

Setting MLSTACKS_ANALYTICS_OPT_IN=0 when invoking the mlstacks CLI resolves the issue (as does disabling PiHole, thereby allowing the DNS resolution of the API endpoint). However, since analytics are arguably a secondary function of the CLI, a failure there should not affect the main functionality.

The hang occurs in the segment.analytics.client.Client.join() method (executed upon interpreter exit), which in turn waits for its Consumer thread to terminate (which seems to be the root cause). I'm unsure if there is anything mlstacks can do to prevent that issue (short of substituting the atexit handler registered by the Segment Client). Still, I wanted to raise the issue here nonetheless as it negatively impacts core functionality in certain environments.

Happy to help out in case you need further steps to reproduce!

As an aside: I find the naming of the environment variable controlling analytics a bit misleading: in fact, it is an opt-out rather than an opt-in since the default behavior is to enable analytics unless explicitly disabled.

strickvl commented 8 months ago

Thanks for reporting this! I am able to reproduce it and will work on a fix.

marwan37 commented 7 months ago

Hello! I'm currently working on a fix/workaround for this issue; aiming to submit a PR later today.

strickvl commented 7 months ago

@AdrianoKF this has now been fixed thanks to @marwan37 's PR. If you use the 0.8.1 release you'll have this fix in action. Thanks for reporting and thanks again @marwan37 for the fix! Closing this issue now.