superduper-io / superduper

Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications without moving your data, including streaming inference, scalable model hosting, training, and vector search.
https://superduper.io
Apache License 2.0
4.7k stars 458 forks

[BUG]: Dask distributed cluster fails to run datalayer computation because of MongoDB's data_backend config #1323

Closed Tob-iee closed 10 months ago

Tob-iee commented 11 months ago

System Information

Notebooks

What happened?

The default configuration for the MongoDB data backend is set to 'mongodb://superduper:superduper@mongodb:27017/test_db' instead of 'mongodb://localhost:27017'.
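To make the difference between the two URIs concrete, here is a minimal sketch that splits each one into its parts using only the Python standard library. The helper name `describe_uri` is hypothetical and is not part of superduperdb; it only illustrates that the default points at a host named `mongodb` with credentials and a database path, while the expected value points at `localhost` with neither.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a MongoDB connection URI into its main components.

    Hypothetical helper for illustration only; not a superduperdb API.
    """
    parts = urlsplit(uri)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,
        # The path carries the database name, if any (without the leading "/").
        "database": parts.path.lstrip("/") or None,
    }


default_uri = "mongodb://superduper:superduper@mongodb:27017/test_db"
expected_uri = "mongodb://localhost:27017"

for uri in (default_uri, expected_uri):
    print(uri, "->", describe_uri(uri))
```

A worker that cannot resolve the host name in the default URI would not be able to reach the database, which is consistent with the failure described here, although the thread does not confirm the exact error.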

miko1ann commented 11 months ago

I got the same issue. I think the problem is in how CFG is passed to the worker and initialised there. @Tob-iee have you tried configuring CFG on the Dask worker with an environment variable or a config file?

Configuration can be injected in any of these ways:

  • directly in Python using the superduperdb.CFG data class
  • in a YAML file: .superduperdb/config.yaml
  • through environment variables starting with SUPERDUPERDB_

source: https://docs.superduperdb.com/docs/docs/configuration/

Tob-iee commented 11 months ago

@miko1ann Yes, setting the environment variable works, e.g. os.environ['SUPERDUPERDB_DATA_BACKEND'] = 'mongodb://localhost:27017'.
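A minimal sketch of that workaround follows. It assumes, as the quoted docs suggest, that superduperdb picks up `SUPERDUPERDB_`-prefixed variables when its configuration is loaded, so the variable should be set before the package is imported in each process. The helper `set_data_backend` is a hypothetical name, and the Dask part is left as comments because the scheduler address is a placeholder.

```python
import os

# Name of the variable used in this thread.
DATA_BACKEND_VAR = "SUPERDUPERDB_DATA_BACKEND"


def set_data_backend(uri: str = "mongodb://localhost:27017") -> str:
    """Set the data backend URI in this process's environment and return it.

    Hypothetical helper; call it before importing superduperdb.
    """
    os.environ[DATA_BACKEND_VAR] = uri
    return os.environ[DATA_BACKEND_VAR]


# Set the variable in the local (client) process.
set_data_backend()

# Assumption: the same function can be sent to every Dask worker with
# distributed.Client.run. The address below is a placeholder.
#
# from dask.distributed import Client
# client = Client("tcp://scheduler-address:8786")
# client.run(set_data_backend, "mongodb://localhost:27017")
```

If a worker has already imported superduperdb, changing the environment afterwards may not take effect, so exporting the variable in the shell or service definition that launches the workers is likely the more reliable option.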

jieguangzhou commented 10 months ago

Yes, we can use environment variables to do this.

The updated documentation: https://docs.superduperdb.com/docs/docs/setup/configuration