optuna / optuna

A hyperparameter optimization framework
https://optuna.org

Implement Delta Lake as storage backend #5350

Open: pfwnicks opened this issue 3 months ago

pfwnicks commented 3 months ago

Motivation

Delta Lake is a performant and robust storage layer with ACID (atomicity, consistency, isolation, durability) transactions. This would make it a good match for Optuna and an alternative to an SQL database as the storage backend, especially when running trials in a parallel or distributed environment.

Description

It would be nice to have delta-rs (https://github.com/delta-io/delta-rs) integrated into the framework in some form. As a first step, this could probably be implemented in a similar way to JournalFileStorage and JournalStorage. In the long run, it would be nice to have it as a fully fledged storage backend, so that one could simply provide a URL with a local or remote path, plus some storage options to configure access to that path.
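A minimal sketch of the journal-style first step, assuming the backend only needs to provide the `read_logs(log_number_from)` / `append_logs(logs)` pair that Optuna's journal storage backends (such as JournalFileStorage) implement. It does not use delta-rs, and the class name `DeltaStyleJournalBackend` and its on-disk layout are hypothetical. It only imitates the core idea of the Delta Lake transaction log: every commit is a new zero-padded, numbered JSON file under `_delta_log/`, and writers claim the next version with an atomic put-if-absent (here `os.link`, which fails if the target already exists), retrying on conflict instead of taking a file lock. A real backend would subclass Optuna's journal base class; that wiring is left out so the sketch runs without Optuna installed.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


class DeltaStyleJournalBackend:
    """Hypothetical append-only journal backend (illustration only, not the delta-rs API).

    Each append_logs() call becomes one commit file _delta_log/<version>.json.
    """

    def __init__(self, root: str) -> None:
        self._log_dir = os.path.join(root, "_delta_log")
        os.makedirs(self._log_dir, exist_ok=True)

    def _commit_path(self, version: int) -> str:
        # 20-digit zero padding keeps lexicographic order equal to version order.
        return os.path.join(self._log_dir, f"{version:020d}.json")

    def _versions(self) -> List[int]:
        # Only finished commits end in ".json"; in-progress temp files end in ".tmp".
        return sorted(
            int(name[: -len(".json")])
            for name in os.listdir(self._log_dir)
            if name.endswith(".json")
        )

    def read_logs(self, log_number_from: int) -> List[Dict[str, Any]]:
        # Replay all commits in version order and return logs from the given index on.
        logs: List[Dict[str, Any]] = []
        for version in self._versions():
            with open(self._commit_path(version), encoding="utf-8") as f:
                logs.extend(json.load(f))
        return logs[log_number_from:]

    def append_logs(self, logs: List[Dict[str, Any]]) -> None:
        # Write the whole commit to a private temp file first, so a reader
        # can never observe a half-written commit.
        fd, tmp_path = tempfile.mkstemp(dir=self._log_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(logs, f)
                f.flush()
                os.fsync(f.fileno())  # Durability: data is on disk before it is published.
            versions = self._versions()
            version = versions[-1] + 1 if versions else 0
            while True:
                try:
                    # Atomic put-if-absent: os.link raises FileExistsError if
                    # another writer already committed this version.
                    os.link(tmp_path, self._commit_path(version))
                    return
                except FileExistsError:
                    # Conflict: someone else won this version, try the next one.
                    version += 1
        finally:
            # The published commit is a hard link, so removing the temp name is safe.
            os.unlink(tmp_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        backend = DeltaStyleJournalBackend(root)
        backend.append_logs([{"op": "a"}])
        backend.append_logs([{"op": "b"}, {"op": "c"}])
        print(backend.read_logs(0))
        print(backend.read_logs(1))
```

The design choice here is optimistic concurrency: writers never block each other, and a losing writer simply retries at the next version. Because appends to a journal never conflict logically, retrying is always safe. This assumes a local filesystem with hard-link support; remote object stores would need their own put-if-absent primitive, which is the part delta-rs handles.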

Alternatives (optional)

As a starting point, it could of course be implemented along the lines of JournalFileStorage. I am already working on this and will post some code when it is ready.

Additional context (optional)

https://delta-io.github.io/delta-rs/how-delta-lake-works/delta-lake-acid-transactions/

https://delta-io.github.io/delta-rs/

Ademord commented 2 months ago

@pfwnicks I'm asking here because I'm running into problems in a distributed tuning scenario where SQLite keeps crashing, I assume because of concurrent access from multiple workers. Have you tried JournalFileStorage, and do you know whether it helps? What would be the difference compared to Delta Lake?