iterative / ldb-resources


`stage`/`instantiate` output folder requirement #17

Closed daavoo closed 2 years ago

daavoo commented 2 years ago

To elaborate on the section "Requirement of empty folder" from iterative/ldb#278.

For the data-centric competition, I eventually need to instantiate the train and val splits in a format suitable for training.

With the current behavior of stage/instantiate, I need to do:

$ cd competition
$ ldb stage ds:numerals-train train
$ cd train
$ ldb instantiate --format tensorflow-inferred
$ cd ..
$ rm -r train/.ldb_workspace  # Don't need it and I would need to update the training code to ignore the folder

And then repeat the same steps for the val split.
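
A minimal sketch of the workaround above wrapped in a shell function, so the steps do not have to be typed out once per split. `instantiate_split` and `LDB_BIN` are hypothetical names introduced here for illustration and are not part of ldb; the ldb commands themselves are exactly the ones shown above, and the dataset naming `ds:numerals-<split>` is assumed from the train and val examples in this issue.

```shell
#!/bin/sh
# LDB_BIN is a hypothetical override (not an ldb feature) so the command
# can be swapped out, for example during a dry run.
LDB_BIN="${LDB_BIN:-ldb}"

# instantiate_split SPLIT
# Stage ds:numerals-SPLIT into ./SPLIT, instantiate it in place, then remove
# the per-folder workspace so the training code does not see it.
instantiate_split() {
    split="$1"
    "$LDB_BIN" stage "ds:numerals-$split" "$split" || return 1
    # Run instantiate inside the staged folder; the subshell keeps the
    # caller's working directory unchanged.
    ( cd "$split" && "$LDB_BIN" instantiate --format tensorflow-inferred ) || return 1
    rm -r "$split/.ldb_workspace"
}
```

From inside `competition`, this would be called as `for split in train val; do instantiate_split "$split"; done`. It only automates the workaround; each split still gets its own temporary `.ldb_workspace`.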

What I would like to do is to have a single .ldb_workspace at the top level of my project.

If the output folder were a requirement of instantiate instead of stage, I would be able to run:

$ ldb stage ds:numerals-train
$ ldb instantiate competition/train --format tensorflow-inferred
$ ldb stage ds:numerals-val --force
$ ldb instantiate competition/val --format tensorflow-inferred

volkfox commented 2 years ago

This should be addressed by allowing INSTANTIATE to specify a folder: https://github.com/iterative/ldb/pull/262

daavoo commented 2 years ago

Tried https://github.com/iterative/ldb/pull/262 and it solves this.