Disentangle / De-Hardcode the data location from the code folder

initze / thaw-slump-segmentation

MIT License

Disentangle / De-Hardcode the data location from the code folder #6

Closed: khdlr closed this issue 2 years ago

khdlr commented 3 years ago

Currently, it's very tedious to switch from one dataset to another: one has to move or replace ~5 folders by hand. Would it be a good idea to make the "data root folder" configurable instead of hard-coded?

We could add a data_root or data_folder argument to config.yml and have it default to "." so that the current behaviour doesn't change, while still making it quick to switch to a new dataset (e.g. when running inference on a new dataset, or for debugging).
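A minimal sketch of that proposal, assuming the parsed config.yml is already available as a plain dict (for example from PyYAML's yaml.safe_load) and that the key is called data_root. The helper name resolve_data_root is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Any, Mapping


def resolve_data_root(config: Mapping[str, Any]) -> Path:
    """Return the data root from a parsed config.yml (hypothetical helper).

    Falls back to "." when the key is missing, so existing configs keep
    the current behaviour of looking for data next to the code.
    """
    return Path(config.get("data_root", "."))


# An old config without the key behaves exactly as before ...
print(resolve_data_root({}))
# ... while switching datasets only needs one changed line in the config.
print(resolve_data_root({"data_root": "/mnt/new_dataset"}))
```

Defaulting to "." rather than "data" is what keeps this change backwards compatible with the current folder layout.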

khdlr commented 3 years ago

Or we could restructure all the data folders into a single subfolder, like this:

.
├── data
│   ├── aux
│   ├── backup
│   ├── h5
│   ├── input
│   └── raw
├── train.py
└── etc...

initze commented 2 years ago

Sounds like a good idea, actually. It might fit well into the next major update (0.6 or 0.7).

We might also think about an option to choose the locations manually (overriding the defaults).

khdlr commented 2 years ago

Agreed. My suggestion would be a configurable data root that defaults to "data/" in the project folder.

Within this directory we can then freely create directories for aux/backup/h5, etc.
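As a rough illustration, assuming the subfolder names from the tree proposed earlier in this thread (aux, backup, h5, input, raw), every path can be derived from the single configurable root. The function names data_layout and create_layout are hypothetical.

```python
from pathlib import Path
from typing import Dict, Union

# Subfolder names taken from the layout proposed earlier in this thread.
SUBDIRS = ("aux", "backup", "h5", "input", "raw")


def data_layout(data_root: Union[str, Path] = "data") -> Dict[str, Path]:
    """Map each subfolder name to its path under the configurable data root."""
    root = Path(data_root)
    return {name: root / name for name in SUBDIRS}


def create_layout(data_root: Union[str, Path] = "data") -> Dict[str, Path]:
    """Create any missing subfolders (no-op if they exist) and return the mapping."""
    layout = data_layout(data_root)
    for path in layout.values():
        path.mkdir(parents=True, exist_ok=True)
    return layout


paths = data_layout()  # uses the default root "data" in the project folder
print(paths["h5"])
```

Because only the root is configurable, the internal structure stays identical across datasets, and switching datasets means pointing at a different root.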

initze commented 2 years ago

Introduced a --data_dir argument for setup_raw_data.py; it defaults to "data".

Passing --config overrides --data_dir.
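One way the precedence rule could look with argparse, as a sketch only: it assumes the config file has already been parsed into a dict and that it stores the location under a hypothetical "data_dir" key. The helpers parse_args and effective_data_dir are illustrative, not the code merged for this issue.

```python
import argparse
from pathlib import Path
from typing import Any, Mapping, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="setup_raw_data.py (sketch)")
    # String defaults are passed through `type`, so this yields Path("data").
    parser.add_argument("--data_dir", default="data", type=Path,
                        help="root folder for all data (default: data)")
    parser.add_argument("--config", default=None, type=Path,
                        help="optional config file; its data location wins over --data_dir")
    return parser.parse_args(argv)


def effective_data_dir(args: argparse.Namespace,
                       config: Optional[Mapping[str, Any]] = None) -> Path:
    """Apply 'using --config overrides --data_dir'.

    `config` is the already-parsed config file (hypothetical key "data_dir").
    If no config was loaded, or it lacks the key, --data_dir is used.
    """
    if config is not None and "data_dir" in config:
        return Path(config["data_dir"])
    return args.data_dir


args = parse_args(["--data_dir", "/mnt/other"])
print(effective_data_dir(args))
```

Falling back to --data_dir when the config lacks the key is a design choice of this sketch; a stricter version could raise an error instead.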

initze commented 2 years ago

After some testing, I find the config.yml approach a bit uncomfortable. I prefer the --data_dir flag, which we can apply in each of the 3/4 steps. With a default of "data" we get a generic structure that can still be adapted. Having it as a single line in the config file feels too "hidden".
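To keep the flag consistent across the 3/4 pipeline steps, each step script could inherit it from one shared argparse parent parser. This is a sketch under that assumption; make_step_parser and the shared `common` parser are hypothetical names, and the script names are only those mentioned in this thread.

```python
import argparse
from pathlib import Path

# Shared parent parser (hypothetical): add_help=False avoids a duplicate
# -h/--help when it is inherited by the step-specific parsers.
common = argparse.ArgumentParser(add_help=False)
common.add_argument("--data_dir", type=Path, default=Path("data"),
                    help="root of the data folder (default: data)")


def make_step_parser(step_name: str) -> argparse.ArgumentParser:
    """Build a parser for one pipeline step that inherits --data_dir."""
    return argparse.ArgumentParser(prog=step_name, parents=[common])


setup_parser = make_step_parser("setup_raw_data.py")
train_parser = make_step_parser("train.py")

# Every step accepts the same flag with the same default.
print(setup_parser.parse_args([]).data_dir)
print(train_parser.parse_args(["--data_dir", "/scratch/new_dataset"]).data_dir)
```

Defining the flag once means its name, default, and help text cannot drift between steps.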

initze commented 2 years ago

Fixed with #25.