Core idea: add a dataset sub-table to the top-level tables [vak.train], [vak.eval], etc., in the config file
It will look like this:
[vak.train.dataset]
name = "BuiltInDataset" # optional. Should exactly match class name
path = "~/path/to/data/on/my/computer" # required
splits_path = "~/path/to/splits/I/made" # optional
params = { window_size = 2000 } # "optional" but required for some DataPipes
This assumes we have already done #685 and #345 (both of which I have in progress now)
It allows the following:
- Replacing train_transform_params + val_transform_params with the params option in the dataset table. In practice we only ever use these options to specify window_size; by enforcing the idea that these are parameters of the dataset, we move towards #724, where the built-in DataPipes have fixed transforms that they always use (and if you need more control, it's time to move to your own script). See the first sketch below.
- Specifying a splits_path that is different from the default path. This way, if you want to try a different split with your dataset, you don't need to remake the entire dataset; the splits are just in the splits file (edit: as in #749). See the second sketch below.
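For the first point, a minimal before/after sketch. The option names train_transform_params / val_transform_params are taken from this issue; the exact layout of the "before" table is an assumption for illustration only:

# before (assumed current-style layout): window_size passed via transform params
[vak.train]
train_transform_params = { window_size = 2000 }
val_transform_params = { window_size = 2000 }

# after: window_size is a parameter of the dataset itself
[vak.train.dataset]
name = "BuiltInDataset"
path = "~/path/to/data/on/my/computer"
params = { window_size = 2000 }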
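For the second point, the same dataset table with a different split swapped in; the alternate splits filename below is made up for illustration:

[vak.train.dataset]
name = "BuiltInDataset"
path = "~/path/to/data/on/my/computer"
# point at an alternate splits file instead of remaking the dataset
# (hypothetical filename)
splits_path = "~/path/to/splits/I/made/alternate-splits"
params = { window_size = 2000 }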