What do we use for validation files here?

maumlima / Rivers

River model for CliMA.


What do we use for validation files here? #2

Closed odunbar closed 2 months ago

odunbar commented 3 months ago

https://github.com/maumlima/Rivers/blob/9895f8cf545d9c6cf0331fc7ba435acbfa6bb6d7/examples/catchment_models/training/usa_basin_split.yml#L8

odunbar commented 3 months ago

In the data we have on Sampo, we only have one basin file. Would we use the same file for all of these, with the config describing elsewhere how they are split?

maumlima commented 3 months ago

Hey! If you want to use all the config files inside this folder, you will need six basin lists: globe_basin_split_train_list.txt, globe_basin_split_test_list.txt, globe_time_split_list.txt, usa_basin_split_train_list.txt, usa_basin_split_test_list.txt and usa_time_split_list.txt. The names should be a good indication of where to use each of them!

If you want to generate these yourself, you can use extract_basin_lists() (step 1.8 of the README inside examples/catchment_models/).
If you want to get this data from Sampo, the six files are all there (at least on my end).
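
As a quick sanity check before running the configs, something like the following could confirm that all six lists are present. This is only a minimal sketch: the file names come from the comment above, while the helper name `missing_basin_lists` and the directory passed to it are hypothetical and not part of the repository.

```python
from pathlib import Path

# The six basin list files named in the comment above.
EXPECTED_BASIN_LISTS = [
    "globe_basin_split_train_list.txt",
    "globe_basin_split_test_list.txt",
    "globe_time_split_list.txt",
    "usa_basin_split_train_list.txt",
    "usa_basin_split_test_list.txt",
    "usa_time_split_list.txt",
]


def missing_basin_lists(data_dir):
    """Return the expected basin list files that are absent from data_dir.

    Hypothetical helper: data_dir is wherever the lists were copied to
    or generated, which this thread does not specify.
    """
    data_dir = Path(data_dir)
    return [
        name for name in EXPECTED_BASIN_LISTS
        if not (data_dir / name).is_file()
    ]
```

Calling `missing_basin_lists("path/to/lists")` returns an empty list when all six files are in place, and otherwise the names that still need to be copied or generated.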

odunbar commented 2 months ago

Yup, I saw this; I do have the general files.

My question is about validation: in the basin split experiments, for example, there is no ..._validation_list.txt. When I looked again, I did see that we have a new validation time period, however. So for basin split experiments, do we do hyperparameter tuning on the training basins at different times, rather than on a separate set of validation-specific basins?

maumlima commented 2 months ago

Yes. The way it's currently done keeps more basins for training (rather than splitting the basins again into train/validation). A possible improvement would be to use cross-validation.
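
For reference, the cross-validation idea could look roughly like the sketch below: the training basins are partitioned into k folds, and each fold takes one turn as the validation set. This is an illustration only, not something the repository currently implements; the function name `kfold_basin_splits` and the basin IDs in the usage note are hypothetical.

```python
import random


def kfold_basin_splits(basin_ids, k=5, seed=0):
    """Yield (train_basins, validation_basins) pairs for k-fold cross-validation.

    Hypothetical helper: each basin lands in exactly one validation fold,
    and is used for training in the other k - 1 folds.
    """
    basins = list(basin_ids)
    if not 2 <= k <= len(basins):
        raise ValueError("k must be between 2 and the number of basins")

    # Shuffle reproducibly so folds do not follow the order of the list file.
    random.Random(seed).shuffle(basins)

    # Deal basins into k folds of near-equal size.
    folds = [basins[i::k] for i in range(k)]

    for i, validation in enumerate(folds):
        train = [b for j, fold in enumerate(folds) if j != i for b in fold]
        yield train, validation
```

With made-up IDs, `list(kfold_basin_splits(["b01", "b02", "b03", "b04"], k=2))` gives two train/validation pairs of two basins each. The trade-off is k training runs per hyperparameter setting instead of one, in exchange for validating on basins the model has not seen.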