thomashopkins32 / HuBMAP

Hacking the Human Vasculature (Kaggle Competition)
Apache License 2.0

Separate Training, Validation, and Testing data into different datasets #36

thomashopkins32 closed 1 year ago

thomashopkins32 commented 1 year ago

To make the transforms easier to work with, we should pre-split the data into training, validation, and testing sets.

The testing data is a single image and is already split off. The training data needs to be randomly split, and this split needs to be saved somewhere.
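
A minimal sketch of one way to make the random split and save it, assuming the training samples can be addressed by integer index. The function names, the 80/20 ratio, the seed, and the JSON file format are all hypothetical and not taken from the repo.

```python
import json
import random
from pathlib import Path


def make_split(num_samples, valid_fraction=0.2, seed=0):
    """Shuffle indices 0..num_samples-1 with a fixed seed and split them.

    Hypothetical helper: returns a dict with disjoint "train" and "valid"
    index lists that together cover every sample exactly once.
    """
    indices = list(range(num_samples))
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)
    num_valid = int(num_samples * valid_fraction)
    return {
        "valid": sorted(indices[:num_valid]),
        "train": sorted(indices[num_valid:]),
    }


def save_split(split, path):
    """Write the split to a JSON file so later runs reuse the same indices."""
    Path(path).write_text(json.dumps(split))


def load_split(path):
    """Read a split previously written by save_split."""
    return json.loads(Path(path).read_text())
```

Saving the indices rather than copies of the data keeps the file small, and the fixed seed means the split can be regenerated if the file is lost.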

This is required so that we can skip image transformations during validation and also get accurate class frequencies during training. If we use a single dataset and then split it, neither of these is straightforward.

We should implement TrainHuBMAP, ValidHuBMAP, and TestHuBMAP datasets instead of the single HuBMAP dataset.
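
A rough sketch of what the three datasets could look like, written as plain map-style classes (`__len__` and `__getitem__`). Everything except the three class names is an assumption: the `_SplitHuBMAP` base class, the `(image, mask)` sample layout, the `transform(image, mask)` signature, and masks being flat iterables of integer labels. In the real code these would presumably subclass the existing dataset class.

```python
from collections import Counter


class _SplitHuBMAP:
    """Hypothetical base class: a view over a subset of (image, mask) samples."""

    def __init__(self, samples, indices, transform=None):
        self.samples = samples          # sequence of (image, mask) pairs
        self.indices = list(indices)    # which samples belong to this subset
        self.transform = transform      # optional callable(image, mask)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        image, mask = self.samples[self.indices[i]]
        if self.transform is not None:
            image, mask = self.transform(image, mask)
        return image, mask


class TrainHuBMAP(_SplitHuBMAP):
    """Training subset; augmentations are passed in as `transform`."""

    def class_frequencies(self):
        # Count labels over the untransformed training masks only, so
        # validation samples never leak into the statistics.
        counts = Counter()
        for idx in self.indices:
            _, mask = self.samples[idx]
            counts.update(mask)
        return counts


class ValidHuBMAP(_SplitHuBMAP):
    """Validation subset; never applies a transform."""

    def __init__(self, samples, indices):
        super().__init__(samples, indices, transform=None)


class TestHuBMAP(_SplitHuBMAP):
    """Test set; uses every sample it is given, untransformed."""

    def __init__(self, samples):
        super().__init__(samples, range(len(samples)), transform=None)
```

With this layout the train and validation sets share the same underlying samples and differ only in their index lists and transforms.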

thomashopkins32 commented 1 year ago

We can also look into using cross-validation instead, but this seems too expensive for such a large model.

thomashopkins32 commented 1 year ago

This isn't necessary after all. I forgot that I return both the transformed and original images in the batch!