lindawangg / COVID-Net

COVID-Net Open Source Initiative

Percentage training and test dataset #35

Closed ezequielsobrino closed 4 years ago

ezequielsobrino commented 4 years ago

Hi, why do they use 99% for training and 1% for testing? The standard is 80-20 or 70-30.

josephius commented 4 years ago

In my experience, the standard can actually range from 50-50 to 95-5. Usually it depends on the amount of data available. In our case there is so little data available that it makes sense to try to have as much in the training data as is reasonable. Admittedly it's not ideal, but to me this is a trade-off to make to try to get as good a model as possible, at the risk of the evaluation being less robust.
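To put a rough number on "less robust": for a test set of n independent samples, the binomial standard error of a measured accuracy p is sqrt(p(1 - p) / n), so a smaller test set gives a noisier estimate. The sketch below compares a few split ratios. The dataset size of 10,000 and the accuracy of 0.9 are illustrative assumptions, not figures from this project, and the function names are made up for the example.

```python
import math


def accuracy_standard_error(accuracy: float, n_test: int) -> float:
    """Binomial standard error of an accuracy measured on n_test samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)


def compare_splits(n_total, test_fractions, assumed_accuracy=0.9):
    """Return (test_fraction, n_train, n_test, standard_error) per split.

    n_total and assumed_accuracy are hypothetical inputs chosen only to
    show how the uncertainty scales with the size of the test set.
    """
    rows = []
    for frac in test_fractions:
        n_test = max(1, round(n_total * frac))
        se = accuracy_standard_error(assumed_accuracy, n_test)
        rows.append((frac, n_total - n_test, n_test, se))
    return rows


# Hypothetical dataset of 10,000 samples, split 99-1, 95-5, 80-20 and 70-30.
for frac, n_train, n_test, se in compare_splits(10_000, [0.01, 0.05, 0.20, 0.30]):
    print(f"test={frac:.0%}  n_train={n_train}  n_test={n_test}  std_err={se:.4f}")
```

The standard error shrinks only with the square root of the test-set size, which is why a 1% test split buys a little more training data at a noticeable cost in evaluation precision.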

Technically, if we wanted the evaluation to be more robust, we should also be doing k-fold cross-validation.
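For reference, a minimal sketch of what stratified k-fold cross-validation could look like in plain Python. It is not code from this repository: `stratified_k_fold`, `majority_baseline_accuracy`, and the toy label list are all hypothetical, and the majority-class baseline only stands in for training and evaluating a real model on each fold.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples of each class are shuffled and dealt round-robin into the
    folds, so every fold keeps roughly the overall class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


def majority_baseline_accuracy(labels, train_idx, test_idx):
    """Placeholder for a real model: predict the most common training label."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    hits = sum(1 for i in test_idx if labels[i] == majority)
    return hits / len(test_idx)


# Toy, made-up labels with a rare class, to show why stratification matters.
labels = ["normal"] * 50 + ["pneumonia"] * 40 + ["COVID-19"] * 10

scores = [
    majority_baseline_accuracy(labels, train_idx, test_idx)
    for train_idx, test_idx in stratified_k_fold(labels, k=5)
]
print("per-fold accuracy:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

Every sample ends up in a test fold exactly once, so the averaged score uses all of the data for evaluation while each model still trains on (k - 1)/k of it. The cost is training k models instead of one.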