Percentage training and test dataset

In my experience, the standard can actually range from 50-50 to 95-5. Usually it depends on the amount of data available. In our case there is so little data available that it makes sense to try to have as much in the training data as is reasonable. Admittedly it's not ideal, but to me this is a trade-off to make to try to get as good a model as possible, at the risk of the evaluation being less robust.

Technically, if we wanted to be more robust, we should actually be doing multi-fold cross validation as well.

lindawangg / COVID-Net

Percentage training and test dataset #35