sampled training

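instead of using the same 1,000 photos in every epoch of a training run, we could draw a fresh sample of 1,000 photos each epoch for classes where we have more data. over a full run the model would then see more of the available photos for majority classes, which should improve accuracy on them. because every epoch still caps each class at 1,000 photos, this shouldn't unbalance the model so that it underperforms on minority classes.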
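
a minimal sketch of the per-epoch sampling idea, in plain Python rather than tensorflow. the `sample_epoch` helper, the class names, and the file names are all hypothetical and only illustrate the idea; this is not the repo's actual training code. classes with more than 1,000 photos get a fresh random subset each epoch, and smaller classes are used in full.

```python
import random

PHOTOS_PER_CLASS = 1000  # per-epoch cap taken from the issue text


def sample_epoch(photos_by_class, per_class=PHOTOS_PER_CLASS, rng=None):
    """Return a shuffled list of (photo, label) pairs for one epoch.

    Classes with more than `per_class` photos are subsampled without
    replacement; smaller classes contribute every photo they have.
    """
    if rng is None:
        rng = random.Random()
    epoch = []
    for label, photos in photos_by_class.items():
        if len(photos) > per_class:
            chosen = rng.sample(photos, per_class)  # fresh draw on each call
        else:
            chosen = list(photos)
        epoch.extend((photo, label) for photo in chosen)
    rng.shuffle(epoch)
    return epoch


# Hypothetical toy data: one majority class and one minority class.
photos_by_class = {
    "common_species": [f"common_{i}.jpg" for i in range(5000)],
    "rare_species": [f"rare_{i}.jpg" for i in range(300)],
}

rng = random.Random(0)
epoch_1 = sample_epoch(photos_by_class, rng=rng)
epoch_2 = sample_epoch(photos_by_class, rng=rng)
```

calling `sample_epoch` once per epoch keeps the per-class count fixed while rotating through the extra majority-class photos. a real pipeline would also need to stream images from disk instead of holding lists of paths in memory like this.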

the way to do this in TensorFlow seems to be rejection_resample on tf.data.Dataset (https://www.tensorflow.org/api_docs/python/tf/data/Dataset#rejection_resample), but in my experiments it seems to leak memory. need to do more research.

inaturalist / iNaturalistMLWork

sampled training #5