inaturalist / iNaturalistMLWork


sampled training #5

Open alexshepard opened 9 months ago

alexshepard commented 9 months ago

instead of using the same 1,000 photos in every epoch of a training run, for classes where we have more than 1,000 photos we could draw a fresh sample of 1,000 for each epoch. this should improve accuracy on majority classes without skewing the model toward them at the expense of minority classes, since the number of photos per class in any one epoch stays the same.

the way to do this in tensorflow seems to be rejection_resample (https://www.tensorflow.org/api_docs/python/tf/data/Dataset#rejection_resample), but in my experiments it seems to leak memory. need to do more research.
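
for reference, a minimal framework-independent sketch of the per-epoch sampling idea described above. this is plain python and not the tensorflow approach; `sample_epoch`, `photos_by_class`, and the file names are hypothetical names made up for illustration, and it assumes the photo lists for each class fit in memory.

```python
import random


def sample_epoch(photos_by_class, per_class=1000, rng=None):
    """Return a shuffled list of (photo, label) pairs for one epoch.

    Hypothetical helper: classes with more than `per_class` photos get a
    fresh random sample each call; smaller classes contribute every photo.
    """
    rng = rng or random.Random()
    epoch = []
    for label, photos in photos_by_class.items():
        if len(photos) > per_class:
            # majority class: draw a new sample without replacement
            chosen = rng.sample(photos, per_class)
        else:
            # minority class: use everything we have
            chosen = list(photos)
        epoch.extend((photo, label) for photo in chosen)
    # mix the classes so batches are not grouped by label
    rng.shuffle(epoch)
    return epoch


# made-up data: one class with plenty of photos, one with few
photos_by_class = {
    "majority": [f"maj_{i}.jpg" for i in range(5000)],
    "minority": [f"min_{i}.jpg" for i in range(300)],
}
rng = random.Random(0)
epochs = [sample_epoch(photos_by_class, per_class=1000, rng=rng) for _ in range(3)]
```

each entry in `epochs` has the same per-class counts (1,000 majority photos, 300 minority photos), but the majority photos differ from epoch to epoch, so over a full run the model sees more of the majority class data without changing the class balance within any one epoch.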