Closed: alexshepard closed this issue 4 months ago
Experiment is done; results have been shared with the team.
Next step is for me to write a blog post sharing the results.
In the CV meeting where I shared the experiment results, I mentioned that we could further analyze geo export/model variance by looking at the difference in thresholded maps between the two runs. I didn't get a clear sense of whether anyone actually wants to see this, or whether it must be done before we consider this alternate training run finished. Do you think this needs to get done, @pleary or @loarie?
If we're interested in this beyond curiosity, what variance would we expect? How much difference would be acceptable or unacceptable? How many taxa should we evaluate? I'm happy to do the work if there are clear criteria for what we'd do with the results.
If it's just a curiosity, the models and thresholds are on the NFS drive for anyone to experiment with.
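For anyone who does want to poke at this, here's a minimal sketch of one way to compare the thresholded maps, assuming each run saved a boolean presence grid per taxon as a `.npy` file. The paths, file layout, and per-taxon Jaccard metric are my assumptions for illustration, not how the geo export actually stores things:

```python
# Hypothetical sketch: per-taxon agreement between two runs' thresholded maps.
from pathlib import Path

import numpy as np

RUN_A = Path("/nfs/geo_run_a/thresholded")  # hypothetical NFS layout
RUN_B = Path("/nfs/geo_run_b/thresholded")


def jaccard(a: np.ndarray, b: np.ndarray) -> float:
    """Jaccard similarity between two boolean presence grids."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both maps empty: treat as perfect agreement
    return np.logical_and(a, b).sum() / union


scores = {}
for path_a in RUN_A.glob("*.npy"):
    path_b = RUN_B / path_a.name
    if not path_b.exists():
        continue  # taxon present in only one run
    scores[path_a.stem] = jaccard(np.load(path_a) > 0, np.load(path_b) > 0)

print(f"taxa compared: {len(scores)}")
print(f"mean Jaccard: {np.mean(list(scores.values())):.3f}")
```

A per-taxon score like this would also make it easy to set a pass/fail criterion (e.g., flag taxa whose agreement falls below some threshold), if we decide this needs to block finishing the run.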
Blog post draft is up at https://www.inaturalist.org/blog/90401-some-thoughts-on-ml-accuracy/
Did one round of edits after feedback from @loarie.
Blog post published; closing this issue.
Try to estimate the run-to-run variance of CV models by training a second CV model on a second export created at the same time. It should have the same taxa but a different random selection of photos, as in the sketch below.
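A minimal sketch of the export idea, assuming a pandas dataframe of candidate photos with a `taxon_id` column; the `PHOTOS_PER_TAXON` cap and the `make_export` helper are hypothetical stand-ins for the real export pipeline:

```python
# Hypothetical sketch: two exports over the same taxa, differing only in the
# random selection of photos (different seed per export).
import numpy as np
import pandas as pd

PHOTOS_PER_TAXON = 1000  # assumed per-taxon cap, not the real pipeline value


def make_export(photos: pd.DataFrame, seed: int) -> pd.DataFrame:
    """Sample up to PHOTOS_PER_TAXON photos per taxon_id, seeded per export."""
    rng = np.random.default_rng(seed)
    parts = []
    for _, group in photos.groupby("taxon_id"):
        n = min(len(group), PHOTOS_PER_TAXON)
        parts.append(group.sample(n=n, random_state=int(rng.integers(2**32))))
    return pd.concat(parts, ignore_index=True)


# Same taxa in both exports; only the photo draw changes:
# export_a = make_export(all_photos, seed=1)
# export_b = make_export(all_photos, seed=2)
```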