matthew-sochor-zz / fish.io.ai

Krillin' it since 2017
MIT License
4 stars 3 forks

Create auto-sorted dataset to improve and accelerate labeling process #70

Closed thenomemac closed 7 years ago

thenomemac commented 7 years ago

Re-run the best model with Adam, 1e-4:1e-6 decay, and 1e-3:1e-4 LR, then save out all data sorted by cross entropy to rapidly improve labels.

thenomemac commented 7 years ago

Via experimentation I found out that sorting the images in each fish's folder by top-1 probability works way better than cross entropy for quickly finding likely mis-bucketed images.

Images sorted with this algo can be found at s3://what-is-my-fish-images/catone
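
For anyone reading later, here is a rough sketch of what sorting a folder by top-1 probability could look like. It is not the script that produced the images above. The function name `sort_folder_by_top1`, the record layout, and the choice to put the lowest top-1 probability first (so the least confident images are reviewed first) are all assumptions for illustration. Cross entropy against the folder label is computed alongside so the two orderings can be compared.

```python
import math
from collections import defaultdict


def sort_folder_by_top1(records):
    """Group predictions by folder label and sort each group by top-1 probability.

    records: iterable of (filename, folder_label, probs), where probs is a list
    of class probabilities and folder_label is an index into it (assumed layout).
    Returns {folder_label: [(filename, top1_prob, predicted_class, cross_entropy)]}
    with each list sorted ascending by top1_prob (assumed review order).
    """
    by_folder = defaultdict(list)
    for filename, label, probs in records:
        top1 = max(probs)                 # highest class probability
        predicted = probs.index(top1)     # class the model actually picked
        # Cross entropy against the folder label, clipped to avoid log(0).
        cross_entropy = -math.log(max(probs[label], 1e-12))
        by_folder[label].append((filename, top1, predicted, cross_entropy))
    return {
        label: sorted(rows, key=lambda row: row[1])
        for label, rows in by_folder.items()
    }


if __name__ == "__main__":
    # Made-up predictions for two folders (class 0 and class 1).
    demo = [
        ("a.jpg", 0, [0.90, 0.10]),
        ("b.jpg", 0, [0.55, 0.45]),
        ("c.jpg", 1, [0.20, 0.80]),
    ]
    for label, rows in sort_folder_by_top1(demo).items():
        print(label, [row[0] for row in rows])
```

Rows where `predicted_class` differs from the folder label are the obvious candidates to check first during the labeling pass.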

thenomemac commented 7 years ago

Hoping to have all images filtered for a second pass through by tonight 4/23/17

thenomemac commented 7 years ago

Okay, you should use the data sets "trainfilter" and "testfilter" when modeling from here on out for best results. Note: I still have to rename all of the images in the "trainfilter" s3 folder, since each one has an extra number prepended to its file name that'll mess up our data pipeline.
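
A minimal sketch of how that rename could be done on a local copy of the folder. The exact prefix format is not stated in this thread, so the pattern below (digits followed by an underscore) is an assumption, and `strip_numeric_prefix` and `rename_folder` are hypothetical helpers, not part of the existing pipeline. It works on local files only; syncing back to s3 is a separate step.

```python
import re
from pathlib import Path

# Assumed prefix format: one or more digits followed by an underscore,
# e.g. "0042_salmon_01.jpg". Adjust to match the real file names.
PREFIX = re.compile(r"^\d+_")


def strip_numeric_prefix(name):
    """Return name without the assumed leading numeric prefix."""
    return PREFIX.sub("", name, count=1)


def rename_folder(folder, dry_run=True):
    """Strip the prefix from every file in folder; return the (old, new) pairs.

    With dry_run=True nothing is renamed, so the plan can be checked first.
    """
    planned = []
    seen_targets = set()
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        new_name = strip_numeric_prefix(path.name)
        if new_name == path.name:
            continue  # no prefix found, leave the file alone
        target = path.with_name(new_name)
        # Refuse to overwrite an existing file or collide with another rename.
        if target.exists() or target in seen_targets:
            raise FileExistsError(f"{target} would be overwritten")
        seen_targets.add(target)
        planned.append((path, target))
        if not dry_run:
            path.rename(target)
    return planned
```

Running it once with `dry_run=True` and eyeballing the returned pairs before doing the real rename seems like the safer order.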

matthew-sochor-zz commented 7 years ago

Great jorb! This dramatically improved the modeling

thenomemac commented 7 years ago

For the record, sorting by top K=1 prob worked better than cross entropy.

On Wed, Apr 26, 2017 at 9:16 PM, Matthew A Sochor notifications@github.com wrote:

Closed #70 https://github.com/matthew-sochor/fish.io.ai/issues/70.
