karolpiczak / ESC-50

ESC-50: Dataset for Environmental Sound Classification

New best result with fast.ai #9

Closed johanndiedrick closed 5 years ago

johanndiedrick commented 5 years ago

Table in README.md shows results from training. Feel free to verify locally as well.

karolpiczak commented 5 years ago

Thanks for the PR.

I had a quick look at the notebook code. It seems that instead of using the predefined fold split, it's randomly picking 20% of the dataset as validation. This results in a completely different validation methodology.

The proper procedure would be to have a full 5-fold cross-validation with 4 folds as training and 1 alternating validation dataset based on the splits provided in the CSV.

It's important, because only in this way do we avoid mixing segments from a single source recording between training and validation.
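The procedure described above can be sketched in a few lines of Python. This is only an illustrative helper (the function name is mine, not part of the dataset's tooling); it assumes the metadata has been loaded from `meta/esc50.csv`, whose `fold` column assigns every clip to one of five folds such that fragments of the same source recording always share a fold:

```python
import pandas as pd

def fold_splits(meta):
    """Yield (train_df, valid_df) pairs for ESC-50's predefined 5-fold CV.

    `meta` is the dataframe loaded from meta/esc50.csv. Its `fold`
    column holds values 1-5; each round holds out one fold for
    validation and trains on the remaining four.
    """
    for valid_fold in sorted(meta["fold"].unique()):
        yield (meta[meta["fold"] != valid_fold],   # 4 training folds
               meta[meta["fold"] == valid_fold])   # 1 validation fold
```

The final reported score would then be the mean accuracy across the five rounds.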

johanndiedrick commented 5 years ago

Thanks for the comment!

So, to make sure I understand correctly: I should train my model using the 5 major categories (folds). 4 of the folds will be the training set (e.g. 1. Animals, 2. Natural soundscapes & water sounds, 3. Human, non-speech sounds, and 4. Interior/domestic sounds), and 1 of the folds will be the validation set (e.g. Exterior/urban noises).

I'm looking at the fast.ai documentation on how to do this:

https://docs.fast.ai/vision.data.html#ImageDataBunch.from_folder

So something like:

data = ImageDataBunch.from_folder(path, train="train", valid="valid", ds_tfms=get_transforms(), size=360, num_workers=4).normalize(imagenet_stats)

And my folder structure should look something like:

path/
  train/
    animals/
      1.wav 2.wav 3.wav ...
    natural/
      1.wav 2.wav 3.wav ...
    human/
      1.wav 2.wav 3.wav ...
    interior/
      1.wav 2.wav 3.wav ...
  valid/
    exterior/
      1.wav 2.wav 3.wav ...

Does that make sense? Let me know! Been having fun exploring this dataset, so thank you :)

karolpiczak commented 5 years ago

No, major categories and folds are not related. In the CSV file there's a specific "fold" column. You can also find it as the first segment of the filename:

{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav

{FOLD} - index of the cross-validation fold
{CLIP_ID} - ID of the original Freesound clip
{TAKE} - letter disambiguating between different fragments of the same Freesound clip
{TARGET} - class in numeric format [0, 49]
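As a sketch, this filename convention can be parsed with a few lines of Python (the helper name is illustrative, not part of the dataset's tooling):

```python
def parse_esc50_filename(filename):
    """Split an ESC-50 filename of the form
    {FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav into its four components."""
    stem = filename.rsplit(".", 1)[0]              # drop the .wav extension
    fold, clip_id, take, target = stem.split("-")  # four dash-separated parts
    return {"fold": int(fold), "clip_id": int(clip_id),
            "take": take, "target": int(target)}
```

For example, `parse_esc50_filename("1-100032-A-0.wav")` returns fold 1, clip 100032, take "A", target 0.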

I'm not sure what the best way to do it in fast.ai is without data duplication, but you want all WAV files beginning with 1-, 2-, 3-, 4- for training and those beginning with 5- for validation; then 1- for validation and the rest for training, and so on.
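One way to avoid duplicating files is to filter by the fold prefix when building each split. A plain-Python sketch (function names are mine, for illustration only):

```python
import os

def fold_of(path):
    """The fold index is the first dash-separated segment of the filename."""
    return int(os.path.basename(path).split("-")[0])

def is_valid_file(path, valid_fold):
    """True if the file belongs to the held-out fold for this CV round."""
    return fold_of(path) == valid_fold
```

In fast.ai v1, a predicate like `is_valid_file` could presumably be plugged into the data block API's `split_by_valid_func` so that the same on-disk files feed every cross-validation round.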


stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open a closed issue if needed.