Closed laxmimerit closed 6 years ago
I have found that you have reported accuracy of more than 80% in many cases
I'm not sure which accuracy figures you're referring to — could you point to the exact place? In general, the ESC-50 papers and EARS are separate things; the default model in this repo is very rudimentary.
It seems you have done training, testing, and validation on the same dataset; that's why it is showing 85% accuracy after 100 epochs.
As to the validation setup: it's a pretty standard procedure to validate on different folds of the same dataset. Or do you mean evaluating on exactly the same data used for training?
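For context, ESC-50 ships with a five-fold split, and fold-based validation usually looks roughly like the sketch below. The metadata entries and the `split_by_fold` helper are illustrative, not the repo's actual loader:

```python
# Toy metadata in the (filename, fold, label) shape ESC-50 uses;
# these specific entries are hypothetical.
metadata = [
    ("1-100032-A-0.wav", 1, "dog"),
    ("2-100648-A-14.wav", 2, "chirping_birds"),
    ("3-101296-A-19.wav", 3, "thunderstorm"),
    ("4-102844-A-9.wav", 4, "crow"),
    ("5-103415-A-2.wav", 5, "pig"),
]

def split_by_fold(metadata, val_fold):
    """Train on all folds except val_fold; validate on val_fold only.
    The validation clips are unseen, but come from the same dataset."""
    train = [m for m in metadata if m[1] != val_fold]
    val = [m for m in metadata if m[1] == val_fold]
    return train, val

train, val = split_by_fold(metadata, val_fold=5)
```

This is what "different folds of the same dataset" means: no clip appears in both sets, even though both come from ESC-50.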
One more thing I have noticed: there is a huge jerk in the recorded audio at periodic intervals. Perhaps this model was not tested on unseen data, i.e. data outside the training set.
If by "jerk" you mean out-of-sync issues, then yes, that is very probable. The live recording preview is tricky with the current setup, and it is more of a debug feature than a production one.
Hi, thank you for joining the loop. The initial setup on the Raspberry Pi and on the computer went fine, but the model was not able to classify audio, so I decided to retrain it on the ESC-50 data. I got around 85.6% accuracy after 100 epochs by running train.py as you suggested. After swapping in the new model, it was still not able to classify real audio with any confidence. You have really done a good job, but I am not able to classify real-time audio. The jerk could be coming from the real-time preview; I agree with that. How much accuracy did you get with this model in real-world noise? Did you test on real-time data?
Thanks for posting your work.
One thing definitely worth checking is the discrepancy between the dataset's recording conditions and what you get on your device (recording volume/gain), as these can vary a lot. The networks are trained on standardized features, so you have to make sure that AUDIO_MEAN and AUDIO_STD in config.py correspond more or less to your situation.
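One way to check this is to compare the statistics of a fresh recording against the configured values. This is only a sketch: the AUDIO_MEAN/AUDIO_STD numbers below are placeholders, not the repo's actual values, and `check_recording_stats` is a hypothetical helper:

```python
import numpy as np

# Placeholder numbers standing in for the values in config.py.
AUDIO_MEAN = -6.0   # hypothetical
AUDIO_STD = 12.0    # hypothetical

def check_recording_stats(features, mean=AUDIO_MEAN, std=AUDIO_STD, tol=2.0):
    """Return (mean, std, mismatch) for a recording's features.
    mismatch=True suggests your gain/volume differs a lot from the
    conditions the network was trained on."""
    m, s = float(np.mean(features)), float(np.std(features))
    return m, s, abs(m - mean) > tol or abs(s - std) > tol

# Simulated features that roughly match the configured statistics.
rng = np.random.default_rng(1)
m, s, mismatch = check_recording_stats(rng.normal(-6.0, 12.0, 2000))
```

If `mismatch` comes back True on typical recordings, the config constants likely need adjusting for your setup.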
Okay, I will check it. Do you have any suggestions for automating it according to the environment?
Hey,
I am thinking of modifying the classify method as follows:
x_mean = np.mean(X)
x_std = np.std(X)
X -= x_mean
X /= x_std
What do you think? Will it work, or do you have any other suggestion?
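The proposal above amounts to per-clip standardization. A self-contained sketch, with a small epsilon guard so a silent (constant) clip does not divide by zero:

```python
import numpy as np

def standardize_clip(X, eps=1e-8):
    """Per-clip standardization as proposed above.
    Note this normalizes each clip by its OWN statistics, rather than
    the fixed AUDIO_MEAN/AUDIO_STD the network was trained with, so the
    features it produces may differ from what the model expects."""
    X = np.asarray(X, dtype=np.float64)
    return (X - X.mean()) / (X.std() + eps)

x = standardize_clip([1.0, 2.0, 3.0, 4.0])
# x has mean ~0 and std ~1 after standardization
```

Whether this helps depends on whether the training features were standardized the same way; it is a different operation from applying the dataset-wide constants.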
Proper automation would require a calibration step when first run on new hardware/settings.
The best way with the current codebase would be to manually check what mean/std-dev values you're getting for some typical recording conditions ("normal" sounds, not too loud, not too quiet) and then change the values in config.py accordingly.
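That manual check could be sketched roughly as below. Everything here is hypothetical: `extract_features` stands in for whatever feature pipeline the classifier actually uses (it is not from this repo), and the recordings are simulated in place of real microphone captures:

```python
import numpy as np

def extract_features(audio):
    # Placeholder: real code would compute the same features
    # (e.g. spectrogram frames) used during training.
    return np.asarray(audio, dtype=np.float64)

def calibrate(recordings):
    """Estimate mean/std over a batch of typical recordings;
    the results are candidate values for config.py."""
    feats = np.concatenate([extract_features(r).ravel() for r in recordings])
    return float(feats.mean()), float(feats.std())

# Simulated "typical" recordings standing in for real captures.
rng = np.random.default_rng(0)
recordings = [rng.normal(-6.0, 12.0, size=1000) for _ in range(5)]
mean, std = calibrate(recordings)
print(f"AUDIO_MEAN = {mean:.2f}")
print(f"AUDIO_STD = {std:.2f}")
```

The printed values would then be pasted into config.py by hand; a true first-run calibration step would do the same thing automatically.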
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open a closed issue if needed.
Hi, I have been testing your code and reading the research papers too. It's nice to get working on this code. I have found that you have reported accuracy of more than 80% in many cases, and I also saw that while retraining the model, but here is the catch: it seems you have done training, testing, and validation on the same dataset, which is why it shows 85% accuracy after 100 epochs. But in the real world, it does no better than trial and error; not even a single classification is right.
It classifies a fan as a cat, a cat as a mouse click, a keyboard as a frog, and so on. Overall, there is no relation between the input and the target classification.
One more thing I have noticed: there is a huge jerk in the recorded audio at periodic intervals. Perhaps this model was not tested on unseen data, i.e. data outside the training set.
I am hoping you will get back to me with a suggestion. Thanks for posting it here; at least now I can start modifying this model myself. Thanks.