I'm going to try porting the model to PyTorch. Before I work on that, can you tell me if you already have a PyTorch version of the model right now? This is so that I do not waste effort in doing something that is already done.
Hi! We haven't ported the model to PyTorch; feel free to do so!
Where can the PyTorch models be found? Thanks in advance!
Fortunately, MMdnn (https://github.com/microsoft/MMdnn) worked perfectly for my needs; you can use it too. The commands I used to port the model are here: https://github.com/sainathadapa/urban-sound-tagging/tree/after_challenge/nbs/openl3
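For reference, an MMdnn Keras-to-PyTorch conversion looks roughly like this (a hedged sketch: the file names are placeholders, and the exact invocation may differ from the one in the linked notebooks):

```shell
# Convert a Keras .h5 model (architecture + weights) to PyTorch with MMdnn.
# -sf / -df select the source and destination frameworks;
# -iw is the input weight file, -om the output model file.
mmconvert -sf keras \
          -iw openl3_model.h5 \
          -df pytorch \
          -om openl3_model.pth
```

If the architecture is stored separately as JSON, MMdnn also accepts an input network file alongside the weights.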
@sainathadapa thank you for sharing your code for converting openl3 to PyTorch.
I see you used the mel128 embedding. Are the PyTorch weights available anywhere, please?
Hi @adrienchaton we have been working on a pytorch port. It was a little more fiddly than we expected but seems to be relatively stable now. We are prepping it for more general release: https://github.com/turian/torchopenl3
Please feel free to email me, lastname@gmail.com. I'm a huge fan of your work, and Philippe is on the committee for a NeurIPS audio representation competition I am proposing. I'd love to talk more.
The main issue is that Kapre 0.1.3's STFTs differ in the high frequencies. On the chirp audio, our MAE versus tfopenl3 was maybe 2e-3 when using mels (I'd have to double check). On 100 random sounds from FSD50K, it was far lower.
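The MAE figure above is just the mean absolute difference between the two ports' embeddings. A minimal sketch (the arrays here are stand-ins, not real openl3 outputs):

```python
import numpy as np

# Stand-in embeddings; in practice these would come from tfopenl3 and
# torchopenl3 run on the same audio (shape: frames x embedding_dim).
rng = np.random.default_rng(0)
emb_tf = rng.random((96, 512))
emb_torch = emb_tf + rng.normal(scale=1e-3, size=emb_tf.shape)

# Mean absolute error between the two sets of embeddings.
mae = np.abs(emb_tf - emb_torch).mean()
print(f"MAE: {mae:.2e}")  # small by construction of the added noise
```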
Great, thank you for sharing your port, and your awesome research too! Right now I would use it for an art project, so it does not need to be perfectly reproduced. I am following up by email.
@adrienchaton great, looking forward to talking. And for any issues, questions, snags, etc., file an issue on GitHub.
this is awesome, thanks for putting this together!
Heads up - we're working on an update to openl3 that will include:
@turian if you think it makes sense, it would be awesome to eventually merge torchopenl3 into openl3, such that a single library provides support for both TF and PyTorch backends.
@justinsalamon thank you, I wanted to reach out and make sure this is all copacetic before doing any public move. Happy to integrate. TBH, getting the MAE low with the old Kapre version was quite gnarly, and we had to reimplement a lot of the mel stuff ourselves. (We still get high error at high frequencies, e.g. on the chirp.)
BTW my email inbox is open. lastname@gmail.com
I have talked with Zeyu, who is on the committee for the accepted NeurIPS 2021 competition I'm organizing on learning general-purpose audio representations. If possible, I'd love to confirm that your model and weights could potentially be included as pretraining for the dev-kit. Let me know if you'd like to sync over email or chat on Zoom for 30 minutes.
@justinsalamon my one request would be that if numpy/librosa is used, we make sure to find a compatible GPU spectrogram/mel implementation. Matching the Kapre spectrograms was quite hellacious. I'd want to sanity-check torchlibrosa etc.
I think this is of interest to people who are synthesizing audio on the GPU.
What we've done is the following:
As you might expect, the embeddings don't match perfectly when we replace the audio front-end. However, performance on the downstream classification task was the same (or within the margin of error), which we hope is good enough.
So, the updated version of OpenL3 will let you choose between the Kapre and librosa front-ends, but they are not interchangeable. Models trained on embeddings from a specific front-end should continue to use the same front-end for inference. The same would apply if we incorporated a PyTorch version: it would be close, but probably not interchangeable with the TF versions.
Yes, I owe you an email. Coming soon.
P.S. @turian, happy to find a time for a quick chat if that would be helpful.
Any chance you have PyTorch model files saved, in addition to the Keras models present in this repo?