marl / openl3

OpenL3: Open-source deep audio and image embeddings
MIT License

PyTorch models #35

Closed sainathadapa closed 4 years ago

sainathadapa commented 5 years ago

Any chance you have PyTorch model files saved, in addition to the Keras models present in this repo?

sainathadapa commented 4 years ago

I'm going to try porting the model to PyTorch. Before I start, can you tell me whether you already have a PyTorch version of the model, so that I don't duplicate work that's already been done?

auroracramer commented 4 years ago

Hi! We haven't ported the model to PyTorch; feel free to do so!

janaal1 commented 4 years ago

Where can the Pytorch models be found? Thanks in advance!

sainathadapa commented 4 years ago

Fortunately, MMdnn (https://github.com/microsoft/MMdnn) worked perfectly for my needs; you could use it too. The commands I used to port the model are here: https://github.com/sainathadapa/urban-sound-tagging/tree/after_challenge/nbs/openl3

adrienchaton commented 3 years ago

@sainathadapa thank you for sharing your code to convert openl3 to pytorch.

I see you used the mel128 embedding. Are the PyTorch weights available anywhere, please?

turian commented 3 years ago

Hi @adrienchaton we have been working on a pytorch port. It was a little more fiddly than we expected but seems to be relatively stable now. We are prepping it for more general release: https://github.com/turian/torchopenl3

Please feel free to email me, lastname@gmail.com. I'm a huge fan of your work, and Philippe is on the committee for a NeurIPS audio representation competition I am proposing. I'd love to talk more.

turian commented 3 years ago

The main issue was that the difference from the Kapre 0.1.3 STFTs is in the high frequencies. This means that on the chirp audio, our MAE against tfopenl3 was maybe 2e-3 when using mels (I'd have to double check). On 100 random FSD50K sounds, it was far lower.
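For concreteness, the MAE quoted here is just the mean absolute difference between the two embedding matrices for the same audio. A minimal numpy sketch, using hypothetical stand-in arrays (the real ones would come from openl3 and torchopenl3):

```python
import numpy as np

# Stand-ins for embeddings of the same clip from the TF and PyTorch ports:
# shape (n_frames, embedding_size), e.g. 512-d embeddings.
rng = np.random.default_rng(0)
emb_tf = rng.standard_normal((96, 512))
emb_torch = emb_tf + 2e-3 * rng.standard_normal((96, 512))  # simulated port error

# Mean absolute error between the two ports.
mae = np.abs(emb_tf - emb_torch).mean()
```

On real data you would run both models on the same batch of clips and report this number per dataset, as done above for the chirp and FSD50K samples.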

adrienchaton commented 3 years ago

Great, thank you for sharing your port, and your awesome research too! Right now I would use it for an art project, so it doesn't need to be perfectly reproduced. I'm following up by email.

turian commented 3 years ago

@adrienchaton great, looking forward to talking. And for any issues, questions, snags, etc., file an issue on GitHub.

justinsalamon commented 3 years ago

this is awesome, thanks for putting this together!

Heads up - we're working on an update to openl3 that will include:

@turian if you think it makes sense it would be awesome to merge torchopenl3 into openl3 eventually, such that a single library provides support for both TF and PyTorch backends.

turian commented 3 years ago

@justinsalamon thank you, I wanted to reach out and make sure this is all copacetic before doing any public move. Happy to integrate. TBH, getting the MAE low against the old Kapre version was quite gnarly, and we had to reimplement a lot of the mel stuff ourselves. (We still get high error on high frequencies, like the chirp.)

BTW my email inbox is open. lastname@gmail.com

I have talked with Zeyu, who is on the committee for the accepted NeurIPS 2021 competition I'm organizing on learning general-purpose audio representations. If possible, I'd love to confirm that your model and weights could potentially be included as pretraining for the dev kit. Let me know if you'd like to sync over email or chat on Zoom for 30 minutes.

turian commented 3 years ago

@justinsalamon my one request would be that, if numpy librosa is used, we make sure to find a compatible GPU spectrogram/mel implementation. Matching the Kapre spectrograms was quite hellacious. I'd want to sanity-check torchlibrosa, etc.
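To illustrate why matching spectrogram implementations is so fiddly, here is a toy numpy sketch of just one convention two front-ends must agree on: whether the signal is center-padded before framing. (The `center` flag mimics librosa's default reflect-padding; `center=False` is simply one alternative an implementation might pick, illustrative only, not Kapre's actual behavior.)

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128, center=True):
    """Toy STFT magnitude. `center=True` reflect-pads like librosa's default;
    `center=False` frames from sample 0 (one of many possible conventions)."""
    win = np.hanning(n_fft)
    if center:
        x = np.pad(x, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

x = np.sin(2 * np.pi * 1000 * np.arange(4096) / 48000)  # 1 kHz tone at 48 kHz
a = stft_mag(x, center=True)
b = stft_mag(x, center=False)
# This single convention already changes the output shape, so two front-ends
# must agree on padding, windowing, normalization, etc. before embeddings match.
```

Window choice, padding mode, FFT normalization, and mel filterbank construction each add similar mismatches, which is why a GPU implementation needs careful sanity checks against the reference.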

I think this is of interest to people who are synthesizing audio on the GPU.

justinsalamon commented 3 years ago

What we've done is the following:

As you might expect, the embeddings don't match perfectly when we replace the audio front-end. However, performance on the downstream classification task was the same (or within the margin of error), which we hope is good enough.

So, the updated version of OpenL3 will let you choose between the Kapre and Librosa front-ends, but they are not interchangeable. Models trained on embeddings from a specific front-end should continue to use the same front-end for inference. The same would apply if we incorporated a PyTorch version: it would be close, but probably not interchangeable with the TF versions.
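A toy numerical sketch of that point, using synthetic data rather than real OpenL3 embeddings: swapping the front-end perturbs the embeddings slightly, yet a simple downstream classifier's predictions can be completely unaffected.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic classes of 512-d "embeddings" from front-end A
# (stand-ins; real ones would come from openl3 with a given front-end).
centroids = rng.standard_normal((2, 512))
labels = rng.integers(0, 2, size=200)
emb_a = centroids[labels] + 0.1 * rng.standard_normal((200, 512))

# Front-end B: same audio content, but the embeddings don't match exactly.
emb_b = emb_a + 0.02 * rng.standard_normal((200, 512))

def nearest_centroid(emb, cents):
    # Classify each embedding by its nearest class centroid.
    d = np.linalg.norm(emb[:, None, :] - cents[None, :, :], axis=-1)
    return d.argmin(axis=1)

acc_a = (nearest_centroid(emb_a, centroids) == labels).mean()
acc_b = (nearest_centroid(emb_b, centroids) == labels).mean()
# The embeddings differ, yet downstream accuracy is identical.
```

The caveat in the comment still holds: this only works when train and inference use the *same* front-end; the sketch shows robustness of the downstream task, not interchangeability of the embeddings themselves.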

Yes, I owe you an email. Coming soon.

justinsalamon commented 3 years ago

P.S. @turian happy to find a time for a quick chat if that would be helpful.