marl / openl3

OpenL3: Open-source deep audio and image embeddings
MIT License

Example of fine-tuning the audio sub-network. #91

Open mattiacampana opened 2 years ago

mattiacampana commented 2 years ago

I want to fine-tune the audio subnetwork for my audio classification problem. To that end, I plan to use the _construct_linear_audio_network, _construct_mel128_audio_network, and _construct_mel256_audio_network functions to load the pre-trained Keras model and then append one or more fully-connected layers to perform the classification.

However, I don't understand the input shape of these models. According to models.py, the input shape is input_shape = (1, asr * audio_window_dur), where asr = 48000 and audio_window_dur = 1. What is asr, and why does it have that value? Could you please provide an example of feeding a .wav file to the Keras model?
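To make the shape arithmetic concrete, here is a minimal sketch under some assumptions: asr is taken to mean the audio sample rate in Hz (so each example would be one second of mono audio at 48 kHz, i.e. 48000 samples), the waveform has already been loaded from the .wav file as a mono array and resampled to 48 kHz, and the hop size between windows is an arbitrary choice. The helper name frame_audio is hypothetical and is not part of openl3.

```python
import numpy as np

ASR = 48000           # assumption: "asr" is the audio sample rate in Hz
AUDIO_WINDOW_DUR = 1  # window duration in seconds, as quoted from models.py


def frame_audio(audio, hop_dur=0.1):
    """Hypothetical helper: split a mono 48 kHz waveform into windows
    of shape (num_frames, 1, ASR * AUDIO_WINDOW_DUR)."""
    frame_len = ASR * AUDIO_WINDOW_DUR   # 48000 samples per window
    hop_len = int(round(hop_dur * ASR))  # samples between window starts
    audio = np.asarray(audio, dtype=np.float32)

    # Zero-pad so that at least one full window exists and the final
    # window is complete.
    if audio.size < frame_len:
        pad = frame_len - audio.size
    else:
        pad = (-(audio.size - frame_len)) % hop_len
    audio = np.pad(audio, (0, pad))

    num_frames = 1 + (audio.size - frame_len) // hop_len
    frames = np.stack(
        [audio[i * hop_len: i * hop_len + frame_len] for i in range(num_frames)]
    )
    # Insert the singleton axis so each example has shape (1, 48000).
    return frames[:, np.newaxis, :]


# Two seconds of silence, windows starting every 0.5 s.
batch = frame_audio(np.zeros(2 * ASR), hop_dur=0.5)
print(batch.shape)
```

If that reading of asr is right, each row of the resulting array matches the quoted input_shape of (1, asr * audio_window_dur), so the batch could be passed to the Keras model once the pre-trained weights are loaded.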

I really appreciate any help you can provide.

sreenivasaupadhyaya commented 1 year ago

Hi @mattiacampana, could you please tell me how you got the pre-trained Keras weights for the audio subnetwork, or share any code that builds the model and loads the pre-trained weights? Thank you.