Example of fine-tuning the audio sub-network.

I want to fine-tune the audio subnetwork for my own audio classification problem. My plan is to use the _construct_linear_audio_network, _construct_mel128_audio_network, and _construct_mel256_audio_network functions to load the pre-trained Keras model and then append one or more fully-connected layers to perform the classification.

However, I don't understand the input shape of these models. According to models.py, the input shape is input_shape = (1, asr * audio_window_dur), where asr = 48000 and audio_window_dur = 1. What is asr, and why does it have that value? Could you please provide an example of running the Keras model on a .wav file?
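To make the question concrete, here is a minimal sketch of how I currently read that shape. It assumes that asr is the audio sample rate in Hz (my guess, not something I have confirmed), so each input would be a one-second, single-channel window of 48000 samples. The frame_audio helper and its hop_dur parameter are hypothetical names I made up for illustration, and a random array stands in for samples that would really come from a mono .wav file resampled to 48 kHz.

```python
import numpy as np

ASR = 48000            # assumption: audio sample rate in Hz
AUDIO_WINDOW_DUR = 1   # window duration in seconds, as in models.py


def frame_audio(audio, hop_dur=0.5):
    """Split a mono waveform into windows shaped (n, 1, ASR * AUDIO_WINDOW_DUR).

    hop_dur is a hypothetical parameter: seconds between window starts.
    """
    win = ASR * AUDIO_WINDOW_DUR
    hop = int(ASR * hop_dur)
    audio = np.asarray(audio, dtype=np.float32)

    # Zero-pad clips shorter than one window so at least one frame exists.
    if audio.shape[0] < win:
        audio = np.pad(audio, (0, win - audio.shape[0]), mode="constant")

    # Number of complete windows that fit; trailing samples are dropped.
    n = 1 + (audio.shape[0] - win) // hop
    frames = np.stack([audio[i * hop : i * hop + win] for i in range(n)])

    # Insert the channel axis expected by input_shape = (1, asr * audio_window_dur).
    return frames[:, np.newaxis, :]


# Stand-in for three seconds of audio read from a .wav file.
waveform = np.random.uniform(-1.0, 1.0, size=3 * ASR).astype(np.float32)
batch = frame_audio(waveform)
print(batch.shape)  # (5, 1, 48000)
```

If this reading is right, I would expect to pass an array like batch to the model's predict method, but I would appreciate confirmation that the channel axis and the sample rate are what the network expects.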

I really appreciate any help you can provide.

marl / openl3
