jackgle / YAMNet-transfer-learning

Transfer learning and fine-tuning with YAMNet

Input type/dim of the model #1

Closed. aslanismailgit closed this issue 3 years ago.

aslanismailgit commented 3 years ago

Hi Jack, this is very nice work. I am trying to adapt it to my case. My question is: the input to the pre-trained YAMNet model (and its weights) is a waveform of dimension (48000), as in the example below (per https://tfhub.dev/google/yamnet/1), but your model takes an input of shape [94, 224].

The pre-trained weights should correspond to a waveform input (I think), but here you are using a different input shape. What am I missing? Thanks.


```python
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np

# Load the model.
model = hub.load('https://tfhub.dev/google/yamnet/1')

# Input: 3 seconds of silence as mono 16 kHz waveform samples.
waveform = np.zeros(3 * 16000, dtype=np.float32)

# Run the model, check the output.
scores, embeddings, log_mel_spectrogram = model(waveform)
```

jackgle commented 3 years ago

Hello, thanks. The model contains a layer that converts an input waveform to spectrograms (no trainable weights involved). In my code I pre-compute the spectrograms and pass them directly to the model, so training is much faster because spectrograms are not computed on the fly for every sample. The input shape of the spectrogram patches should be (96, 64).
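
One way to pre-compute those (96, 64) patches, without pulling in YAMNet's own feature-extraction code, is to reuse the log-mel spectrogram the TF Hub model already returns and slice it with `tf.signal.frame`. This is only a minimal sketch of the idea, not the repo's exact preprocessing: the 48-frame patch hop is my assumption based on YAMNet's default 0.48 s hop, and the model here is just the public TF Hub checkpoint from the example above.

```python
# Sketch: pre-compute (96, 64) log-mel patches from a waveform by reusing
# the spectrogram output of the TF Hub YAMNet model.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

yamnet = hub.load('https://tfhub.dev/google/yamnet/1')

# 3 seconds of silence as a mono 16 kHz waveform, as in the example above.
waveform = np.zeros(3 * 16000, dtype=np.float32)

# The third output is the full log-mel spectrogram, shape [num_frames, 64].
_, _, log_mel = yamnet(waveform)

# Slice it into overlapping patches of 96 frames x 64 mel bins.
# frame_step=48 is an assumed hop (0.48 s at 10 ms per frame); adjust as needed.
patches = tf.signal.frame(log_mel, frame_length=96, frame_step=48, axis=0)
print(patches.shape)  # (num_patches, 96, 64)
```

Each patch can then be fed to the spectrogram-input model during training, instead of recomputing the spectrogram for every sample on every epoch.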

aslanismailgit commented 3 years ago

Thanks!