tensorflow / models

Models and examples built with TensorFlow

Fine tuning Yamnet model #8425

Closed Alaa-1 closed 4 years ago

Alaa-1 commented 4 years ago

I'm trying to fine-tune the YAMNet model for another audio classification problem that has 4 classes, but I keep getting this error:

Can not squeeze dim[0], expected a dimension of 1, got 32 [[node model_3/tf_op_layer_Squeeze/Squeeze (defined at :2) ]] [Op:__inference_train_function_51754]

My input shapes are X = (432,80000) and y = (432,)

I've tried both integer and one-hot encoded labels and still get the same error. @plakal @dpwe

plakal commented 4 years ago

Can you share the code you're running and how you're running it?

The released code has only been tested in inference mode and I suspect that the code we use for generating a bunch of framed examples from a clip will need to be a little different if you want to train the model.
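
A minimal sketch of what framing a clip for training might look like, assuming 16 kHz audio and a 0.96 s window with a 0.48 s hop. It slices raw samples for simplicity, whereas YAMNet frames the log-mel spectrogram, and `frame_clip` is a hypothetical helper that is not part of the released code. Each patch inherits the label of the clip it came from, so a batch of clips becomes a flat set of (patch, label) pairs:

```python
def frame_clip(waveform, label, sample_rate=16000,
               patch_window_seconds=0.96, patch_hop_seconds=0.48):
    """Split one clip into fixed-length patches, each tagged with the clip label.

    Hypothetical helper: the parameter names mirror params.py, but the
    slicing here is done on raw samples, not on spectrogram frames.
    """
    window = int(round(sample_rate * patch_window_seconds))
    hop = int(round(sample_rate * patch_hop_seconds))
    patches = []
    # Slide the window over the clip; a trailing partial window is dropped.
    for start in range(0, len(waveform) - window + 1, hop):
        patches.append((waveform[start:start + window], label))
    return patches


# One 80000-sample clip, matching the X = (432, 80000) shape reported above.
clip = [0.0] * 80000
examples = frame_clip(clip, label=2)
print(len(examples), len(examples[0][0]))
```

With these assumed values, one 80000-sample clip yields several overlapping training examples instead of one, so the labels have to be repeated per patch and not per clip.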

Alaa-1 commented 4 years ago

https://colab.research.google.com/drive/1PT5Qiu8buPMNQM6jd32DUpsCU5DbFj_9

I also modified the params.py file. I changed NUM_CLASSES = 4 and CLASSIFIER_ACTIVATION = 'softmax'.

Alaa-1 commented 4 years ago

@plakal I don't know if you've noticed but I accidentally closed and reopened my issue. I'm still waiting for your feedback though :+1:

plakal commented 4 years ago

Thanks for sharing the code.

There are several issues:

So you'll need to do a bunch of work to make YAMNet trainable in Keras. We're no Keras experts so I can make some suggestions that you can take as a rough guide:

I'm sorry that I can't provide much more support than this right now. We might make a fine-tuneable version of YAMNet at some point but we have no plans for that at the moment so we currently only officially support YAMNet for inference.

Alaa-1 commented 4 years ago

Thank you for your time and for the detailed explanation.

What about VGGish? Do you think I can fine-tune it for my use case?

plakal commented 4 years ago

VGGish can be fine-tuned and we provide a small demo of training as well. There are still a few issues though:

I'm going to close this issue for now since this is about as much help as we can provide right now, but we are now aware that there is some demand for models that are easier to use at the clip level (so you don't have to deal with the example framing) and which can be fine-tuned, so we'll keep that in mind for future model releases and updates (but nothing is planned yet).

falibabaei commented 2 years ago

Thank you very much for your great answer to this question. With your help, I was able to fine-tune YAMNet for my dataset. But I have one more question. If I understand it correctly, each audio clip is divided into frames with a length of patch_window_seconds and a hop length of patch_hop_seconds, and the input of the model is a batch of these frames. What if an audio clip contains a frame of silence and we label that frame as our object of interest? Isn't that problematic? Of course, we can change the patch_window_seconds and patch_hop_seconds parameters in the parameter file, but how can we be sure that each frame ends up containing the audio of the object of interest? Maybe I have misunderstood the model.
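
One possible way to handle silent frames, sketched here as an assumption and not as something the YAMNet code does: drop patches whose energy falls below a threshold before they inherit the clip label. The helper names and the `threshold` value are hypothetical and would need tuning per dataset:

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def drop_quiet_patches(patches, threshold=0.01):
    """Keep only (patch, label) pairs whose RMS is at or above the threshold.

    Hypothetical filter: the threshold is an illustrative value, not a
    setting taken from params.py.
    """
    return [(patch, label) for patch, label in patches if rms(patch) >= threshold]


loud = [0.5, -0.5] * 100   # stand-in for a patch that contains the event
quiet = [0.0] * 200        # stand-in for a silent patch
kept = drop_quiet_patches([(loud, 1), (quiet, 1)])
print(len(kept))
```

An energy threshold only removes silence. It cannot tell whether a loud patch contains the sound of interest, so patches with background noise would still be labeled as the target class.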

SaminYaser-work commented 2 years ago

@falibabaei can you share your code please? I am also trying to fine-tune this model.

falibabaei commented 2 years ago

@SaminYaser-work give me your email and I will send it to you.

SaminYaser-work commented 2 years ago

@falibabaei saminyaserwork@gmail.com tysm 😊😊

falibabaei commented 2 years ago

Done

xime-vazquez commented 11 months ago

@falibabaei can you also share your code with me please? I'm trying to fine-tune the model to recognize 4 different speakers but I'm having a lot of trouble. ximevzquez@gmail.com

falibabaei commented 11 months ago

Done

akshit7603 commented 10 months ago

@falibabaei can you share your code with me please? I am having trouble fine-tuning. akshita7603@gmail.com

falibabaei commented 10 months ago

@akshit7603 You can find it here https://github.com/falibabaei/yamnet_finetun

loveprolife commented 8 months ago

@falibabaei Hi, could you please send me a data sample for yamnet_finetun? qq815117718@gmail.com