roger-tseng / av-superb

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
https://av.superbbenchmark.org/

train issue #4

Open Kiri0824 opened 1 week ago

Kiri0824 commented 1 week ago

I'm training this code on a Chinese dataset. For the upstream I'm using the fusion_feats of avhubert, and for the downstream I'm using av_asr. I've built the dictionary for Chinese. I've run into a problem: right after initialization, the predicted tokens were non-empty, but after several training steps, when I printed the pred tokens while calculating the metrics, they were all empty.

roger-tseng commented 5 days ago

Could you check the training loss over time to see if the performance is actually improving?

If the training loss isn't improving, your model may not be fitting the training data well, since the default configuration for ASR here only trains a small number of parameters.

To improve performance, you can try increasing the number of trained parameters, either by using a larger prediction head or by finetuning the AV-HuBERT encoder with the --upstream_trainable option in run_downstream.py.

Kiri0824 commented 5 days ago

Here's my result: [image] The loss seems normal, but when I print the pred tokens while calculating the metrics, they are all empty. Could this be a problem with my configuration?
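
One way to narrow this down, assuming the av_asr downstream decodes with a CTC-style greedy rule (the thread does not confirm this), is to check whether the per-frame predictions have collapsed to the blank token. The sketch below is a standalone illustration and not code from the repository: `BLANK_ID`, `greedy_ctc_decode`, `blank_ratio`, and the sample sequences are all hypothetical names and values.

```python
from itertools import groupby
from typing import List, Sequence

# Assumption: index of the blank token in the dictionary. If this does not
# match the index the model was trained with, decoding can drop real tokens.
BLANK_ID = 0


def greedy_ctc_decode(frame_ids: Sequence[int], blank_id: int = BLANK_ID) -> List[int]:
    """Collapse consecutive repeated frame predictions, then drop blanks."""
    collapsed = [token for token, _ in groupby(frame_ids)]
    return [token for token in collapsed if token != blank_id]


def blank_ratio(frame_ids: Sequence[int], blank_id: int = BLANK_ID) -> float:
    """Fraction of frames whose top prediction is the blank token."""
    if not frame_ids:
        return 0.0
    return sum(1 for token in frame_ids if token == blank_id) / len(frame_ids)


# Hypothetical per-frame argmax outputs, for illustration only.
early = [0, 5, 5, 0, 7, 7, 7, 0]  # mixed predictions
later = [0, 0, 0, 0, 0, 0, 0, 0]  # every frame predicts blank

print(greedy_ctc_decode(early), blank_ratio(early))
print(greedy_ctc_decode(later), blank_ratio(later))
```

If the blank ratio is at or near 1.0 on real outputs, the empty predictions would come from the model predicting blank on every frame rather than from the metric code. In that case it may be worth checking that the blank index matches the rebuilt Chinese dictionary, and whether the trainable part of the model is large enough to move away from all-blank output.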