hearbenchmark / hear-eval-kit

Evaluation kit for the HEAR Benchmark
https://hearbenchmark.com
Apache License 2.0

Train prediction model end2end #311

Closed · faroit closed 3 years ago

faroit commented 3 years ago

We would like to carry out some experiments that require training the prediction model end to end, without freezing the embedding model and writing the embeddings to disk. Is there a simple way to do this while still being able to load the tasks from a task dir?

```
python3 -m heareval.predictions.runner MODULE_NAME --model WEIGHTS_FILE --tasks-dir hear-2021.0.3/tasks/
```
turian commented 3 years ago

@faroit we do not plan to offer this functionality before the deadline, but I can explain how you could implement it yourself.

You first need to decide how you will interleave the different tasks, for example by cycling through the tasks one minibatch at a time. This is the trickier part to implement.
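A minimal sketch of one way to do this, with dummy per-task DataLoaders standing in for the real HEAR task data (the generator and loader names here are hypothetical, not part of heareval):

```python
import itertools

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins for per-task data; in heareval each task directory
# would supply its own audio/label pairs.
task_loaders = {
    "task_a": DataLoader(TensorDataset(torch.randn(8, 16000)), batch_size=4),
    "task_b": DataLoader(TensorDataset(torch.randn(12, 16000)), batch_size=4),
}


def interleaved_batches(loaders):
    """Yield (task_name, batch) pairs, cycling round-robin over the tasks.

    Stops once the shortest loader is exhausted; other schedules
    (e.g. sampling tasks proportionally to dataset size) work too.
    """
    iterators = {name: iter(dl) for name, dl in loaders.items()}
    for name in itertools.cycle(list(iterators)):
        try:
            yield name, next(iterators[name])
        except StopIteration:
            return


for task_name, (audio,) in interleaved_batches(task_loaders):
    print(task_name, audio.shape)
```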

To fine-tune through your embedding model, in heareval/predictions/task_predictions.py, instead of passing in the embedding as input x, pass in the audio and embed it with your model. Then you can train your model on the tasks.
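A sketch of what that change could look like. It assumes a HEAR-API-style module whose get_scene_embeddings(audio, model) is differentiable; the predictor class is a simplified stand-in for the one in task_predictions.py, not the actual code:

```python
import torch
import torch.nn as nn


class EndToEndPredictor(nn.Module):
    """Simplified stand-in for the predictor in task_predictions.py.

    Instead of receiving a precomputed embedding x, forward() takes raw
    audio and embeds it with the (unfrozen) embedding model, so gradients
    flow back through the embedding model during task training.
    """

    def __init__(self, embedding_model, embed_fn, embedding_size, n_labels):
        super().__init__()
        self.embedding_model = embedding_model  # deliberately not frozen
        self.embed_fn = embed_fn  # e.g. your module's get_scene_embeddings
        self.head = nn.Linear(embedding_size, n_labels)

    def forward(self, audio):
        # Previously: forward(self, x) with x a precomputed embedding.
        embeddings = self.embed_fn(audio, self.embedding_model)
        return self.head(embeddings)
```

One caveat: not every get_scene_embeddings implementation is a pure torch graph, so check that gradients actually reach the embedding model before relying on this.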

faroit commented 3 years ago

@turian thanks for your reply. I guess it wouldn't actually make sense to add this to the eval-kit at all...

> You first need to decide how you will interleave the different tasks, for example by cycling through the tasks one minibatch at a time. This is the trickier part to implement.

I'm not sure I understand this correctly. Why do the tasks need to be interleaved? I thought (without having checked the code) that the downstream tasks are trained sequentially?

> To fine-tune through your embedding model, in heareval/predictions/task_predictions.py, instead of passing in the embedding as input x, pass in the audio and embed it with your model. Then you can train your model on the tasks.

Yep, thanks, I will do that. In the end, we don't really need to train an embedding model on all downstream tasks; this is just for prototyping some ideas. To be honest, I still think the audio community lacks reproducible, trainable baselines. Is there anything else you would recommend for training a fast baseline on one of the three datasets?

turian commented 3 years ago

@faroit in heareval, each downstream task is trained independently. If you want to do multi-task training over all the downstream tasks, it is up to you how you train them simultaneously.
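One common pattern for training all tasks simultaneously (an assumption on my part, not something heareval provides) is a shared encoder with a separate head per task, stepping on whichever task the current minibatch comes from:

```python
import torch.nn as nn


class MultiTaskModel(nn.Module):
    """A shared audio encoder with one prediction head per task."""

    def __init__(self, encoder, embedding_size, task_label_counts):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict(
            {task: nn.Linear(embedding_size, n_labels)
             for task, n_labels in task_label_counts.items()}
        )

    def forward(self, task_name, audio):
        # Shared representation, task-specific classification.
        return self.heads[task_name](self.encoder(audio))
```

Paired with a round-robin batch generator like the one sketched earlier in the thread, each optimizer step updates the shared encoder and exactly one task head.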

Personally, I think this is a very exciting idea. If you can take a pretrained or untrained audio model and fine-tune it on the open tasks, that would be a very promising approach. When we release the secret tasks, it would be easy for you to include them in your training as well. And, of course, you could write more tasks for hear-preprocess; we are happy to give advice on how to do this.

faroit commented 3 years ago

> Personally, I think this is a very exciting idea. If you can take a pretrained or untrained audio model and fine-tune it on the open tasks, that would be a very promising approach. When we release the secret tasks, it would be easy for you to include them in your training as well. And, of course, you could write more tasks for hear-preprocess; we are happy to give advice on how to do this.

@turian thanks again. Just to clarify: we are allowed to submit embeddings that were trained on the downstream tasks?

turian commented 3 years ago

@faroit Yes! You are allowed. The rules specifically prohibit training on any data marked as "test", so you are not allowed to use the test data in any way.
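For example, if you build your own training loader over a task directory, you could simply skip the test split when collecting label files. A sketch, assuming the hear-preprocess layout of one <split>.json label file per split (the helper name is hypothetical; check your task's task_metadata.json for the actual split names):

```python
import json
from pathlib import Path


def training_splits(task_dir):
    """Collect per-split label files, skipping anything marked "test".

    Assumes one <split>.json label file per split next to
    task_metadata.json; adjust if your task lists different splits.
    """
    splits = {}
    for split_file in Path(task_dir).glob("*.json"):
        if split_file.stem in ("test", "task_metadata"):
            continue  # never load test labels; skip the metadata file
        splits[split_file.stem] = json.loads(split_file.read_text())
    return splits
```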

And if you need some help adding more datasets to hear-preprocess, we can give you some pointers.