Closed by Hguimaraes 2 years ago
Hello Heitor,
Thank you for your attention. We released the LightHuBERT checkpoints at https://huggingface.co/mechanicalsea/lighthubert, and we can provide the configurations used to reproduce the LightHuBERT SUPERB downstream models.
We followed the default config.yaml (e.g., doc) officially provided by SUPERB, and we list the batch size (bsz) and learning rate (lr) as follows.
The s3prl added lighthubert as
If you consider different architectures in three lighthubert checkpoints, here can be helpful:
If you have any questions, don't hesitate to ask us.
Best wishes, Rui
Hi,
Thank you very much for your answer! Before opening this issue, I tried to reproduce the KS downstream model. I'm using the same batch size but tried with different learning rates:
But all of them are far from the expected value of 0.9607 from the leaderboard. Do you have the original CKPT file?
I'm now training the IC downstream task; I will try to reproduce the results and let you know if I achieve the values from SUPERB.
Best,
Hello,
For IC with the parameters you passed, I was able to get 96.94 (the SUPERB leaderboard says 98.23). Do you know what may be causing this difference? I looked into the s3prl code and everything seems to use a seed, so the result should be deterministic.
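For reference, the kind of determinism I'd expect can be sketched in plain Python (a general seeding pattern, not s3prl's actual code): with every random-number generator seeded, two runs draw identical values.

```python
# General reproducibility sketch (not s3prl's actual code): seeding every RNG
# in play is what makes repeated runs draw identical values.
import random

def seeded_run(seed):
    """Draw a few numbers after seeding; identical seeds give identical draws."""
    random.seed(seed)
    return [random.random() for _ in range(3)]

run_a = seeded_run(1337)
run_b = seeded_run(1337)  # same seed, so run_a == run_b
```

In a real torch pipeline one would also seed `torch.manual_seed` and NumPy; even then, nondeterministic CUDA kernels can introduce small run-to-run differences, though not a gap of this size.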
Best, Heitor
Hi Heitor,
The performance degradation is mainly due to the lack of waveform normalization. LightHuBERT models are all trained with normalized waveform inputs, but the interface provided by SUPERB feeds the raw inputs directly to the pre-trained model. To temporarily fix this issue, you can add a line before https://github.com/s3prl/s3prl/blob/master/s3prl/upstream/lighthubert/expert.py#L49 as

```python
# F is torch.nn.functional; normalize each waveform over its full length
wavs = [F.layer_norm(wav, wav.shape) for wav in wavs]
padded_wav = pad_sequence(wavs, batch_first=True)
```
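The normalization above can be sketched standalone: applying `F.layer_norm(wav, wav.shape)` to a whole waveform amounts to making the clip zero-mean and unit-variance. A minimal stdlib-only equivalent (an illustration, not the s3prl code path):

```python
# Standalone sketch of whole-waveform normalization, assumed equivalent to
# F.layer_norm(wav, wav.shape): subtract the mean, divide by the std deviation.
import math
import random

def normalize_wav(wav, eps=1e-5):
    """Normalize a waveform (list of samples) to zero mean and unit variance."""
    n = len(wav)
    mean = sum(wav) / n
    var = sum((x - mean) ** 2 for x in wav) / n
    scale = 1.0 / math.sqrt(var + eps)  # eps guards against silent (all-zero) clips
    return [(x - mean) * scale for x in wav]

random.seed(0)
wav = [random.gauss(0.5, 3.0) for _ in range(16000)]  # fake 1 s of 16 kHz audio
norm = normalize_wav(wav)
```

This matters because a model trained on normalized inputs sees a very different input scale when fed raw waveforms.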
Besides, we recommend configuring the subnet before https://github.com/s3prl/s3prl/blob/master/s3prl/upstream/lighthubert/expert.py#L17, like

```python
# select the searched subnet architecture before loading the weights
subnet = self.model.supernet.subnet
self.model.set_sample_config(subnet)
self.model.load_state_dict(checkpoint["model"], strict=False)
```

so that the subnet is correctly set; otherwise, a larger subnet would be chosen.
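The ordering matters because non-strict loading silently skips mismatched keys. A toy illustration (hypothetical names, not LightHuBERT's real API): if the sub-architecture isn't selected first, the model keeps its default (largest) configuration while only part of its weights get loaded.

```python
# Toy illustration (hypothetical names, not LightHuBERT's real API) of why the
# sub-architecture must be selected before loading a checkpoint non-strictly.
class ToySupernet:
    def __init__(self, max_layers=12):
        self.num_layers = max_layers          # default: the largest subnet
        self.weights = {}

    def set_sample_config(self, num_layers):
        """Activate a smaller sub-architecture inside the supernet."""
        self.num_layers = num_layers

    def load_state_dict(self, state, strict=True):
        """Mimic torch's non-strict loading: copy only the active layers' keys."""
        wanted = {f"layer.{i}" for i in range(self.num_layers)}
        missing = wanted - state.keys()
        if strict and missing:
            raise KeyError(sorted(missing))
        self.weights = {k: v for k, v in state.items() if k in wanted}

checkpoint = {f"layer.{i}": i for i in range(6)}  # a 6-layer subnet checkpoint

# Wrong order: the default (largest) subnet stays active with partial weights.
a = ToySupernet()
a.load_state_dict(checkpoint, strict=False)

# Right order: select the subnet first, then load.
b = ToySupernet()
b.set_sample_config(6)
b.load_state_dict(checkpoint, strict=False)
```

In the wrong-order case the model still reports 12 active layers even though the checkpoint only covers 6, which is exactly the "larger subnet would be chosen" failure mode.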
There is one more thing to notice. The SUPERB interface picks the first hidden state at a position different from the one in our experiments (before vs. after the positional convolution), but this probably does not make much difference to the performance.
Thanks for your attention! We will upload the CKPT files soon. Please let us know if you have any other questions.
Best, Qibing
Thank you very much for your support and your work, Rui and Qibing! I will use those new configurations and try to close the performance gap.
If you guys want, we can close the issue.
Best, Heitor
Thanks for your attention to our LightHuBERT. If you have any questions about it, don't hesitate to ask us. Have a nice day.
Rui
Hi,
Just to let you know, with those changes it is possible to get closer results.
I'm closing the issue! Thank you very much for your help.
Hello Mr. Wang!
First of all, I would like to thank you for your work and the effort to make it open source. I've been working on the robustness of SRL models, and I'm trying to reproduce the downstream models from SUPERB.
Do you have the CKPT files generated when training the SUPERB models? If not, could you share the parameters used in the config.yaml file for each task? With those, I could reproduce the numbers in the table.
Best regards, Heitor