Closed happylittlecat2333 closed 9 months ago
cc @younesbelkada
Hi @happylittlecat2333, thanks for the very thorough analysis here!
I've opened a PR (#27153) to convert the weights from the new clap checkpoints. I believe that you missed some parameters when you converted the weights!
You can find the converted weights (here, here and here, yet to be moved to the LAION organization). Would you mind running your benchmark on them again? Thanks!
Great job!!! The converted models give results similar to the new CLAP checkpoints!
Below are my results after converting the models.
Zeroshot Classification Results: mean_rank: 1.1450 median_rank: 1.0000 R@1: 0.9275 R@5: 0.9975 R@10: 1.0000 mAP@10: 0.9556
Zeroshot Classification Results: mean_rank: 1.1850 median_rank: 1.0000 R@1: 0.9000 R@5: 0.9975 R@10: 1.0000 mAP@10: 0.9400
Zeroshot Classification Results: mean_rank: 1.1850 median_rank: 1.0000 R@1: 0.9175 R@5: 0.9950 R@10: 0.9975 mAP@10: 0.9513
Zeroshot Classification Results: mean_rank: 1.2325 median_rank: 1.0000 R@1: 0.9100 R@5: 0.9900 R@10: 0.9950 mAP@10: 0.9467
Zeroshot Classification Results: mean_rank: 1.1450 median_rank: 1.0000 R@1: 0.9275 R@5: 0.9900 R@10: 1.0000 mAP@10: 0.9568
Zeroshot Classification Results: mean_rank: 1.1100 median_rank: 1.0000 R@1: 0.9350 R@5: 0.9975 R@10: 1.0000 mAP@10: 0.9622
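For readers unfamiliar with these metrics, here is a hedged sketch (not the exact eval code used above) of how mean_rank, median_rank, R@K, and mAP@10 can be computed from a query-to-candidate similarity matrix, assuming each row's correct match sits on the diagonal and that mAP@10 reduces to reciprocal rank when each query has exactly one relevant item:

```python
import numpy as np

def retrieval_metrics(sim):
    """sim: (N, N) similarity matrix; ground-truth match for row i is column i."""
    n = sim.shape[0]
    # Rank of the correct item for each query (1 = retrieved first)
    order = np.argsort(-sim, axis=1)
    ranks = np.array([np.where(order[i] == i)[0][0] + 1 for i in range(n)])
    return {
        "mean_rank": ranks.mean(),
        "median_rank": np.median(ranks),
        "R@1": (ranks <= 1).mean(),
        "R@5": (ranks <= 5).mean(),
        "R@10": (ranks <= 10).mean(),
        # With one relevant item per query, mAP@10 is the reciprocal rank
        # if the match appears in the top 10, else 0.
        "mAP@10": np.where(ranks <= 10, 1.0 / ranks, 0.0).mean(),
    }
```

For example, `retrieval_metrics(np.array([[0.9, 0.1], [0.8, 0.2]]))` gives `mean_rank` 1.5 and `R@1` 0.5, since the second query ranks its correct match second.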
PS: I converted the models using PR (https://github.com/huggingface/transformers/pull/27153), and the converted models work great! But I found that the preprocessor and tokenizer configs are not saved, including preprocessor_config.json, special_tokens_map.json, tokenizer_config.json, tokenizer.json and vocab.json. It would be perfect if the conversion script included the whole saving process!
Thanks for your wonderful work!!
Hey @happylittlecat2333, many thanks for running the benchmark so promptly! Happy to see that it fixed the benchmark! I will merge the PR ASAP!
PS: I converted the models using PR (https://github.com/huggingface/transformers/pull/27153), and the converted models work great! But I found that the preprocessor and tokenizer configs are not saved, including preprocessor_config.json, special_tokens_map.json, tokenizer_config.json, tokenizer.json and vocab.json. It would be perfect if the conversion script included the whole saving process!
I've manually added the processor (feature extractor and tokenizer) to the repos, as it was the same as in the previous checkpoints! For now, I'll leave the PR as it is, but I'll keep that in mind if the issue appears again!
BTW, you can now find the weights (including the processor configs) in the LAION organization on the Hub: here, here and here. Feel free to use these checkpoints going forward!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, can you help convert the Microsoft msclap model (https://huggingface.co/microsoft/msclap/tree/main)? It has been trained on a huge number of audio-text pairs and actually works better than the original CLAP, but its architecture differs from the previous ones.
System Info
Question Description
I want to use CLAP in the Hugging Face model style, but I can only find "laion/clap-htsat-unfused" and "laion/clap-htsat-fused" among the Hugging Face models. However, I wish to use the music CLAP models recently released in https://github.com/LAION-AI/CLAP, such as music_speech_epoch_15_esc_89.25.pt, so I used convert_clap_original_pytorch_to_hf.py to convert them. But the newly released models (like music_speech_audioset_epoch_15_esc_89.98.pt) are based on the HTSAT-base model, whose hidden_size and patch_embeds_hidden_size are different. So I revised convert_clap_original_pytorch_to_hf.py as below. After testing three models (covering both HTSAT-base and HTSAT-tiny), I find an accuracy drop for the HTSAT-base models. Could you please help me find the problem, upload Hugging Face versions of the newly updated CLAP checkpoints from the original repo, and perhaps open a new PR to make the script compatible with both HTSAT-base and HTSAT-tiny?

Who can help?
@ArthurZucker @younesbelkada
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
My revised convert_clap_original_pytorch_to_hf.py convert script:
My evaluation on ESC50 uses the CLAP eval code from the original repo, esc50_api.py.
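For context, here is a hedged sketch (not the actual esc50_api.py logic) of how ESC-50 zero-shot classification with CLAP works: embed each class label as text, embed the audio clip, and pick the class whose text embedding has the highest cosine similarity to the audio embedding. The embeddings below are random stand-ins for real CLAP outputs:

```python
import numpy as np

def zero_shot_classify(audio_emb, text_embs):
    """Return the index of the class embedding most similar to the audio clip."""
    # L2-normalize so the dot product equals cosine similarity
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(t @ a))

# Random stand-ins for CLAP text/audio embeddings (dim 512)
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(50, 512))   # one embedding per ESC-50 class
audio_emb = text_embs[7] + 0.01 * rng.normal(size=512)  # clip close to class 7
print(zero_shot_classify(audio_emb, text_embs))  # → 7
```

Accuracy on ESC-50 is then just the fraction of clips whose predicted class index matches the ground-truth label.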
Expected behavior
Evaluate Result
630k-audioset-best (before convert, HTSAT-tiny type)
630k-audioset-best (after convert, HTSAT-tiny type)
music_audioset_epoch_15_esc_90.14 (before convert, HTSAT-base type)
music_audioset_epoch_15_esc_90.14 (after convert, HTSAT-base type)
music_speech_audioset_epoch_15_esc_89.98 (before convert, HTSAT-base type)
music_speech_audioset_epoch_15_esc_89.98 (after convert, HTSAT-base type)
Therefore, we can see that the HTSAT-base type models have an accuracy drop after conversion to the Hugging Face format. Could you please help us figure out this bug, upload Hugging Face versions of the music_speech_epoch_15_esc_89.25.pt and music_speech_audioset_epoch_15_esc_89.98.pt CLAP checkpoints, and perhaps open a new PR to make the script compatible with both HTSAT-base and HTSAT-tiny? Thanks!
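As a closing illustration, one way a conversion script could handle both variants is to select the audio config by encoder type rather than hard-coding HTSAT-tiny values. This is a minimal sketch: the dictionary name and heuristic are hypothetical, the HTSAT-tiny numbers match the transformers ClapAudioConfig defaults, and the HTSAT-base numbers are assumptions for illustration only (not taken from the merged PR):

```python
# Hypothetical mapping from HTSAT variant to ClapAudioConfig-style overrides.
# HTSAT-tiny values follow the transformers defaults; HTSAT-base values
# are illustrative assumptions (Swin-base-like widths and depths).
AUDIO_CONFIGS = {
    "HTSAT-tiny": {"patch_embeds_hidden_size": 96, "hidden_size": 768,
                   "depths": [2, 2, 6, 2]},
    "HTSAT-base": {"patch_embeds_hidden_size": 128, "hidden_size": 1024,
                   "depths": [2, 2, 12, 2]},
}

def audio_config_for(checkpoint_name):
    """Pick a config by checkpoint name (a heuristic, not the official rule):
    the music/speech checkpoints discussed above are HTSAT-base."""
    is_base = "music" in checkpoint_name or "speech" in checkpoint_name
    return AUDIO_CONFIGS["HTSAT-base" if is_base else "HTSAT-tiny"]
```

The selected dictionary would then be passed as overrides when building the ClapAudioConfig, so a single script covers both checkpoint families.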