taalua opened this issue 3 days ago (status: Open)
Hello,
Many thanks for the interest in our work!
Please be aware that the available k-means model (https://huggingface.co/utter-project/mHuBERT-147/blob/main/mhubert147_faiss.index) was trained on the 2nd iteration model (available here: https://huggingface.co/utter-project/mHuBERT-147-base-2nd-iter). It was NOT trained on the checkpoint_best.pt from the 3rd iteration.
If you want to generate discrete labels using features from the 3rd iteration model (https://huggingface.co/utter-project/mHuBERT-147/blob/main/checkpoint_best.pt) you will need to train a new k-means model.
The procedure for doing so is the following:
Thank you for your prompt response. Just to clarify: if I want to get discrete labels from the 2nd iteration model, can I use this script? https://github.com/utter-project/mHuBERT-147-scripts/blob/main/03_faiss_indices/apply_index_per_file.py
Thank you.
Yes. You just need to extract the features first!
Hi, Thank you for your excellent work.
I want to extract labels from features produced by the mHuBERT-147 checkpoint_best.pt, using the existing k-means model. I tried to follow the script at https://github.com/utter-project/mHuBERT-147-scripts/blob/main/03_faiss_indices/apply_index_per_file.py
However, I am not sure what the *.len file is. Could you explain how to obtain it?
Thank you.
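On the *.len question, here is a minimal sketch, not the repository's actual code. It assumes the *.len file follows the convention used by fairseq's HuBERT feature-dumping scripts: a plain-text file with one integer per line, giving the number of frames of each utterance in the same order as the concatenated features. Whether apply_index_per_file.py expects exactly this layout is an assumption that this thread does not confirm. The helper names (write_lengths, read_lengths, split_by_lengths) are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def write_lengths(len_path: str, utterances: Sequence[Sequence]) -> None:
    """Write one frame count per line (assumed *.len layout)."""
    lines = [str(len(utt)) for utt in utterances]
    Path(len_path).write_text("\n".join(lines) + "\n")


def read_lengths(len_path: str) -> List[int]:
    """Read the frame counts back, skipping blank lines."""
    text = Path(len_path).read_text()
    return [int(line) for line in text.splitlines() if line.strip()]


def split_by_lengths(frames: Sequence, lengths: Sequence[int]) -> List[Sequence]:
    """Cut a concatenated per-frame sequence back into per-utterance chunks."""
    if sum(lengths) != len(frames):
        raise ValueError("lengths do not add up to the number of frames")
    chunks = []
    start = 0
    for n in lengths:
        chunks.append(frames[start:start + n])
        start += n
    return chunks
```

Under that assumption, the lengths are written when the features are extracted, and the same file is then used to split the per-frame labels back into one sequence per audio file.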