Closed bagustris closed 2 years ago
Hey @bagustris,
Could you instead use the following:
from transformers import AutoFeatureExtractor
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/unispeech-sat-base-plus")
and use the feature_extractor as the class to process the audio?
As you said, the model doesn't have a tokenizer, so we can simply use the feature extractor here.
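For anyone landing here later, this is roughly what "use the feature extractor to process the audio" looks like. It is only a minimal sketch: to keep it runnable offline it builds a Wav2Vec2FeatureExtractor by hand with common 16 kHz settings (my assumption, not values read from the checkpoint) and feeds it synthetic noise instead of real speech. In practice you would keep the AutoFeatureExtractor.from_pretrained(...) call from above and pass your own waveform.

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

# ASSUMPTION: settings chosen by hand so the sketch runs without a download.
# With a real checkpoint, use AutoFeatureExtractor.from_pretrained(...) instead.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=False,
)

# ASSUMPTION: one second of synthetic noise stands in for a real recording.
rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000).astype(np.float32)

# No tokenizer is involved: the feature extractor only normalizes and batches
# the raw waveform into the "input_values" the model expects.
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="np")
input_values = inputs["input_values"]
print(input_values.shape)
```

The returned object behaves like a dict, so inputs["input_values"] can be passed straight to the model's forward call.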
Hi @patrickvonplaten,
Thank you for the solution! It works! I think this should be clearly explained in the Transformers documentation, e.g., at https://huggingface.co/transformers/model_doc/unispeech_sat.html
I tried several audio embedding models, including the UniSpeechSat model above, and ran into the same errors. Your answer is what I was looking for.
Great! In case you want to use the model for audio classification, the example in this section of the docs: https://huggingface.co/transformers/model_doc/unispeech_sat.html#unispeechsatforsequenceclassification should be helpful :-) Could you take a look and let me know if it's useful? :-)
Yes, that is useful and throws no error.
But the point is to use transformers as a feature extractor (i.e., to extract audio embeddings with models like HuBERT, UniSpeech, wav2vec, etc.), not as a predictor (aka model). In my own experience, the larger audio embedding models (hubert-large, wav2vec2-large, unispeechSat-large) tend to perform better. Some of those models raise an error when I load them with a processor, due to the lack of a tokenizer. I think I should use Wav2Vec2FeatureExtractor instead of Wav2Vec2Processor. But your solution works!
Again, thanks for the great job!
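A rough sketch of the embedding-extraction use case described above, without any tokenizer. To stay runnable offline it uses a tiny, randomly initialized UniSpeechSatModel whose config values are made up for illustration (an assumption, so the numbers it produces are meaningless). With real data you would load both the feature extractor and the model via from_pretrained from the same checkpoint. Mean pooling over time is just one simple way to get a fixed-size vector, not something prescribed in this thread.

```python
import numpy as np
import torch
from transformers import (
    UniSpeechSatConfig,
    UniSpeechSatModel,
    Wav2Vec2FeatureExtractor,
)

# ASSUMPTION: a tiny random-weight model so the sketch needs no download.
# For real embeddings, use UniSpeechSatModel.from_pretrained(<checkpoint>).
config = UniSpeechSatConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32),
    conv_stride=(5, 2),
    conv_kernel=(10, 3),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=2,
)
model = UniSpeechSatModel(config).eval()

# ASSUMPTION: hand-picked 16 kHz settings; a real checkpoint ships its own.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=False,
)

# ASSUMPTION: synthetic noise stands in for one second of real audio.
waveform = np.random.default_rng(0).standard_normal(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    # Frame-level hidden states: (batch, frames, hidden_size)
    hidden = model(inputs["input_values"]).last_hidden_state

# Average over the time axis to get one utterance-level embedding per clip.
embedding = hidden.mean(dim=1)
print(hidden.shape, embedding.shape)
```

The frame-level tensor can also be kept as-is if the downstream classifier works on sequences rather than on a single pooled vector.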
I tried to use a pretrained model from Hugging Face, but it seems no tokenizer was uploaded there.
(1) Is there any workaround? (2) Also, since I don't need a tokenizer (the model is used for audio classification), is there any option to skip loading the tokenizer?
cc @patrickvonplaten