The first line prints an MCTCTProcessor instance (containing an MCTCTFeatureExtractor feature extractor and a Wav2Vec2CTCTokenizer tokenizer), while the second prints just a Wav2Vec2CTCTokenizer instance.
Expected behavior
AutoProcessor.from_pretrained should return an MCTCTProcessor instance when the provided model is an MCTCT model.
It currently does not, because the code for AutoProcessor does not include a mapping entry for MCTCT.
An MCTCTProcessor class exists whose from_pretrained function behaves appropriately. AutoProcessor should behave the same way, rather than falling back to a tokenizer.
The fix seems simple enough (though I am far from an expert): add the entry below to PROCESSOR_MAPPING_NAMES:
("mctct", "MCTCTProcessor"),
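To illustrate why the missing entry matters, here is a minimal, self-contained sketch of a name-mapping lookup with a tokenizer fallback. It is not the actual transformers implementation: the function resolve_class_name and the trimmed-down tables PROCESSOR_NAMES and TOKENIZER_NAMES are hypothetical stand-ins, and the real lookup logic may differ in its details.

```python
from collections import OrderedDict

# Hypothetical, heavily trimmed stand-ins for the library's name mappings.
# The real tables contain many more entries and may be structured differently.
PROCESSOR_NAMES = OrderedDict(
    [
        ("wav2vec2", "Wav2Vec2Processor"),
        ("whisper", "WhisperProcessor"),
    ]
)
TOKENIZER_NAMES = OrderedDict(
    [
        ("mctct", "Wav2Vec2CTCTokenizer"),
        ("wav2vec2", "Wav2Vec2CTCTokenizer"),
    ]
)


def resolve_class_name(model_type, processor_names, tokenizer_names):
    """Return the processor class name if one is registered for model_type,
    otherwise fall back to the tokenizer class name (assumed behavior)."""
    if model_type in processor_names:
        return processor_names[model_type]
    if model_type in tokenizer_names:
        return tokenizer_names[model_type]
    raise ValueError(f"Unrecognized model type: {model_type}")


# Without an "mctct" entry, the lookup falls through to the tokenizer table.
before = resolve_class_name("mctct", PROCESSOR_NAMES, TOKENIZER_NAMES)

# With the proposed entry added, the processor is found first.
patched_names = OrderedDict(PROCESSOR_NAMES)
patched_names["mctct"] = "MCTCTProcessor"
after = resolve_class_name("mctct", patched_names, TOKENIZER_NAMES)

print(before)
print(after)
```

In this sketch, the only difference between the two lookups is the single added mapping entry, which mirrors the proposed fix.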
For comparison, the AutoModel.from_pretrained method does support MCTCT and thus behaves appropriately because its mapping contains a line for MCTCT.
System Info
Not actually relevant, but included for completeness:
transformers version: 4.29.1
Who can help?
@sanchit-gandhi