VLEProcessor.from_pretrained can map tokens to their corresponding IDs. How do I convert those IDs back into text?

iflytek / VLE

VLE: Vision-Language Encoder (a vision-language multimodal pre-trained model)

Apache License 2.0

176 stars, 12 forks

VLEProcessor.from_pretrained can map tokens to their corresponding IDs. How do I convert those IDs back into text? #3

Closed: the-nine-nation closed this issue 1 year ago

the-nine-nation commented 1 year ago

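I attached a decoder after the model, followed by a fully connected layer that predicts the output text, but I don't understand how to turn its predictions into the corresponding text.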

GoGoJoestar commented 1 year ago

You can use the processor's tokenizer (VLEProcessor.tokenizer). It works the same way as any other transformers tokenizer, for example:

vle_processor = VLEProcessor.from_pretrained(model_name)
print(vle_processor.tokenizer.encode('A nice day!'))
# [1, 336, 1085, 406, 300, 2]
print(vle_processor.tokenizer.decode([1, 336, 1085, 406, 300, 2]))
# '[CLS] A nice day![SEP]'
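
For the case raised in the question, where a fully connected layer produces one score per vocabulary entry at each position, a minimal sketch of the remaining step might look like the following. It assumes greedy (argmax) decoding, which the thread does not specify, and it uses a hypothetical ToyTokenizer with a five-token vocabulary in place of vle_processor.tokenizer so that it runs without downloading a model. The names logits_to_ids, ToyTokenizer, toy_vocab, and toy_logits are illustrative and not part of VLE.

```python
from typing import List, Sequence


def logits_to_ids(logits: Sequence[Sequence[float]]) -> List[int]:
    """Greedy decoding: pick the highest-scoring vocabulary ID at each position."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


class ToyTokenizer:
    """Hypothetical stand-in for vle_processor.tokenizer (assumption, offline demo only)."""

    def __init__(self, vocab: List[str], special_tokens: Sequence[str] = ("[CLS]", "[SEP]")):
        self.vocab = vocab
        self.special_tokens = set(special_tokens)

    def decode(self, ids: Sequence[int], skip_special_tokens: bool = False) -> str:
        # Map each ID back to its token, optionally dropping special tokens.
        tokens = [self.vocab[i] for i in ids]
        if skip_special_tokens:
            tokens = [t for t in tokens if t not in self.special_tokens]
        return " ".join(tokens)


# Toy scores of shape (sequence_length=5, vocab_size=5), standing in for
# the output of a fully connected layer over the vocabulary.
toy_vocab = ["[CLS]", "[SEP]", "A", "nice", "day"]
toy_logits = [
    [9.0, 0.1, 0.2, 0.3, 0.1],  # highest score at ID 0
    [0.1, 0.2, 5.0, 0.3, 0.1],  # highest score at ID 2
    [0.1, 0.2, 0.3, 6.0, 0.1],  # highest score at ID 3
    [0.1, 0.2, 0.3, 0.1, 4.0],  # highest score at ID 4
    [0.1, 7.0, 0.2, 0.3, 0.1],  # highest score at ID 1
]

predicted_ids = logits_to_ids(toy_logits)
text = ToyTokenizer(toy_vocab).decode(predicted_ids, skip_special_tokens=True)
print(predicted_ids)
print(text)
```

With the real processor, the same predicted IDs would be passed to vle_processor.tokenizer.decode(predicted_ids, skip_special_tokens=True). If the scores are a PyTorch tensor, logits.argmax(dim=-1).tolist() yields the ID list. The fully connected layer's output size has to match the tokenizer's vocabulary size for the IDs to be meaningful.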

the-nine-nation commented 1 year ago

Thank you very much!