Open saymyname77 opened 2 years ago
@renmada
在generate函数里这行feature = tokenizer.encode(text, return_token_type_ids=True, return_tensors='pt', max_length=512)加了padding=True或truncation=True或都加也没用。
You can't pass raw text directly to the model; it has to be converted to token ids first.
Which method does that conversion?
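For what it's worth, tokenizer.encode() (or calling the tokenizer object directly) is what maps text to token ids in transformers. The sketch below is only a toy illustration of that idea — the vocabulary, the whitespace splitting, and the function itself are made-up stand-ins, not the real T5PegasusTokenizer:

```python
# Toy sketch of what a tokenizer's encode() conceptually does:
# map text to a list of token ids. Illustrative only -- not the
# real T5PegasusTokenizer API or vocabulary.
vocab = {"[CLS]": 101, "[SEP]": 102, "[UNK]": 100,
         "the": 1996, "video": 2678, "was": 2001, "removed": 2853}

def encode(text, max_length=None, truncation=False):
    # Look up each whitespace token; unknown words map to [UNK].
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in text.lower().split()]
    # Add special tokens around the sequence.
    ids = [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]
    # Optionally truncate to max_length, as truncation=True would.
    if truncation and max_length is not None:
        ids = ids[:max_length]
    return ids

print(encode("the video was removed"))
# -> [101, 1996, 2678, 2001, 2853, 102]
```

In the real code, tokenizer.encode(text, return_tensors='pt') already performs this conversion and wraps the ids in a tensor, so the traceback below is not caused by passing un-tokenized text.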
Added at the end of the file:

if __name__ == '__main__':
    generate('谷歌旗下的YouTube表示,自去年以来,已有13万个视频从其平台上删除,当时它禁止传播有关Covid疫苗的错误信息的内容。在一篇博客文章中,该公司表示,它已经看到有关Covid疫苗的虚假声明“蔓延到有关疫苗的错误信息中”。“我们正在扩大我们在YouTube上的医疗错误信息政策,对当前管理的疫苗进行新的指导方针,这些疫苗已被当地卫生当局和世界卫生组织批准并确认为安全有效,”该帖子说,指的是世界卫生组织。', max_length=40)

Error output:

Calling T5PegasusTokenizer.from_pretrained() with the path to a single file or url is deprecated
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Truncation was not explicitly activated but max_length is provided a specific value, please use truncation=True to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to truncation.
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.391 seconds.
Prefix dict has been built successfully.
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 771, in convert_to_tensors
    tensor = as_tensor(value)
RuntimeError: Could not infer dtype of NoneType

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "run_csl.py", line 251, in <module>
    generate(
  File "run_csl.py", line 171, in generate
    feature = tokenizer.encode(text, return_token_type_ids=True, return_tensors='pt',
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2104, in encode
    encoded_inputs = self.encode_plus(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2420, in encode_plus
    return self._encode_plus(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 444, in _encode_plus
    return self.prepare_for_model(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2881, in prepare_for_model
    batch_outputs = BatchEncoding(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 276, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 787, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
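The inner RuntimeError ("Could not infer dtype of NoneType") suggests a reading of this traceback — offered as an assumption, not confirmed from the repo: convert_to_tensors() calls as_tensor() on every value in the encoding, and if one value is None (for example, token_type_ids requested via return_token_type_ids=True but never filled in by this tokenizer), tensor creation fails no matter what padding or truncation is set. A minimal pure-Python reproduction of that failure mode:

```python
# Minimal reproduction of the suspected failure mode (an assumption
# about the root cause): a None value in the encoding dict makes
# tensor conversion fail, regardless of padding/truncation settings.
# The real culprit here would be token_type_ids=None when the
# tokenizer is asked for return_token_type_ids=True.
encoding = {"input_ids": [101, 1996, 102], "token_type_ids": None}

def convert_to_tensors(enc):
    out = {}
    for key, value in enc.items():
        if value is None:
            # torch.as_tensor(None) raises
            # "RuntimeError: Could not infer dtype of NoneType";
            # transformers re-raises it as the ValueError seen above.
            raise ValueError(f"Unable to create tensor for '{key}'")
        out[key] = list(value)  # stand-in for torch.as_tensor(value)
    return out

try:
    convert_to_tensors(encoding)
except ValueError as e:
    print(e)
```

If this diagnosis holds, the place to experiment is the generate call: drop return_token_type_ids=True, and pass truncation=True alongside max_length — e.g. tokenizer.encode(text, return_tensors='pt', max_length=512, truncation=True) — so no None entry ever reaches tensor conversion.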