No output from the inference file

msg-bq commented 2 years ago

For example, given this sample:

{ "text": "EU rejects German call to boycott British lamb .", "tokens": ["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."], ...... }

with record.schema set to only

["organization", "miscellaneous"] [] {}

the processed sentence becomes

[' miscellaneous organization EU rejects German call to boycott British lamb .']

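The final output is empty. Passing the processed sentence above to the Hugging Face API/pipeline also returns an empty result. (The above is a single example to make this easier to read. I processed the dataset following the steps shown, and those samples return empty results as well.)

I ran prediction directly with uie-base-en, without any training. One other change: tokenizer.json already contains \<spot> and \<asoc>, so I removed the added_tokens.json file.

Is there a mistake in my processing? Thanks!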

luyaojie commented 2 years ago

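Hello,

The uie-base-en model in the README is a pre-trained base model; it needs to be fine-tuned on a downstream task before use.

As for removing added_tokens.json, you can test whether the tokenizer still correctly tokenizes text that contains the special symbols. If the result is normal, the removal has no effect.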
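A minimal sketch of that tokenizer check, under a few assumptions: the helper name `find_split_special_tokens` and the probe word are hypothetical, and the function only requires an object with a `tokenize(text)` method that returns a list of strings (a Hugging Face tokenizer behaves this way). A special symbol counts as healthy if it comes back as one intact token.

```python
from typing import Dict, List, Sequence


def find_split_special_tokens(
    tokenizer,
    special_tokens: Sequence[str] = ("<spot>", "<asoc>"),
    probe_word: str = "organization",
) -> Dict[str, List[str]]:
    """Return the special tokens that the tokenizer fails to keep intact.

    `tokenizer` is any object exposing `tokenize(text) -> list of str`.
    The result maps each broken special token to the pieces it was split
    into; an empty dict means every special token survived as one token.
    """
    broken: Dict[str, List[str]] = {}
    for token in special_tokens:
        # Tokenize the special symbol followed by an ordinary word.
        pieces = list(tokenizer.tokenize(f"{token} {probe_word}"))
        # If the symbol is not present as a single piece, it was split up.
        if token not in pieces:
            broken[token] = pieces
    return broken


# Possible usage with the transformers library (the path is a placeholder
# for wherever the checkpoint was downloaded):
#
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("path/to/uie-base-en")
#   print(find_split_special_tokens(tokenizer))
```

An empty dictionary would suggest that removing added_tokens.json did no harm; any entry shows how the symbol was broken apart.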

msg-bq commented 2 years ago

Got it, thank you very much!

luyaojie commented 2 years ago

If you have any further questions, feel free to reopen the issue or email me.

universal-ie / UIE

No output from the inference file #17