Closed: Yumeka999 closed this issue 2 years ago
Cannot reproduce. I suggest using the corpora provided by HanLP. Get the format straight first, then move on to your own data.
Mr. He, this data is the corpus provided by HanLP. The only difference is that CTB8_CWS_TRAIN is originally a URL, and I downloaded the file behind that URL and extracted it locally.
The dataset files
/home/_tmp/data/ctb8_cn/tasks/cws/train.txt
/home/_tmp/data/ctb8_cn/tasks/pos/train.txt
/home/_tmp/data/msra_ner_token_level_cn/word_level.train.tsv
were obtained by downloading and extracting the URLs behind MSRA_NER_TOKEN_LEVEL_SHORT_IOBES_TRAIN and CTB8_POS_TRAIN, which are imported as
from hanlp.datasets.ner.msra import MSRA_NER_TOKEN_LEVEL_SHORT_IOBES_TRAIN
from hanlp.datasets.parsing.ctb8 import CTB8_POS_TRAIN
The corresponding URLs are
https://wakespace.lib.wfu.edu/bitstream/handle/10339/39379/LDC2013T21.tgz#data/tasks/cws/train.txt
http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.train.short.tsv
This is the same data used in hanlp_demo. Running with the data from the hanlp_demo on GitHub now produces the problem described above.
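Both URLs quoted above carry a `#` fragment. The following is a minimal sketch, assuming the fragment names the file inside the downloaded archive (an assumption about the convention, not something confirmed in this thread). It splits each URL into archive and member, then compares the member's file name with a local file name. `split_resource` and `same_member_name` are hypothetical helper names and are not part of HanLP.

```python
import posixpath
from urllib.parse import urldefrag


def split_resource(url):
    """Split 'archive_url#member/path' into (archive_url, member_path).

    Assumption: the URL fragment names a file inside the archive.
    Returns None as the member when the URL has no fragment.
    """
    archive_url, member = urldefrag(url)
    return archive_url, member or None


def same_member_name(local_path, url):
    """Return True if the local file name equals the member name in the URL."""
    _, member = split_resource(url)
    if member is None:
        return False
    return posixpath.basename(local_path) == posixpath.basename(member)


# URLs and local path copied from the report above.
urls = [
    "https://wakespace.lib.wfu.edu/bitstream/handle/10339/39379/LDC2013T21.tgz#data/tasks/cws/train.txt",
    "http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.train.short.tsv",
]
local_ner = "/home/_tmp/data/msra_ner_token_level_cn/word_level.train.tsv"

for url in urls:
    archive_url, member = split_resource(url)
    print(archive_url, "->", member)

print("NER file name matches URL member:", same_member_name(local_ner, urls[1]))
```

With the paths from this report the comparison is false: the local file is named word_level.train.tsv, while the URL names word_level.train.short.tsv. Whether that difference is related to the error is not established in this thread.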
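Cannot reproduce. Do not download HanLP's data manually, because you do not know how these corpora need to be preprocessed. HanLP downloads and preprocesses them for you automatically. If you cannot get your environment set up, use Colab: https://colab.research.google.com/drive/1qE2OkSTluMWZjfrj01iWrDOO8HisAmBk?usp=sharing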
OK, thanks a lot.
Describe the bug mtl.evaluate raises AssertionError: No samples loaded
Code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem.
Describe the current behavior Running the script fails with: File "/home/xy/miniconda3/envs/py364_xy/lib/python3.6/site-packages/hanlp/common/dataset.py", line 129, in __init__ assert data, 'No samples loaded' AssertionError: No samples loaded
Expected behavior The program runs normally and prints P, R, and F on the sample set.
System information
Other info / logs
/home/_tmp/data/ctb8_cn/tasks/cws/train.txt
/home/_tmp/data/ctb8_cn/tasks/cws/dev.txt
/home/_tmp/data/ctb8_cn/tasks/cws/test.txt
/home/_tmp/data/ctb8_cn/tasks/pos/train.txt
/home/_tmp/data/ctb8_cn/tasks/pos/dev.txt
/home/_tmp/data/ctb8_cn/tasks/pos/test.txt
/home/_tmp/data/msra_ner_token_level_cn/word_level.train.tsv
/home/_tmp/data/msra_ner_token_level_cn/word_level.dev.tsv
/home/_tmp/data/msra_ner_token_level_cn/word_level.test.tsv
{
  "tok": ["华纳", "音乐", "旗下", "的", "新垣结衣", "在", "12月", "21日", "于", "日本", "武道馆", "举办", "歌手", "出道", "活动"],
  "ner": [
    ["华纳音乐", "ORGANIZATION", 0, 2],
    ["新垣结衣", "PERSON", 4, 5],
    ["12月", "DATE", 6, 7],
    ["21日", "DATE", 7, 8],
    ["日本", "LOCATION", 9, 10],
    ["武道馆", "LOCATION", 10, 11]
  ]
}
1 / 2 Building tst dataset for ner ...
Traceback (most recent call last):
  File "hanlp_traincn.py", line 134, in <module>
    metric, *_ = mtl.evaluate(S_SAV_MODEL_DIR)
  File "/home/xy/miniconda3/envs/py364_xy/lib/python3.6/site-packages/hanlp/components/mtl/multi_task_learning.py", line 753, in evaluate
    rets = super().evaluate('tst', save_dir, logger, batch_size, output, **kwargs)
  File "/home/xy/miniconda3/envs/py364_xy/lib/python3.6/site-packages/hanlp/common/torch_component.py", line 469, in evaluate
    device=self.devices[0], logger=logger, overwrite=True))
  File "/home/xy/miniconda3/envs/py364_xy/lib/python3.6/site-packages/hanlp/components/mtl/multi_task_learning.py", line 156, in build_dataloader
    cache=isinstance(data, str), **config)
  File "/home/xy/miniconda3/envs/py364_xy/lib/python3.6/site-packages/hanlp/components/mtl/tasks/ner/tag_ner.py", line 123, in build_dataloader
    dataset = self.build_dataset(data, cache=cache, transform=transform, **args)
  File "/home/xy/miniconda3/envs/py364_xy/lib/python3.6/site-packages/hanlp/components/ner/transformer_ner.py", line 216, in build_dataset
    dataset = super().build_dataset(data, transform, **kwargs)
  File "/home/xy/miniconda3/envs/py364_xy/lib/python3.6/site-packages/hanlp/components/taggers/transformers/transformer_tagger.py", line 170, in build_dataset
    return TSVTaggingDataset(data, transform=transform, **kwargs)
  File "/home/xy/miniconda3/envs/py364_xy/lib/python3.6/site-packages/hanlp/datasets/ner/tsv.py", line 45, in __init__
    super().__init__(data, transform, cache, generate_idx)
  File "/home/xy/miniconda3/envs/py364_xy/lib/python3.6/site-packages/hanlp/common/dataset.py", line 129, in __init__
    assert data, 'No samples loaded'
AssertionError: No samples loaded
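The assertion at the bottom of the traceback fires when the dataset ends up with no samples. Here is a rough pre-check one could run on a local tagging file before calling mtl.evaluate. It assumes a file with one `token<TAB>tag` pair per line and a blank line between sentences. That layout is an assumption made for illustration, and the function is not HanLP's own loader. `count_tsv_sentences` is a hypothetical helper name, and the tags in the sample are made up.

```python
import os
import tempfile


def count_tsv_sentences(path):
    """Count blank-line-separated sentences in a 'token<TAB>tag' file.

    Assumed layout (not taken from HanLP): one token and its tag per line,
    separated by a tab, with an empty line between sentences. Lines without
    a tab are ignored, so a space-separated file counts as zero sentences.
    """
    sentences = 0
    has_token = False
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.rstrip("\r\n")
            if not line.strip():
                # A blank line closes the current sentence, if it had tokens.
                if has_token:
                    sentences += 1
                has_token = False
            elif len(line.split("\t")) >= 2:
                has_token = True
    if has_token:
        # The last sentence may not be followed by a blank line.
        sentences += 1
    return sentences


# Small demonstration on a temporary file with illustrative content.
sample = "日本\tS-LOCATION\n武道馆\tS-LOCATION\n\n举办\tO\n"
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
    sample_path = tmp.name

n = count_tsv_sentences(sample_path)
os.remove(sample_path)
print("sentences found:", n)
if n == 0:
    print("File would yield no samples under the assumed layout.")
```

A count of zero under this check suggests the file is empty or not in the assumed layout. It does not prove that HanLP would reject or accept the file.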