ImportError: cannot import name 'container_abcs': what is causing this, and how do I fix it?

wzzzd / lm_ner

A PyTorch-based named entity recognition framework supporting LSTM+CRF, Bert+CRF, RoBerta+CRF, and other architectures


ImportError: cannot import name 'container_abcs': what is causing this, and how do I fix it? #16

Open khazic opened 1 year ago

khazic commented 1 year ago

ImportError: cannot import name 'container_abcs'
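The thread never shows which line fails, so this is an assumption: a common cause of this error is code that does `from torch._six import container_abcs`. That private alias was removed in PyTorch 1.9, so older code or dependencies break on newer PyTorch. A minimal compatibility sketch for the file containing the failing import (the `is_mapping` helper is hypothetical, only to show existing usages keep working):

```python
# Hypothetical patch for whichever file contains the failing import.
try:
    # Older PyTorch releases re-exported collections.abc under this private alias.
    from torch._six import container_abcs
except ImportError:
    # Newer releases dropped the alias; the standard library provides the same ABCs.
    # ModuleNotFoundError (torch missing entirely) is a subclass of ImportError, so it lands here too.
    import collections.abc as container_abcs


def is_mapping(obj):
    # Existing code that references container_abcs.Mapping, .Sequence, etc. needs no other change.
    return isinstance(obj, container_abcs.Mapping)
```

Pinning an older PyTorch (below 1.9) is the other option, but the fallback import avoids tying the project to an old release.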

khazic commented 1 year ago

I'd also like to ask: to use an English dataset, do I just change `language` to 'en' and set the model to 'bert-base-uncased'?

wzzzd commented 1 year ago

Yes.

khazic commented 1 year ago

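After training the model, do I just replace test.txt with the data I want to test, laid out the same way with one word per line, and then run the model to get its predictions?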

khazic commented 1 year ago

Could I get your contact information?

wzzzd commented 1 year ago

In theory, yes. As long as the format matches test.txt, it should basically work.

khazic commented 1 year ago

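Each line of my test.txt is a word plus its BIO tag, but the text I need to predict has only one word per line. Can the model still make predictions on that? Thanks for your help 🙏🏿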
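Since the loader expects a label column, one way to prepare unlabeled text is to give every word a placeholder `O` label. A minimal sketch, assuming (not confirmed from the repo) that each line is `<word> <label>` separated by a space and that sentences are separated by a blank line; `to_test_file_lines` is a hypothetical helper:

```python
def to_test_file_lines(sentences, dummy_label="O"):
    """Turn raw sentences into the assumed one-word-per-line test format.

    Assumptions: each line is "<word> <label>", and a blank line ends a sentence.
    """
    lines = []
    for sentence in sentences:
        for word in sentence.split():
            # Real labels are unknown at prediction time, so every word gets a placeholder.
            lines.append(f"{word} {dummy_label}")
        lines.append("")  # blank line marks the end of the sentence
    return "\n".join(lines)
```

The result can then be written to test.txt with `open("test.txt", "w", encoding="utf-8").write(...)`. If the repo's DataManager.py uses a different separator or sentence delimiter, adjust the two assumptions accordingly.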

khazic commented 1 year ago

That doesn't work; it raises this error: ValueError: num_samples should be a positive integer value, but got num_samples=0

wzzzd commented 1 year ago

Try changing line 15 of Predictor.py to self.metric = False.

khazic commented 1 year ago

Still not working. My test set has one word per line, and every label is O. After replacing test.txt I still get the same error.

wzzzd commented 1 year ago

Post the full error message so I can take a look.

khazic commented 1 year ago

(khazic) [root@localhost lm_ner-main]# python run.py
read data...
Traceback (most recent call last):
  File "run.py", line 45, in <module>
    test_loader = dm.get_data_test()
  File "/home/Khazic/lm_ner-main/DataManager.py", line 279, in get_data_test
    sampler = RandomSampler(data) if not torch.cuda.device_count() > 1 else DistributedSampler(data)
  File "/home/anaconda3/envs/khazic/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 103, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

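Here's what I did: I trained for one epoch and saved the parameters, then switched the mode to test and ran run.py. That worked, and I could see output.txt. After deleting output.txt, I changed every label in test.txt to 'O', saved the file, and ran run.py again; that's when this error appeared.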

khazic commented 1 year ago

I found the cause: if I test only a single sentence, this error occurs; with a few more sentences it runs. But there are still some problems:

| src | predict | label |
|---|---|---|
| department of clinical [UNK] school of medical sciences university of science and technology [UNK] ghana . | [('INS', 'of medical sciences university of', [5, 11])] | [] |
| department of medicine ( [UNK] ) university of california los angeles [UNK] 1760 usa . [UNK] ucla . [UNK] | [('INS', 'of', [7, 9]), ('COU', 's', [14])] | [] |
| department of cognitive and neural systems boston university ma [UNK] usa . | [('COU', '.', [11, 63])] | [] |

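This is the output. In the second sentence, why is a lone 's' at [14] tagged as a country? It should be 'usa'; isn't that a single unit? Also, in the third sentence, why does the span [11, 63] go up to 63? I set the max length to 64, so it looks like everything from position 11 to the end was recognized as COU. Is that what's happening?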
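The [11, 63] span is consistent with the model tagging padding positions after the real sentence as part of an entity, which an underfit model can easily do. One guard is to decode BIO tags only up to the real sentence length. This is a generic sketch, not the repo's decoder; the `(type, start, end)` tuples with an inclusive `end` are an assumed convention that may differ from the repo's output format:

```python
def bio_to_spans(tags, seq_len):
    """Collect (type, start, end) spans from BIO tags, ignoring padding.

    seq_len is the number of real tokens; tags beyond it belong to padding
    and are dropped, so a span can never run to max_len.
    end is inclusive (assumed convention).
    """
    spans = []
    current = None  # [type, start, end] of the span being built
    for i, tag in enumerate(tags[:seq_len]):
        if tag.startswith("B-"):
            if current:
                spans.append(tuple(current))
            current = [tag[2:], i, i]
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[2] = i  # extend the open span
        else:
            # "O", or an I- tag that does not continue the open span (dropped in this strict variant)
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans
```

Clipping hides the padding symptom but does not fix underfitting; lone fragments like 's' still point to a model that needs more training data and epochs.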

wzzzd commented 1 year ago

The model hasn't finished fine-tuning; it's clearly underfitting. Prepare more training data, train for more epochs, and then test again.

wzzzd commented 1 year ago

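Regarding the num_samples=0 error: it looks like your input data is being turned into empty strings before it reaches the Sampler. Try debugging or printing the data right before it goes into the model.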
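Following that suggestion, a quick pure-Python check can confirm how many sentences the test file actually yields before anything reaches `RandomSampler`, which raises exactly this `ValueError` when the dataset is empty. A minimal sketch, assuming the `<word> <label>` per line, blank-line-separated format; `read_conll_sentences` and `check_test_data` are hypothetical helpers, not functions from DataManager.py:

```python
def read_conll_sentences(text):
    """Parse "<word> <label>" lines into sentences split on blank lines (assumed format)."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        label = parts[-1] if len(parts) > 1 else None  # None flags a line with no label column
        current.append((parts[0], label))
    if current:  # keep the last sentence even without a trailing blank line
        sentences.append(current)
    return sentences


def check_test_data(text):
    sentences = read_conll_sentences(text)
    print(f"{len(sentences)} sentence(s) parsed")
    for idx, sent in enumerate(sentences[:3]):
        print(idx, sent)  # eyeball the first few before they reach the Sampler
    if not sentences:
        raise ValueError("no sentences parsed; RandomSampler would get num_samples=0")
    return sentences
```

Running `check_test_data(open("test.txt", encoding="utf-8").read())` and comparing its count with what `get_data_test` builds should show whether sentences are lost in the file format or later in the repo's own preprocessing.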

khazic commented 1 year ago

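One more question: the text in my output contains [UNK], meaning the word isn't in the vocab. How can I replace [UNK] with the original word from the input? For example, if my input contains "hamburger" and that word isn't in the vocab, it becomes [UNK], and the output also shows [UNK]. Can the output show the original "hamburger" instead?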
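[UNK] shows up when the entity text is rebuilt by decoding token ids. Because the input is already one word per line, one approach is to keep the original word list next to the predictions and slice entity text from it. A sketch assuming word-level, inclusive `(type, start, end)` spans; `entities_from_words` is hypothetical. If the repo's spans are subword-level instead, they would first need mapping back to word indices (if it uses a Hugging Face fast tokenizer, which is not confirmed here, the encoding's `word_ids()` gives that mapping):

```python
def entities_from_words(words, spans):
    """Rebuild entity strings from the original words instead of decoded token ids.

    words: the input words exactly as they appeared in the source text.
    spans: (type, start, end) tuples with inclusive word-level indices (assumed).
    """
    return [(etype, " ".join(words[start:end + 1]), [start, end]) for etype, start, end in spans]
```

For example, with the words `["i", "ate", "a", "hamburger"]` and a span `("FOOD", 3, 3)`, the entity text comes out as `"hamburger"` regardless of whether the tokenizer knew the word.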