ivanhe123 / RpBERT-GCN-NER

Apache License 2.0

Dimension mismatch #1

Closed: liusenling closed this issue 6 months ago

liusenling commented 6 months ago

RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1

ivanhe123 commented 6 months ago

Please send the whole error, including the output printed before the RuntimeError.

ivanhe123 commented 6 months ago

Also, this model will not work with Chinese datasets as-is. Is your dataset in Chinese? If so, some minor changes are needed.

liusenling commented 6 months ago

Yes, my dataset is in Chinese. I have already changed the pre-trained BERT model and the English model from the spaCy library. Can you tell me what other changes need to be made?

ivanhe123 commented 6 months ago

Can you please send me your Chinese dataset? The test set alone is fine. I will check whether the format is correct.

ivanhe123 commented 6 months ago

Also, copy the whole error output from the console and paste it here. I can't see the image.

liusenling commented 6 months ago

test.txt

ivanhe123 commented 6 months ago

The dataset format should be WORD\tEntity, i.e. a word and its entity tag separated by a tab. Yours is WORD Entity, separated by a space.
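A minimal sketch of a reader for the tab-separated format, in Python like the repository. The name read_tagged_sentences is hypothetical (not the repository's loader), and treating a blank line as the sentence separator is a CoNLL-style assumption that this thread does not confirm. A space-separated line raises an error instead of being silently misread.

```python
from typing import List, Tuple


def read_tagged_sentences(text: str) -> List[List[Tuple[str, str]]]:
    """Parse WORD<TAB>ENTITY lines into a list of sentences.

    Assumption: a blank line separates sentences (CoNLL-style).
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the first tab only; a "WORD Entity" line has no tab.
        parts = line.split("\t", 1)
        if len(parts) != 2:
            raise ValueError(f"expected WORD\\tEntity, got {line!r}")
        word, entity = parts
        current.append((word, entity.strip()))
    if current:
        sentences.append(current)
    return sentences


sample = "李\tB-PER\n明\tI-PER\n去\tO\n\n北\tB-LOC\n京\tI-LOC\n"
print(read_tagged_sentences(sample))
```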

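Also, the spaCy parser does not parse word by word (one dataset line at a time); it groups Chinese characters into multi-character tokens. For example, take this sentence:

作为语言而言,为世界使用人数最多的语言,目前世界有五分之一人口做为母语。
(roughly: "As a language, it is the most widely spoken language in the world; currently one-fifth of the world's population speaks it as a mother tongue.")

It is parsed as (token, part of speech, dependency label):

作为 ADP case
语言 NOUN nmod:prep
而言 PART case
, PUNCT punct
为 ADP case
世界 NOUN compound:nn
使用 NOUN compound:nn
人数 NOUN nsubj
最多 VERB amod
的 PART mark
语言 NOUN nmod:prep
, PUNCT punct
目前 NOUN nmod:tmod
世界 NOUN dep
有 VERB ROOT
五分之一 NUM dep
人口 NOUN dobj
做为 VERB conj
母语 NOUN dobj
。 PUNCT punct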

So there is a problem with the dataset as a whole.
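Because the dataset carries one label per character while the parser produces multi-character words, graph nodes and labels no longer line up one-to-one. A minimal sketch of mapping each character to the word that contains it; the segmentation is copied by hand from the parse above so the sketch runs without a Chinese spaCy model, and the function name char_to_word_index is hypothetical.

```python
from typing import List


def char_to_word_index(sentence: str, words: List[str]) -> List[int]:
    """For each character of `sentence`, return the index of the word containing it.

    Assumes `words` is a segmentation of `sentence` (they concatenate to it).
    """
    if "".join(words) != sentence:
        raise ValueError("words do not concatenate to the sentence")
    mapping: List[int] = []
    for word_idx, word in enumerate(words):
        # Every character of a multi-character word points to the same node.
        mapping.extend([word_idx] * len(word))
    return mapping


sentence = "作为语言而言,为世界使用人数最多的语言,目前世界有五分之一人口做为母语。"
words = ["作为", "语言", "而言", ",", "为", "世界", "使用", "人数", "最多", "的",
         "语言", ",", "目前", "世界", "有", "五分之一", "人口", "做为", "母语", "。"]
mapping = char_to_word_index(sentence, words)
print(len(sentence), len(words))  # 36 20
```

With such a mapping, character-level labels could be grouped per word, or word-level edges expanded to characters; which direction fits depends on how the repository builds its nodes, which the thread does not show.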

liusenling commented 6 months ago

E:\Anaconda\envs\pytorch\python.exe E:/多模态医疗实体识别/RpBERT-GCN-NER-main/train.py
5%|▌ | 127/2500 [07:42<2:24:03, 3.64s/it]
Traceback (most recent call last):
  File "E:\多模态医疗实体识别\RpBERT-GCN-NER-main\train.py", line 63, in <module>
    ner_loss = train(ner_trainloader, model, optimizer, task='ner')
  File "E:\多模态医疗实体识别\RpBERT-GCN-NER-main\utils.py", line 32, in train
    loss, = getattr(model, f'{task}_forward')(batch)
  File "E:\多模态医疗实体识别\RpBERT-GCN-NER-main\model.py", line 224, in ner_forward
    outputs = self._bert_forward_with_image(inputs, datas)
  File "E:\多模态医疗实体识别\RpBERT-GCN-NER-main\model.py", line 170, in _bert_forward_with_image
    return self.encoder_t(
  File "E:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Anaconda\envs\pytorch\lib\site-packages\transformers\models\bert\modeling_bert.py", line 1006, in forward
    embedding_output = self.embeddings(
  File "E:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Anaconda\envs\pytorch\lib\site-packages\transformers\models\bert\modeling_bert.py", line 238, in forward
    embeddings += position_embeddings
RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1

Process finished with exit code 1
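The last line of the traceback comes from BERT's embedding layer: the input has 514 positions, but the position-embedding table of a standard BERT checkpoint has only 512 (its max_position_embeddings). One plausible cause, which the thread does not confirm, is a sentence of 512 tokens to which [CLS] and [SEP] were then added, giving 512 + 2 = 514. With a Hugging Face tokenizer the usual remedy is to pass truncation=True and max_length=512; if token IDs and labels are built by hand, a minimal sketch of capping them together follows. The function name truncate_for_bert and the "O" label on special positions are illustrative assumptions, not code from this repository.

```python
from typing import List, Sequence, Tuple

MAX_POSITIONS = 512  # size of a standard BERT position-embedding table


def truncate_for_bert(
    token_ids: Sequence[int],
    labels: Sequence[str],
    cls_id: int,
    sep_id: int,
    max_len: int = MAX_POSITIONS,
) -> Tuple[List[int], List[str]]:
    """Cap a sequence so that [CLS] + tokens + [SEP] fits in max_len positions.

    `token_ids` and `labels` are aligned one-to-one and exclude special tokens.
    Special positions get the placeholder label "O" (an assumption; many NER
    setups mask them out of the loss instead).
    """
    if len(token_ids) != len(labels):
        raise ValueError("token_ids and labels must have the same length")
    body = max_len - 2  # leave room for [CLS] and [SEP]
    ids = [cls_id, *token_ids[:body], sep_id]
    labs = ["O", *labels[:body], "O"]
    return ids, labs
```

Truncating labels in the same call keeps them aligned with the tokens; dropping the tail silently loses entities near the end of long sentences, so splitting long sentences into chunks may be preferable for medical text.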

ivanhe123 commented 6 months ago

Fortunately, there's a way around it: instead of building the graph with dependency parsing, we will connect all words to each other. This will take some time. Please fork this GitHub repository, rename the fork "Chinese RpBERT-GCN-NER", and send me a link to it.

ivanhe123 commented 6 months ago

Just send the link here.

liusenling commented 6 months ago

I changed the data-reading code to text, label = line.split(" "). Is that not OK either?

ivanhe123 commented 6 months ago

It is OK. The dataset format is fine now, but the dataset as a whole is not suited to this application. Fortunately, there's a way around it: instead of building the graph with dependency parsing, we will connect all words to each other. This will take some time. Please fork this GitHub repository, rename the fork "Chinese RpBERT-GCN-NER", and send a link to it in this issue.

liusenling commented 6 months ago

https://github.com/liusenling/Chinese-RpBERT-GCN-NER

ivanhe123 commented 6 months ago

Send me a link where I can download all your training, dev, and test sets along with the image assets.

liusenling commented 6 months ago

Sorry, the data cannot be shared for the time being because of privacy concerns.

ivanhe123 commented 6 months ago

Then, sorry, I cannot help you without seeing the dataset. You can change the edge_index function so that every Chinese character is connected to every other character in the graph.
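A minimal sketch of a fully connected edge list, assuming the graph layer follows the PyTorch Geometric convention of an edge_index of shape [2, num_edges] that lists source and target node indices. complete_graph_edge_index is a hypothetical name, not the repository's edge_index function; it returns plain lists, which torch.tensor(edge_index, dtype=torch.long) would convert.

```python
from typing import List


def complete_graph_edge_index(num_nodes: int, self_loops: bool = False) -> List[List[int]]:
    """Edges of a fully connected graph in COO form: [sources, targets].

    Both directions (i -> j and j -> i) are listed, because message-passing
    layers read edge_index as directed edges.
    """
    sources: List[int] = []
    targets: List[int] = []
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i == j and not self_loops:
                continue  # skip self-edges unless requested
            sources.append(i)
            targets.append(j)
    return [sources, targets]


edge_index = complete_graph_edge_index(3)
print(edge_index)  # [[0, 0, 1, 1, 2, 2], [1, 2, 0, 2, 0, 1]]
```

The edge count grows as n(n − 1): a 510-character sentence would give 510 × 509 = 259,590 edges, so memory use may become a concern for long inputs.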

liusenling commented 6 months ago

Thank you for your answers and help.