Closed liusenling closed 6 months ago
Please send the whole error, including the output that appeared before the runtime error.
Also, this model will not work with Chinese datasets out of the box. Is your dataset Chinese? If so, minor changes are needed.
Yes, my dataset is Chinese. I have modified BERT's pre-trained model and the English model in the spaCy library. Can you tell me what other changes need to be made?
Can you please send me your Chinese dataset? The test set alone is fine. I will check whether the format is correct.
Also copy the whole error that was output to the console and paste it here. I can't see the image.
The format of the dataset should be WORD\tEntity (tab-separated). Yours is WORD Entity (space-separated).
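To make the expected format concrete, here is a minimal sketch of reading tab-separated lines. The tags (B-LOC, O) and the helper name are illustrative placeholders, not from the repository:

```python
# Sketch of the expected dataset format: one token per line, with the word
# and its entity tag separated by a tab (WORD\tEntity), not a space.
def read_dataset_line(line):
    """Split one dataset line into (token, entity_tag)."""
    text, label = line.split("\t")
    return text, label

sample = "世界\tB-LOC\n母语\tO"
pairs = [read_dataset_line(line) for line in sample.splitlines()]
# pairs -> [("世界", "B-LOC"), ("母语", "O")]
```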
Also, the spaCy parser does not parse Chinese character by character but in groups of characters (words). For example, it parses the sentence 作为语言而言,为世界使用人数最多的语言,目前世界有五分之一人口做为母语。 as:
作为 ADP case
语言 NOUN nmod:prep
而言 PART case
, PUNCT punct
为 ADP case
世界 NOUN compound:nn
使用 NOUN compound:nn
人数 NOUN nsubj
最多 VERB amod
的 PART mark
语言 NOUN nmod:prep
, PUNCT punct
目前 NOUN nmod:tmod
世界 NOUN dep
有 VERB ROOT
五分之一 NUM dep
人口 NOUN dobj
做为 VERB conj
母语 NOUN dobj
。 PUNCT punct
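The alignment problem can be seen with a tiny sketch (the segmentation is hard-coded from the first three tokens of the parse above):

```python
# First three word-level tokens from the parse above, hard-coded:
words = ["作为", "语言", "而言"]
# Each word-level token covers two characters, so character-level NER
# labels cannot be matched one-to-one with word-level parse nodes.
chars = [c for w in words for c in w]
# 3 words but 6 characters: the alignment is not one-to-one.
```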
So there is something wrong with the dataset overall.
E:\Anaconda\envs\pytorch\python.exe E:/多模态医疗实体识别/RpBERT-GCN-NER-main/train.py
5%|▌ | 127/2500 [07:42<2:24:03, 3.64s/it]
Traceback (most recent call last):
File "E:\多模态医疗实体识别\RpBERT-GCN-NER-main\train.py", line 63, in
Process finished with exit code 1
There's a way around it, fortunately. We will not parse the words using dependency parsing; instead, we will connect all words together. This will take some time. Please fork a copy of this GitHub repository, rename it to "Chinese RpBERT-GCN-NER", and send me a link to the repository.
just send the link here
I changed the data-reading code to text, label = line.split(" "). Is that also not OK?
It is ok. The dataset format is correct now, but the dataset overall is still not suited for this application. The workaround is the one described above: skip dependency parsing and connect all words together, then fork this repository, rename it to "Chinese RpBERT-GCN-NER", and post a link to it in this issue.
Send me a link from which I can download all of your training, dev, test, and image assets.
Sorry, the data cannot be shared for the time being because of privacy concerns.
Then I'm sorry, I cannot help you without seeing the dataset. You can change the edge_index function so that each Chinese character connects to every other character in the graph.
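A minimal sketch of that change, building a fully connected graph over the characters of a sentence instead of dependency-parse edges. The function name is illustrative, not from the repository, and the two lists would typically be stacked into a PyTorch edge_index tensor afterwards:

```python
# Sketch: connect every character to every other character (no self-loops),
# replacing the dependency-parse edges for Chinese input.
def build_edge_index(num_chars):
    """Return parallel (source, target) lists of directed edges."""
    src, dst = [], []
    for i in range(num_chars):
        for j in range(num_chars):
            if i != j:
                src.append(i)
                dst.append(j)
    return src, dst

sentence = "目前世界有五分之一人口"
src, dst = build_edge_index(len(sentence))
# For n characters this yields n * (n - 1) directed edges.
```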
Thank you for your answers and help
RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1
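This 514-vs-512 mismatch usually means the encoded sentence is longer than BERT's 512-position limit (including the [CLS]/[SEP] special tokens). A minimal sketch of one common workaround, truncating before encoding; the function name and placeholder ids are illustrative, not from this repository:

```python
# Sketch: BERT-style encoders accept at most 512 positions; feeding 514
# tokens triggers the size-mismatch error above, so clip the sequence
# before it reaches the model.
MAX_LEN = 512

def truncate_for_bert(token_ids, max_len=MAX_LEN):
    """Keep at most max_len ids, preserving the final [SEP] id."""
    if len(token_ids) <= max_len:
        return token_ids
    return token_ids[:max_len - 1] + [token_ids[-1]]

too_long = list(range(514))  # stand-in for an over-long encoded sentence
clipped = truncate_for_bert(too_long)
# clipped now has 512 ids and still ends with the [SEP] stand-in.
```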