hint-lab / bert-relation-classification

A PyTorch implementation of BERT-based relation classification
152 stars 25 forks

Error while trying to use the model #4

Open sandeeppilania opened 4 years ago

sandeeppilania commented 4 years ago

Traceback (most recent call last):
  File "bert.py", line 429, in <module>
    main()
  File "bert.py", line 373, in main
    config, config.task_name, tokenizer, evaluate=False)
  File "bert.py", line 268, in load_and_cache_examples
    examples, label_list, config.max_seq_len, tokenizer, "classification", use_entity_indicator=config.use_entity_indicator)
  File "C:\Users\pilanisp\Desktop\BERT FINAL\BERT IE\bert-relation-classification\utils.py", line 281, in convert_examples_to_features
    e11_p = tokens_a.index("#")+1  # the start position of entity1
ValueError: '#' is not in list

bilalghanem commented 4 years ago

I have the same issue!

vpvsankar commented 4 years ago

me too

bilalghanem commented 4 years ago

I think the authors were planning to use E11, E21, etc., but then changed the code to use # and $.

What I have done to solve the issue: when I read the data at the beginning of the code, I convert the special tokens as follows:

E11 & E21 -> #
E21 & E22 -> $

and then everything worked perfectly.

sandeeppilania commented 4 years ago

@bilalghanem Can you share an example of how you converted the training examples? Did you change the entire train.tsv up front, or are you changing it as you read through the file in the code?

bilalghanem commented 4 years ago

@sandeeppilania I change it in the code. Simply, in the function convert_examples_to_features, before the line l = len(tokens_a), use .replace to convert them.

e.g.

text = text.replace('E11', '#')

etc.
sandeeppilania commented 4 years ago

@bilalghanem I am asking something silly here, sorry about that, but on the line tokens_a = tokenizer.tokenize(example.text_a) in the function convert_examples_to_features, I tried printing out tokens_a and this is what I see: ['the', 'system', 'as', 'described', 'above', 'has', 'its', 'greatest', 'application', 'in', 'an', 'array', '##ed', '[', 'e', '##11', ']', 'configuration', '[', 'e', '##12', ']', 'of', 'antenna', '[', 'e', '##21', ']', 'elements', '[', 'e', '##22', ']'], so I don't see how the replace str.replace('E11', '#') would work here.

bilalghanem commented 4 years ago

> @bilalghanem I am asking something silly here sorry about that, but on line tokens_a = tokenizer.tokenize(example.text_a) in function convert_examples_to_features i tried printing out tokens_a and this is what i am able to see: ['the', 'system', 'as', 'described', 'above', 'has', 'its', 'greatest', 'application', 'in', 'an', 'array', '##ed', '[', 'e', '##11', ']', 'configuration', '[', 'e', '##12', ']', 'of', 'antenna', '[', 'e', '##21', ']', 'elements', '[', 'e', '##22', ']'] so i dont how the replace str.replace('E11', '#') would work here

Sorry, you're right: apply the replacement before the tokenizer, or even when you first read the data.

sandeeppilania commented 4 years ago

Got it. So basically,

0 the system as described above has its greatest application in an arrayed [E11] configuration [E12] of antenna [E21] elements [E22] 12 whole component 2

should be converted to

0 the system as described above has its greatest application in an arrayed #configuration# of antenna $elements$ 12 whole component 2

Right? Because my understanding is that e11_p = tokens_a.index("#")+1 is just looking for the next offset after #.
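The conversion described above can be sketched as a small preprocessing step. This is a hedged example, not code from the repo: the function name replace_entity_markers is my own, and it assumes the bracketed [E11]-style markers that appear in the sample sentence.

```python
def replace_entity_markers(text):
    """Rewrite the [E11]/[E12]/[E21]/[E22] markers from the data into the
    single-character markers (# and $) that utils.py searches for."""
    # Entity 1 is wrapped in '#', entity 2 in '$'. Doing this on the raw
    # string, before tokenization, keeps the markers as standalone tokens.
    for marker in ("[E11]", "[E12]"):
        text = text.replace(marker, "#")
    for marker in ("[E21]", "[E22]"):
        text = text.replace(marker, "$")
    return text

example = ("the system as described above has its greatest application in an "
           "arrayed [E11] configuration [E12] of antenna [E21] elements [E22]")
print(replace_entity_markers(example))
# -> ... arrayed # configuration # of antenna $ elements $
```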

bilalghanem commented 4 years ago

@sandeeppilania yes, exactly.

And this line finds the end position of the entity in case it spans more than a single word: e12_p = l - tokens_a[::-1].index("#") + 1
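For intuition, here is a minimal sketch of how those two index expressions behave on a toy token list (the token list is made up for illustration; it is not from the repo's data):

```python
# Toy tokenized sentence with entity 1 wrapped in '#' and entity 2 in '$'.
tokens_a = ["an", "#", "arrayed", "configuration", "#",
            "of", "$", "antenna", "elements", "$"]
l = len(tokens_a)

# Index of the first token after the opening '#' (start of entity 1).
e11_p = tokens_a.index("#") + 1
# tokens_a[::-1].index("#") counts from the end, so this resolves to the
# index of the closing '#' plus two (the repo's entity-1 end offset).
e12_p = l - tokens_a[::-1].index("#") + 1

print(e11_p, e12_p)  # -> 2 6
```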

ejokhan commented 4 years ago

@sandeeppilania Hi, I have the same issue. Can you share the part of the code where exactly you made the changes to solve the problem?

thanks in advance

Valdegg commented 4 years ago

> I think the authors was planning to use E11, E21, etc. but then changed the code to use # & $.
>
> What I have done to solve the issue is that when I read the data in the beginning of the code, I convert the special tokens as the following:
>
> E11 & E21 -> # E21 & E22 -> $
>
> and then everything worked perfectly.

You mean E11 & E12?

Valdegg commented 4 years ago

I wonder why they didn't try running the software before posting it here (and explicitly calling it "stable") when it doesn't even run...

wang-h commented 4 years ago

> @sandeeppilania hi, brother... i have the same issue.. can you share the part of the code where exactly did you made changes to solve the problem.. thanks in advance

> I think the authors was planning to use E11, E21, etc. but then changed the code to use # & $. What I have done to solve the issue is that when I read the data in the beginning of the code, I convert the special tokens as the following: E11 & E21 -> # E21 & E22 -> $ and then everything worked perfectly.

> You mean E11 & E12?

Please check the following lines in bert.py and uncomment the one you need:

additional_special_tokens = ["[E11]", "[E12]", "[E21]", "[E22]"]
additional_special_tokens = []
additional_special_tokens = ["e11", "e12", "e21", "e22"]

seesky8848 commented 2 years ago

Hey guys, look here! Modify additional_special_tokens in bert.py so that it corresponds to tokens_a in util.py, and pay attention to the start and end subscript positions in util.py; if necessary, modify the code around line 275 of util.py. After that you can start training. I have tried this method and it works.

wang-h commented 2 years ago

I am sorry, I have no time to correct the code. The error arises when you use a modern transformers library. After

model = XXX.from_pretrained(args.bert_model, args=args)
tokenizer.add_tokens(additional_special_tokens)

add the following line:

model.resize_token_embeddings(len(tokenizer))
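A minimal sketch of that fix, wrapped in a helper (the function name add_entity_markers is mine, not from the repo; with the real transformers library you would pass an actual model and tokenizer as sketched in the comment):

```python
def add_entity_markers(model, tokenizer, extra_tokens):
    """Register the entity-marker tokens with the tokenizer and grow the
    model's embedding matrix so the new token ids have rows to look up."""
    num_added = tokenizer.add_tokens(extra_tokens)  # returns how many were new
    model.resize_token_embeddings(len(tokenizer))   # must run AFTER add_tokens
    return num_added

# With transformers this would be, e.g.:
#   model = BertModel.from_pretrained(args.bert_model)
#   tokenizer = BertTokenizer.from_pretrained(args.bert_model)
#   add_entity_markers(model, tokenizer, ["[E11]", "[E12]", "[E21]", "[E22]"])
```

Without the resize call, any input containing a newly added token id indexes past the end of the pretrained embedding table and crashes.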


seesky8848 commented 2 years ago

OK, thank you 😲
