sandeeppilania opened this issue 4 years ago
I have the same issue!
me too
I think the authors were planning to use E11, E21, etc., but then changed the code to use # & $.
What I have done to solve the issue is that, when I read the data at the beginning of the code, I convert the special tokens as follows:
E11 & E21 -> #; E21 & E22 -> $
and then everything worked perfectly.
@bilalghanem Can you share an example of how you converted the training example? Did you change the entire train.tsv first, or are you changing it as you read through the file in the code?
@sandeeppilania I change it in the code. Simply, in the function convert_examples_to_features, before the line l = len(tokens_a), use .replace to convert them, e.g. str.replace('E11', '#'), etc.
@bilalghanem I am asking something silly here, sorry about that, but on the line tokens_a = tokenizer.tokenize(example.text_a) in the function convert_examples_to_features I tried printing out tokens_a, and this is what I am able to see:
['the', 'system', 'as', 'described', 'above', 'has', 'its', 'greatest', 'application', 'in', 'an', 'array', '##ed', '[', 'e', '##11', ']', 'configuration', '[', 'e', '##12', ']', 'of', 'antenna', '[', 'e', '##21', ']', 'elements', '[', 'e', '##22', ']']
so I don't see how the replace str.replace('E11', '#') would work here.
Sorry, you're right: apply the replace before applying the tokenizer, or even when you start reading the data.
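(For reference, a minimal sketch of that read-time replacement; the bracketed marker strings and the helper name normalize_markers are assumptions here, so adjust them to whatever your copy of train.tsv actually contains:)

# Hypothetical helper, run on each sentence as it is read from train.tsv,
# before tokenization. It maps the entity markers onto the '#' / '$'
# symbols that convert_examples_to_features searches for. Whether the
# brackets are part of the markers in your data is an assumption.
MARKER_MAP = {"[E11]": "#", "[E12]": "#", "[E21]": "$", "[E22]": "$"}

def normalize_markers(text: str) -> str:
    for marker, symbol in MARKER_MAP.items():
        text = text.replace(marker, symbol)
    return text

# normalize_markers("... an arrayed [E11] configuration [E12] of antenna ...")
# -> "... an arrayed # configuration # of antenna ..."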
Got it. So basically,
0 the system as described above has its greatest application in an arrayed [E11] configuration [E12] of antenna [E21] elements [E22] 12 whole component 2
should be converted to
0 the system as described above has its greatest application in an arrayed #configuration# of antenna $elements$ 12 whole component 2
Right?
because my understanding is that e11_p = tokens_a.index("#")+1 is looking for just the offset right after the first #.
@sandeeppilania yes, exactly.
And this line specifies the end of the entity in case it is longer than a single word:
e12_p = l-tokens_a[::-1].index("#")+1
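(A quick illustration of what those two expressions compute, on a made-up toy token list, just to make the index arithmetic concrete:)

# Toy list: a two-token entity between '#' markers (not real repo data).
tokens_a = ['an', 'array', '##ed', '#', 'antenna', 'elements', '#', 'of']
l = len(tokens_a)                          # 8

e11_p = tokens_a.index("#") + 1            # 4: position of 'antenna', the first
                                           # token after the opening '#' (index 3)
e12_p = l - tokens_a[::-1].index("#") + 1  # 8: the last '#' sits at index 6;
                                           # searching the reversed list is how
                                           # the code finds the LAST '#', not the first
print(e11_p, e12_p)                        # 4 8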
@sandeeppilania Hi, brother... I have the same issue. Can you share the part of the code where exactly you made changes to solve the problem? Thanks in advance.
I think the authors were planning to use E11, E21, etc., but then changed the code to use # & $. What I have done to solve the issue is that, when I read the data at the beginning of the code, I convert the special tokens as follows: E11 & E21 -> #; E21 & E22 -> $, and then everything worked perfectly.
You mean E11 & E12?
I wonder why they didn't try running the software before they posted it here (and explicitly said it's "stable", when it doesn't even run)...
Please check the following line in bert.py and uncomment the variant you need:
additional_special_tokens = []
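(A sketch of how that line might be filled in; which exact marker strings the repo expects is an assumption on my part, so pick the variant that matches what utils.py searches for with tokens_a.index(...):)

# In bert.py: these entries must match the markers utils.py looks up.
# Variant 1: if utils.py searches for '#' and '$' (as in this thread):
additional_special_tokens = ["#", "$"]
# Variant 2: if your utils.py was changed to use the bracketed markers:
# additional_special_tokens = ["[E11]", "[E12]", "[E21]", "[E22]"]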
Hey guys, look here! Modify additional_special_tokens in the file bert.py so that it corresponds to tokens_a in the file utils.py, and pay attention to the start and end subscript positions in utils.py; if necessary, modify the code around line 275 in utils.py. After finishing, you can start training. I have tried this method and it is effective.
I am sorry, I have no time to correct the code; the error is raised when you are using a modern transformers library.
model = XXX.from_pretrained(args.bert_model, args=args)
tokenizer.add_tokens(additional_special_tokens)
Add the following line!!!
model.resize_token_embeddings(len(tokenizer))
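(Put together, a minimal sketch of that fix against the Hugging Face transformers API; BertForSequenceClassification and the marker list are stand-ins for the repo's own model class and tokens:)

from transformers import BertTokenizer, BertForSequenceClassification

additional_special_tokens = ["[E11]", "[E12]", "[E21]", "[E22]"]  # assumed markers

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Register the entity markers as new vocabulary entries...
tokenizer.add_tokens(additional_special_tokens)
# ...then grow the embedding matrix so the new token IDs have rows;
# without this, the embedding lookup raises an index error.
model.resize_token_embeddings(len(tokenizer))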
OK, thank you 😲
Traceback (most recent call last):
  File "bert.py", line 429, in <module>
    main()
  File "bert.py", line 373, in main
    config, config.task_name, tokenizer, evaluate=False)
  File "bert.py", line 268, in load_and_cache_examples
    examples, label_list, config.max_seq_len, tokenizer, "classification", use_entity_indicator=config.use_entity_indicator)
  File "C:\Users\pilanisp\Desktop\BERT FINAL\BERT IE\bert-relation-classification\utils.py", line 281, in convert_examples_to_features
    e11_p = tokens_a.index("#")+1  # the start position of entity1
ValueError: '#' is not in list