Jay0412 closed this issue 2 years ago.
P.S. I have one more question. As I understand it, this model uses levitated markers for relation extraction. However, your RE training script calls run_re.py, which does not seem to use levitated markers. What is the difference between run_re.py and run_levitatedpair.py? Also, could you explain what exactly the BertForACEBothSub, BertForACEBothOneDropoutSpanSub, BertForACEBothOneDropoutSub, BertForACEBothOneDropoutLeviPair, and BertForACEBothOneDropout classes do?
run_levitatedpair.py is for an ablation study (two pairs of levitated markers). BertForACEBothSub, BertForACEBothOneDropoutSpanSub, BertForACEBothOneDropoutSub, BertForACEBothOneDropoutLeviPair, and BertForACEBothOneDropout are a series of attempts; in the end we use the default, BertForACEBothOneDropoutSub.
Thank you for answering. So does that mean BertForACEBothOneDropoutSub uses levitated markers for relation extraction? Also, I would still like to understand my first question (512 vs. 1024). Could you explain it, please?
Even though the sequence length is greater than 512, the position ids stay within BERT's 512 positions (ids 0 to 511).
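A minimal sketch of why this works, in plain Python with no specific library assumed: a learned position embedding is a lookup table with 512 rows, so the limit applies to the *value* of each position id, not to how many tokens there are. The hidden size of 4 and the repeating id scheme below are illustrative assumptions, not how the repository assigns ids.

```python
import random

MAX_POSITIONS = 512  # rows in BERT's position embedding table (ids 0..511)
HIDDEN = 4  # tiny hidden size, for illustration only

random.seed(0)
# The position embedding table: one vector per allowed position id.
position_table = [[random.random() for _ in range(HIDDEN)] for _ in range(MAX_POSITIONS)]


def embed_positions(position_ids):
    """Look up one vector per token; fails only if an id has no row in the table."""
    for pid in position_ids:
        if not 0 <= pid < MAX_POSITIONS:
            raise IndexError(f"position id {pid} has no embedding row")
    return [position_table[pid] for pid in position_ids]


# A 1024-token input whose position ids all stay below 512.
# Illustrative only: the second half simply repeats the ids of the first half.
seq_len = 1024
position_ids = [i % MAX_POSITIONS for i in range(seq_len)]
vectors = embed_positions(position_ids)
print(len(position_ids), len(vectors), max(position_ids))  # 1024 1024 511
```

The sequence is 1024 tokens long, yet every lookup succeeds; only an id such as 512 would fail, e.g. `embed_positions([512])` raises `IndexError`.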
Thanks, I understand both answers. As you explained, I also think the position ids stay within 0 to 511. However, when I print the shape of the position ids in an item, it is 1024. I don't understand how they can have a length of 1024, and the same goes for the other tensors (input_ids, attention mask, etc.).
The Transformer itself can support any sequence length.
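To unpack "the Transformer can support any length": the attention computation has no parameter whose shape depends on the sequence length; in BERT, only the learned position embedding table does. The sketch below is a minimal single-head self-attention in plain Python, simplified by assuming Q = K = V = the input (real BERT applies learned projections, which are also length-independent).

```python
import math
import random


def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    x is a list of token vectors (seq_len x dim). Nothing here depends on
    seq_len, so the same function handles 4 tokens or more than 512 tokens.
    """
    dim = len(x[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in x:
        # Similarity of this query to every key in the sequence.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in x]
        # Numerically stable softmax over the whole sequence.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, x)) for d in range(dim)])
    return out


random.seed(0)
for seq_len in (4, 520):  # 520 is deliberately longer than 512
    tokens = [[random.random() for _ in range(8)] for _ in range(seq_len)]
    print(seq_len, len(self_attention(tokens)))
```

The practical limits are compute and memory, which grow quadratically with length, plus the size of the position embedding table.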
Sorry, I'm a beginner in NLP... Could you explain in more detail? I don't get it. How can a Transformer support any length when BERT limits position ids to the range 0 to 511?
Yes.
Well... so, any explanation?
BERT limits position ids to the range 0 to 511, but different tokens can share the same position id. For example, you could set the position id of all 1024 tokens to 0.
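A sketch of how sharing position ids lets the input grow past 512 tokens. It assumes the levitated-marker scheme described for PL-Marker, where each marker pair is appended after the text and reuses the position ids of its span's first and last tokens. `build_position_ids` is a hypothetical helper, not a function from the repository, and the spans are arbitrary illustrative values.

```python
def build_position_ids(num_text_tokens, spans):
    """Hypothetical helper: position ids for text tokens followed by marker pairs.

    Text tokens get 0..num_text_tokens-1. Each (start, end) span appends two
    marker tokens that reuse the position ids of the span's first and last
    token, so the sequence can grow without any id exceeding the text's range.
    """
    position_ids = list(range(num_text_tokens))
    for start, end in spans:
        position_ids.append(start)  # start marker shares the span's first position
        position_ids.append(end)  # end marker shares the span's last position
    return position_ids


# 256 text tokens plus 384 arbitrary candidate spans -> 256 + 2 * 384 = 1024 tokens.
spans = [(i % 256, min(i % 256 + 2, 255)) for i in range(384)]
ids = build_position_ids(256, spans)
print(len(ids), max(ids))  # 1024 255
```

This covers only position ids; the actual models also restrict how markers attend to the text and to each other, which this sketch leaves out.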
Ah..! I understand! Thank you very much for your kind reply :)
As far as I know, BERT limits the position embeddings to 512 positions. However, when I look at the code, I find that the position ids, input ids, etc. have size 1024. I'm quite confused by this. Could you explain the difference between these?