yz-qiang closed this issue 2 years ago
Thanks for your reply. For the second point, I know I can use the pipeline to predict the masked position. However, the `<mask>` in `os.path.<mask>` cannot be predicted as `isFile()`, because `isFile()` is not in the model vocabulary. Do you have any suggestions for fixing this? You might suggest adding `isFile()` to the model vocabulary, but I doubt that a randomly initialized embedding for the new token `isFile()` would be useful, and I am also worried about the out-of-vocabulary problem. Can you give me some suggestions? Thank you. :)
Yes. CodeBERT can't predict a span like `isFile()`. Two suggestions for this: 1) use multiple `<mask>` tokens, like `os.path.<mask><mask><mask><mask>`; 2) use another model, such as CodeT5, that can predict a span.
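A minimal sketch of the first suggestion, in case it helps. It is not code from the CodeBERT repository. The helper names (`masked_variants`, `fill_left_to_right`, `make_codebert_predictor`) are made up for illustration, and the checkpoint name `microsoft/codebert-base-mlm` and the greedy left-to-right filling strategy are assumptions. The idea is to try several mask counts and fill each mask in turn with the top prediction.

```python
from typing import Callable, List

MASK = "<mask>"  # RoBERTa-style mask token, as written in this thread


def masked_variants(prefix: str, max_masks: int = 4) -> List[str]:
    """Return the prefix followed by 1..max_masks mask tokens."""
    return [prefix + MASK * n for n in range(1, max_masks + 1)]


def fill_left_to_right(text: str, predict: Callable[[str], str]) -> str:
    """Greedily replace the leftmost mask with predict(text) until none remain."""
    while MASK in text:
        text = text.replace(MASK, predict(text), 1)
    return text


def make_codebert_predictor(model_name: str = "microsoft/codebert-base-mlm"):
    """Build a predictor on top of the Hugging Face fill-mask pipeline.

    Assumption: the checkpoint name above. Calling this downloads the model.
    """
    from transformers import pipeline  # imported lazily; only needed here

    fill = pipeline("fill-mask", model=model_name)

    def predict(text: str) -> str:
        out = fill(text, top_k=1)
        # One mask: a list of candidate dicts.
        # Several masks: one candidate list per mask; use the first mask.
        first = out[0]
        best = first[0] if isinstance(first, list) else first
        # token_str may carry a leading space for some tokens.
        return best["token_str"]

    return predict


if __name__ == "__main__":
    predictor = make_codebert_predictor()
    for candidate in masked_variants("os.path.", max_masks=4):
        print(candidate, "->", fill_left_to_right(candidate, predictor))
```

Greedy filling only keeps the single best token at each step, so it can miss spans that a beam search over the masks would find. The number of masks also has to be guessed, which is why the sketch loops over several lengths.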
Thank you very much, I will try it. :)
Hi, CodeBERT is nice work, and thank you for open-sourcing the code. I am confused about how to use CodeBERT for downstream tasks. When a code token is split into several sub-tokens, can CodeBERT still understand the meaning of the code token? For example, when I run `tokenizer.tokenize("isFile()")`, `isFile()` is split into `is`, `File` and `()`. In that case, can CodeBERT capture the connection between these sub-tokens? Also, if I need to predict the `<mask>` token in the code `os.path.<mask>`, will CodeBERT work? Please reply at your convenience. Thank you very much. :)
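For anyone who wants to inspect the split described above, here is a rough sketch. The function names `group_subtokens` and `codebert_tokenize` are hypothetical, and the checkpoint name `microsoft/codebert-base` is an assumption. It relies on the byte-level BPE convention of prefixing a token with `Ġ` when it follows a space, and it only groups sub-tokens by that marker; it says nothing about what the model learns from them.

```python
from typing import List

SPACE_MARKER = "Ġ"  # byte-level BPE prefix for a token that follows a space


def group_subtokens(tokens: List[str]) -> List[List[str]]:
    """Group BPE sub-tokens into the space-separated chunks they came from."""
    groups: List[List[str]] = []
    for tok in tokens:
        if tok.startswith(SPACE_MARKER) or not groups:
            groups.append([tok])    # a new chunk starts here
        else:
            groups[-1].append(tok)  # continuation of the current chunk
    return groups


def codebert_tokenize(code: str) -> List[str]:
    """Tokenize with the CodeBERT tokenizer (assumed checkpoint name).

    Calling this downloads the tokenizer files on first use.
    """
    from transformers import AutoTokenizer  # imported lazily; only needed here

    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    return tokenizer.tokenize(code)


# The split reported in this thread for "isFile()":
print(group_subtokens(["is", "File", "()"]))  # → [['is', 'File', '()']]
```

Passing the output of `codebert_tokenize("if os.path.isFile(p):")` to `group_subtokens` shows which sub-tokens belong to the same space-separated chunk, which can be handy when pooling sub-token embeddings back to code tokens.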