Open Ahmedfir opened 1 year ago
Do you mean pre-training code for CodeBERT-MLM model?
Yes, the list of projects that have been used for pre-training the model CodeBERT-MLM.
For instance, when I load the CodeBERT-MLM model with RoBERTa, like so: RobertaForMaskedLM.from_pretrained("microsoft/codebert-base-mlm"), on which projects has this model been pre-trained?
This way, we can know which code we can consider as seen or as unseen during the pre-training of CodeBERT-MLM.
We don't release the pre-training code. We only use the train split of CodeSearchNet to pre-train CodeBERT (https://huggingface.co/datasets/code_search_net).
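For anyone who wants to turn that answer into a seen/unseen list, here is a minimal sketch. It assumes that each record in the train split exposes a `repository_name` field, and that the dataset loads as `code_search_net` with a per-language configuration such as `python`; both should be checked against the dataset card. The helper names (`collect_repositories`, `is_seen`, `load_train_repositories`) are hypothetical, and loading details may vary with your version of the `datasets` library.

```python
from typing import Dict, Iterable, Set


def collect_repositories(records: Iterable[Dict[str, str]],
                         field: str = "repository_name") -> Set[str]:
    """Return the distinct repository names found in an iterable of records.

    `field` is an assumption about the dataset schema, not something
    confirmed in this thread.
    """
    return {record[field] for record in records}


def is_seen(repository: str, train_repositories: Set[str]) -> bool:
    """True if the repository appears among the train-split repositories."""
    return repository in train_repositories


def load_train_repositories(language: str = "python") -> Set[str]:
    """Collect repository names from the CodeSearchNet train split.

    Requires the `datasets` package and network access; the dataset and
    configuration names here are assumptions to verify on the dataset card.
    """
    from datasets import load_dataset  # imported lazily: optional dependency

    train = load_dataset("code_search_net", language, split="train")
    return collect_repositories(train)
```

Calling `load_train_repositories("java")` and then `is_seen("owner/project", repos)` would give a repository-level check. This is coarse: it only reflects the statement above that the train split was used, and it says nothing about forks or copied code living in other repositories.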
Thank you very much.
Dear Madam or Sir,
Could you please provide us with a list of the projects that were used for the training (including the evaluation) of CodeBERT, particularly for the CodeBERT-MLM task? From the paper, I understand that you used the dataset provided by the CodeSearchNet challenge, but I could not find which projects were used for training and which were excluded. I see that each task/pipeline described in the README.md has its own folder with the corresponding training and evaluation datasets, except for CodeBERT-MLM. Could you please help me find this information? Any help or guidance is welcome.
Thank you in advance and best regards!