microsoft / CodeXGLUE

About training with custom data #71

Closed mhyeonsoo closed 3 years ago

mhyeonsoo commented 3 years ago

Hello,

Thanks for the nice and clean sources.

I am trying to adapt this code to train my own classifier, and I have two questions. Is there a way to train from scratch, without the pretrained model configurations? Also, I'd like to change the number of classes from 2 to 4; what needs to change for that? Thank you!

guoday commented 3 years ago

You can use this repo. https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Defect-detection

1. To train from scratch, remove the following code. https://github.com/microsoft/CodeXGLUE/blob/6f780cd27d4813e4e4e2366fcfbf76e2790f08ba/Code-Code/Defect-detection/code/run.py#L512-L517

2. To change the number of classes from 2 to 4:

  1. Change the value from 1 to 4 here. https://github.com/microsoft/CodeXGLUE/blob/6f780cd27d4813e4e4e2366fcfbf76e2790f08ba/Code-Code/Defect-detection/code/run.py#L505

  2. Replace sigmoid with softmax and implement the loss function yourself. https://github.com/microsoft/CodeXGLUE/blob/6f780cd27d4813e4e4e2366fcfbf76e2790f08ba/Code-Code/Defect-detection/code/model.py#L25-L30

  3. Use np.argmax to output the prediction. https://github.com/microsoft/CodeXGLUE/blob/6f780cd27d4813e4e4e2366fcfbf76e2790f08ba/Code-Code/Defect-detection/code/run.py#L295 https://github.com/microsoft/CodeXGLUE/blob/6f780cd27d4813e4e4e2366fcfbf76e2790f08ba/Code-Code/Defect-detection/code/run.py#L339
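
For anyone following the softmax, loss, and np.argmax steps above, here is a minimal NumPy sketch of what they amount to for four classes. It is not the code from the repo: the names `softmax`, `cross_entropy`, and `NUM_CLASSES`, and the example logits and labels, are made up for illustration, and it assumes the model emits one logit per class and that labels are integer class ids.

```python
import numpy as np

NUM_CLASSES = 4  # assumption: four target classes, as asked in the question


def softmax(logits):
    """Row-wise softmax over the class dimension."""
    # Subtract the row maximum first for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    # The small constant guards against log(0).
    return float(-np.log(true_class_probs + 1e-10).mean())


# Hypothetical logits for a batch of 3 examples, shape (batch, NUM_CLASSES).
logits = np.array([
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 0.2, 3.0, 0.1],
    [-0.5, 0.0, 0.3, 1.5],
])
labels = np.array([0, 2, 3])  # integer class ids in [0, NUM_CLASSES)

probs = softmax(logits)           # each row sums to 1
loss = cross_entropy(probs, labels)
preds = np.argmax(probs, axis=1)  # pick the most probable class per example
```

If the loss is written in PyTorch instead, note that `torch.nn.CrossEntropyLoss` applies log-softmax internally, so it should be given the raw logits rather than the softmax probabilities.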

mhyeonsoo commented 3 years ago

Thank you so much. I will try to follow them :)