microsoft / CodeXGLUE

Doubt related to maximum input length for defect-detection task #131

Closed Tamal-Mondal closed 2 years ago

Tamal-Mondal commented 2 years ago

Hi Team,

Thanks first of all for this amazing repo.

I am using the defect-detection pipeline (https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Defect-detection) as a baseline and plan to improve on it.

My question: the maximum input length used by the pipeline (the block_size parameter) is 400, but CodeBERT-base supports inputs of up to 512 tokens. Can I simply set block_size = 512 to accommodate longer inputs?

Regards, Tamal Mondal

celbree commented 2 years ago

Sure, 512 is fine. We set 400 because most code samples in this dataset are short.
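
To illustrate what changing block_size affects, here is a minimal sketch of the usual truncate-and-pad step. It is not the pipeline's actual code: the `encode` function, the whitespace tokenizer, and the special-token strings are hypothetical stand-ins, and the only values taken from this thread are 400 and 512.

```python
MODEL_MAX_LEN = 512  # CodeBERT-base input limit mentioned in this thread


def encode(code, block_size=400):
    """Truncate and pad a code snippet to exactly block_size tokens.

    Hypothetical stand-in: a real pipeline would use the model's subword
    tokenizer; whitespace splitting keeps this sketch self-contained.
    """
    if block_size > MODEL_MAX_LEN:
        raise ValueError(f"block_size {block_size} exceeds {MODEL_MAX_LEN}")
    # Reserve two positions for the start and end markers.
    tokens = code.split()[: block_size - 2]
    tokens = ["<s>"] + tokens + ["</s>"]
    # Pad short inputs so every example has the same length.
    return tokens + ["<pad>"] * (block_size - len(tokens))


long_snippet = " ".join(f"tok{i}" for i in range(1000))
for size in (400, 512):
    encoded = encode(long_snippet, block_size=size)
    kept = sum(1 for t in encoded if t.startswith("tok"))
    print(size, len(encoded), kept)
```

Under these assumptions, raising block_size from 400 to 512 keeps 112 more tokens of a long input before truncation. Longer sequences also use more memory per example, so the batch size may need to be reduced.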

Tamal-Mondal commented 2 years ago

Thanks, @celbree, for the quick response and the clarification.