Open Ling-JM opened 1 month ago
Hi, I was the one who raised issue #32, but the solutions I tried back then did not actually solve my problem. I finally found that I had forgotten to switch the tokenizer to the right one in two places, i.e., changing the Codex tokenizer to the CodeGen one: https://github.com/microsoft/CodeT/blob/35f54d60b152cc31d134b788e702878ad613d9f7/RepoCoder/run_pipeline.py#L30 and https://github.com/microsoft/CodeT/blob/35f54d60b152cc31d134b788e702878ad613d9f7/RepoCoder/run_pipeline.py#L46. I don't know whether you are in the same situation, but I think this may solve your problem. A rough sketch of the change is below.
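This is only a sketch of what I mean, assuming the `CodexTokenizer` / `CodeGenTokenizer` class names from the repo's `utils.py`; please check it against your own checkout of `run_pipeline.py` rather than copying it verbatim:

```python
# RepoCoder/run_pipeline.py (sketch of the two edits linked above)
# The point is that prompt building should count and truncate tokens with the
# same tokenizer as the model that will generate the code (CodeGen here).
from utils import CodeGenTokenizer  # was: from utils import CodexTokenizer

tokenizer = CodeGenTokenizer        # was: tokenizer = CodexTokenizer
```

If you want to sanity-check token counts outside the pipeline, the matching Hugging Face tokenizer is `Salesforce/codegen-350M-mono`.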
Thank you very much to the authors for their contributions. While attempting to reproduce the RepoCoder method, we ran into an issue in the codegen_inference.py file: when we used the prompts/rg-one-gram-ws-20-ss-2.jsonl file with the codegen-350M-mono model for code generation, we hit an error. For context, we are driving the model roughly along the lines of the sketch below.
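This is only a minimal sketch of the kind of generation call involved, not the actual code in codegen_inference.py; the jsonl field name (`prompt`) and the generation arguments are assumptions:

```python
import json
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

# Load the retrieval-augmented prompts produced by the pipeline.
with open("prompts/rg-one-gram-ws-20-ss-2.jsonl") as f:
    samples = [json.loads(line) for line in f]

prompt = samples[0]["prompt"]  # field name is an assumption; check the jsonl schema
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
# Print only the newly generated continuation, not the prompt itself.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```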
I also carefully reviewed issue #32 and understand the solution. However, when I reduce the length of the retrieved context or drop context from the starting lines (roughly in the way sketched below), I still cannot reproduce the results you report in Table 2 (a, b). Could you provide a more detailed solution, or the code you used to obtain those results?
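For reference, this is a minimal sketch of what I mean by shortening the retrieved context; the 2048-token context window and the 100-token generation budget are my assumptions, and the actual truncation logic in the repo may differ:

```python
# Keep the end of the prompt (the in-file context right before the hole) and drop
# tokens from the front (the retrieved context) until everything fits.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
MAX_CONTEXT = 2048      # assumed context window of codegen-350M-mono
MAX_NEW_TOKENS = 100    # assumed generation budget

def truncate_prompt(prompt: str) -> str:
    ids = tokenizer(prompt)["input_ids"]
    budget = MAX_CONTEXT - MAX_NEW_TOKENS
    if len(ids) > budget:
        ids = ids[-budget:]  # drop the oldest (retrieved) context first
    return tokenizer.decode(ids)
```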