Question about text-code evaluation

I trained the CodeBERT model on CodeSearchNet for the NL-code-search-WebQuery task and then fine-tuned it on CoSQA. However, I do not have access to the test_webquery.json file used in the prediction command from the README (https://github.com/microsoft/CodeXGLUE/tree/main/Text-Code/NL-code-search-WebQuery#readme):

python code/run_classifier.py \
            --model_type roberta \
            --do_predict \
            --test_file test_webquery.json \
            --max_seq_length 200 \
            --per_gpu_eval_batch_size 2 \
            --data_dir ./data \
            --output_dir ./model_cosqa_continue_training/checkpoint-best-aver/ \
            --encoder_name_or_path microsoft/codebert-base \
            --pred_model_dir ./model_cosqa_continue_training/checkpoint-last/ \
            --prediction_file ./evaluator/webquery_predictions.txt

I tried dev_webquery.json (from CodeSearchNet) and cosqa_dev.json instead. The validation results on cosqa_dev.json are similar to those in the README, but the accuracy on dev_webquery.json is very low. In addition, the accuracy reported in the README is inconsistent with the accuracy reported in the paper. Were the evaluation results in the paper obtained on CodeSearchNet or on CoSQA?

Question about text-code evaluation #183