shuyanzhou / docprompting

Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023
Apache License 2.0

train_retriever_sup_unsup.json #19

Open chenzhongwu opened 6 months ago

chenzhongwu commented 6 months ago

Could you share the code for generating the training data for SimCSE? What is the difference between text1 and text2 in train_retriever_sup_unsup.json?

chenzhongwu commented 6 months ago

I am very curious about how you generated the files in the provided dataset at https://drive.google.com/file/d/1CzNlo8-e4XqrgAME5zHEWEKIQMPga0xl/view?usp=sharing. What methods did you use to process them, and from which raw datasets? Thanks!

shuyanzhou commented 6 months ago

What is the difference between text1 and text2 in train_retriever_sup_unsup.json?

In the unsupervised setting, text1 and text2 are the same. In the supervised setting, text2 is the natural language intent from CoNaLa and text1 is the description of the function that fulfills the intent.
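For reference, here is a rough sketch of what the two kinds of pairs could look like if you build the file yourself. It is not the repo's actual generation script: the text1/text2 field names follow train_retriever_sup_unsup.json, but the example contents, the jsonlines output format, and the output path are all assumptions.

import json

# Unsupervised pair (assumed example): the same doc text on both sides,
# in the SimCSE style where dropout provides the augmentation.
unsup_pair = {
    "text1": "numpy.concatenate: join a sequence of arrays along an existing axis.",
    "text2": "numpy.concatenate: join a sequence of arrays along an existing axis.",
}

# Supervised pair (assumed example): text2 is a CoNaLa NL intent,
# text1 is the description of the function that fulfills it.
sup_pair = {
    "text1": "pandas.DataFrame.drop: drop specified labels from rows or columns.",
    "text2": "delete a column from a pandas dataframe",
}

# Assumed jsonlines layout: one JSON object per line.
with open("train_retriever_sup_unsup.json", "w") as f:
    for pair in (unsup_pair, sup_pair):
        f.write(json.dumps(pair) + "\n")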

What methods did you use to process them, and from which raw datasets?

CoNaLa provides NL-code pairs. We use heuristics to extract the functions from the code and find their corresponding documents. Please see Appendix B of the paper for a more detailed description.
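As a rough illustration only (not the paper's actual extraction code; Appendix B describes the real procedure), a heuristic of this kind could walk the Python AST of each snippet and collect dotted call names, which can then be matched against documentation entries:

import ast

def extract_call_names(code):
    # Collect dotted call names (e.g. "pandas.concat") from a Python snippet.
    # A simplified sketch of the kind of heuristic described above.
    names = set()
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return names
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func, parts = node.func, []
            while isinstance(func, ast.Attribute):
                parts.append(func.attr)
                func = func.value
            if isinstance(func, ast.Name):
                parts.append(func.id)
            if parts:
                names.add(".".join(reversed(parts)))
    return names

print(extract_call_names("df = pandas.concat([a, b], axis=1)"))  # {'pandas.concat'}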

Let me know if you want to use similar pipelines to generate more NL-doc-code tuples. I can provide a more straightforward approach to generating the data.

chenzhongwu commented 4 months ago

Thanks a lot! I have a few other questions: 1. Could you explain the character-level BLEU evaluation metric in more detail? 2. What metric is used for retrieval performance in Table 4, and how do you evaluate whether the retrieved docs are correct? Thanks again!

shuyanzhou commented 4 months ago

Character-level BLEU is calculated in this way:

from sacrebleu.metrics import BLEU

# pred_list: generated code strings; src_list: reference code strings (same length)
bleu = BLEU(tokenize='char')
bleu_score = bleu.corpus_score(pred_list, [src_list]).score
metric_list['bleu_char'] = bleu_score

where pred_list and src_list are list[str].
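A self-contained toy run of the same call pattern (the prediction and reference strings here are made up, not from the repo):

from sacrebleu.metrics import BLEU

pred_list = ["df.drop('col', axis=1)", "np.concatenate([a, b])"]
src_list = ["df.drop(columns='col')", "np.concatenate([a, b])"]

bleu = BLEU(tokenize='char')
print(bleu.corpus_score(pred_list, [src_list]).score)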

The code to calculate recall is here. Basically, we take the top-k documents from the retriever and check whether the ground-truth is among them.
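For concreteness, a minimal sketch of that recall@k computation (the repo's evaluation script is the authoritative version; the function and variable names here are made up):

def recall_at_k(ranked_doc_ids, gold_doc_ids, k=10):
    # ranked_doc_ids: one ranked list of retrieved doc ids per query
    # gold_doc_ids: the ground-truth doc id for each query
    hits = sum(1 for ranked, gold in zip(ranked_doc_ids, gold_doc_ids)
               if gold in ranked[:k])
    return hits / len(gold_doc_ids)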