Hi,
I've noticed that the three datasets used for code search fine-tuning (AdvTest, CoSQA, and CSN) all originate from CodeSearchNet. If that is the case, isn't there a possibility of data overlap? My concern is that unixcoder-base was also pretrained on NL-PL pairs from the CodeSearchNet dataset. Could you please clarify this?
Thanks