If you think what I am asking about in public is unfair for this assignment, please delete this issue. Sorry for any inconvenience caused.
I want to double-check the assignment flow. After reading and reviewing the code, I am not sure whether my understanding is right:
1. Use the bge model to calculate the similarity between the train and test samples, and choose the K most similar samples from the training dataset as in-context demonstrations.
2. Use phi1.5 with the prompt generated in step 1 to generate the result.
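To make step 1 concrete, here is a minimal sketch of how I picture the selection. It assumes the embeddings have already been computed (for example by the bge model) and are plain lists of floats. The names `cosine` and `top_k_demonstrations` are hypothetical and are not taken from the assignment code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Guard against zero vectors to avoid a division by zero.
    return dot / norm if norm else 0.0


def top_k_demonstrations(test_vec, train_vecs, k):
    """Return indices of the k training vectors most similar to test_vec."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(test_vec, train_vecs[i]),
        reverse=True,  # most similar first
    )
    return ranked[:k]


# Toy embeddings, for illustration only (not real bge outputs).
train_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
test_vec = [1.0, 0.1]
selected = top_k_demonstrations(test_vec, train_vecs, k=2)
```

The selected indices would then be used to pick the training samples that go into the prompt for phi1.5.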
What's more, the readme file is confusing, especially these commands you wrote:
I think when we submit the assignment, we should change data-path to ARC-Easy-test.jsonl. Is that right?
If so, there is a problem when we want to check whether the code runs after finishing the "write your code" part and before the test dataset is released: the validation file name does not contain "test", so line 235 of eval_fewshot.py does not switch the path to the training file.
The file name contains "validation" when we validate the model and "test" when we test it.
A simple fix is to change
demonstrations = load_all_demonstrations(args.data_path.replace("test", "train"))
to
demonstrations = load_all_demonstrations(args.data_path.replace("test", "train").replace("validation", "train"))
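As a sketch of the same idea, the path mapping could also live in a small helper. This is a slightly stricter variant than the one-line fix: it only rewrites the file name, so a directory that happens to contain "test" is left alone. The helper name `train_path_for` and the file name ARC-Easy-validation.jsonl are my assumptions, not something from the repository.

```python
import os


def train_path_for(data_path: str) -> str:
    """Map an evaluation file path to the matching training file path."""
    directory, name = os.path.split(data_path)
    # Handle both split names so validation and test runs load the
    # same training demonstrations.
    for split in ("test", "validation"):
        name = name.replace(split, "train")
    return os.path.join(directory, name)


# Assumed file names, for illustration only.
from_test = train_path_for("ARC-Easy-test.jsonl")
from_validation = train_path_for("ARC-Easy-validation.jsonl")
```

With this, the call would become load_all_demonstrations(train_path_for(args.data_path)).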
The performance change is as follows:
Thank you for highlighting this issue. Your grasp of the submission pipeline is accurate. I appreciate your advice and will address the bug accordingly.