Closed: opened by 2018211801, closed 1 year ago
After a day of researching the code, I found that the scores you used in your paper seem to be the all_rougeL values from compute_metrics.py, is that right? And in my opinion, if I want to save costs, I should reduce the number of instances selected in each task class, because the scores of the tasks differ so much.
To answer your first question:
"I would very much appreciate it if you could provide the version for GPT-J 6B and GPT-NeoX 20B!"
The scripts for running open-sourced models for inference can be found in ICIL/scripts/decoder. To run GPT-J 6B or GPT-NeoX 20B, change the model_name_or_path parameter to the model you are interested in.
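As a sketch of what that parameter swap looks like: the argument name model_name_or_path comes from the scripts above, but the Hugging Face model IDs below are my assumption of the checkpoints you would point it at, not values confirmed by the repo.

```python
import argparse

# Minimal sketch: the decoder scripts expose a model_name_or_path argument;
# swapping its value selects which open-sourced model runs inference.
# The model IDs here are assumed Hugging Face checkpoint names.
parser = argparse.ArgumentParser()
parser.add_argument("--model_name_or_path", default="EleutherAI/gpt-j-6b")

# Running GPT-NeoX 20B instead of the default GPT-J 6B:
args = parser.parse_args(["--model_name_or_path", "EleutherAI/gpt-neox-20b"])
print(args.model_name_or_path)
```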
For the second part,
"The scores you used in your paper seem to be the all_rougeL values from compute_metrics.py, is that right?"
The short answer is "No." In compute_metrics.py, specifically at line 161, you can see that each subtask adopts either RougeL or Exact Match as its metric, depending on its task category. For Figure 1, as we report the average performance over 119 evaluation tasks on the SuperNI benchmark, we take the average across all tasks, so the reported number is a mixture of Exact Match and RougeL scores.
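To illustrate how such a mixed average comes about, here is a minimal sketch; the category names, the metric assignment, and the scores are all illustrative, not taken from compute_metrics.py:

```python
# Hypothetical sketch: each task is scored with the metric its category
# uses (Exact Match or RougeL), and the headline number averages those
# per-task scores directly, mixing the two metric types.
def select_metric(task_category):
    # Illustrative category split, not the repo's actual mapping.
    exact_match_categories = {"classification", "multiple_choice"}
    return "exact_match" if task_category in exact_match_categories else "rougeL"

tasks = [
    {"category": "classification", "score": 100.0},   # scored with Exact Match
    {"category": "summarization", "score": 45.5},     # scored with RougeL
    {"category": "title_generation", "score": 60.1},  # scored with RougeL
]
for task in tasks:
    task["metric"] = select_metric(task["category"])

# The reported average blends Exact Match and RougeL values.
average = sum(task["score"] for task in tasks) / len(tasks)
print(round(average, 2))
```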
Finally, you have suggested,
"If I want to save costs, I should reduce the number of instances selected in each task class, because the scores of the tasks differ so much."
If I understand you correctly, you believe that since the scores of each task are vastly different, it would be better to decrease the number of instances per task instead of reducing the number of tasks being tested. While this is certainly an option, keep in mind that if you want to compare performance across different models, you would need a sufficient number of instances per task to accurately evaluate each model's performance.
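One way to see why the instance count matters: the standard error of a task's mean score shrinks only as 1/sqrt(n), so cutting instances per task makes each per-task estimate noisier. A toy calculation, with illustrative numbers:

```python
import math

# Sketch: if per-instance scores have standard deviation score_std, the
# standard error of the task's mean score over n_instances samples is
# score_std / sqrt(n_instances).
def standard_error(score_std, n_instances):
    return score_std / math.sqrt(n_instances)

# Halving the sampling budget from 100 to 25 instances doubles the noise
# in each task's estimate (values are illustrative).
print(standard_error(20.0, 100))  # with 100 instances per task
print(standard_error(20.0, 25))   # with only 25 instances per task
```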
Thanks very much!
I'm an NLP beginner and I'm not familiar with ROUGE. I noticed that there are several grading files, collect_metric.py, and the run.sh and readme files under ROUGE, and they are not uniform. How do you score the predictions? And I would very much appreciate it if you could provide the version for GPT-J 6B and GPT-NeoX 20B!