pdasigi / eqqa


TODO #1

Status: open. pdasigi opened this issue 2 years ago.

pdasigi commented 2 years ago

High priority:

Medium priority:

Low priority:

After we have an evaluation setup:

PastelBelem8 commented 2 years ago

For the evaluation setup, the following papers could be useful when designing an Answer Equivalence task:

PastelBelem8 commented 2 years ago

Another idea for QE: calibrate the final model score by fitting a model that takes the values of the different metrics as input and outputs the model's confidence. Since each metric has its own strengths and weaknesses, combining them (e.g., via a linear combination) could give a more reliable confidence estimate than any single metric.
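A minimal sketch of that idea, assuming we have a small set of answers labeled correct/incorrect (e.g., by human judgment) along with one score per metric for each answer. It uses logistic regression, which is one possible choice (a linear combination of the metric scores passed through a sigmoid), so the output can be read as a confidence in [0, 1]. The function names `fit_metric_combiner` and `confidence`, and the toy metric values, are hypothetical and not part of this repo.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_metric_combiner(
    metric_scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Hypothetical helper: fit weights w and bias b by batch gradient descent
    so that sigmoid(w . x + b) approximates P(answer is correct | metric scores x).

    metric_scores: one row per answer, one column per metric.
    labels: 1 if the answer was judged correct, else 0.
    """
    n_features = len(metric_scores[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(metric_scores, labels):
            # Gradient of the mean log-loss w.r.t. the linear score is (p - y).
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(n_features):
            w[j] -= lr * grad_w[j] / n
        b -= lr * grad_b / n
    return w, b


def confidence(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Hypothetical helper: combined, calibrated confidence for one answer."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

For example, with two illustrative metric columns (say, exact match and a token-level F1), `w, b = fit_metric_combiner(X, y)` learns how much to trust each metric, and `confidence(w, b, [0.0, 0.9])` scores a new answer. The learned weights also show which metrics carry the most signal; in practice one would fit on held-out labeled data and check calibration (e.g., with a reliability diagram) before relying on the scores.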