Adding the `st24-gsmarena`, `st24-ice_hockey`, `st24-openweather`, and `st24-owid` datasets, i.e. their inputs and outputs for the dev splits.

Tasks / Content below:
[x] Add the data for the dev splits
[x] Add the outputs for the dev splits
[x] Subclass the existing datasets with an `st24`-prefixed name so they load
[x] Describe what was necessary to add the datasets for previewing them, and add screenshots
[x] Describe how the evaluation script can be run on the shared task datasets
Not included:
[ ] Describe how the outputs were generated in quintd
How can you add a dataset if its format is already supported? (This is what we did in this PR.)
1. Add input data to `factgenie/data/DATASET_NAME`.
2. Add outputs from your model_X to `factgenie/outputs/DATASET_NAME/SPLIT_NAME/model_X.json`.
3. Subclass the dataloader: e.g. we created `factgenie/loaders/practicald2t_st24.py`, where we subclassed four dataset loader classes.
4. Register the classes in `DATASET_CLASSES` in `factgenie/loaders/__init__.py`.
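The subclass-and-register pattern in steps 3–4 is small enough to sketch. The snippet below is a self-contained illustration, not factgenie's actual code: the base class and registry here are simplified stand-ins, and the real loader classes live in `factgenie/loaders`.

```python
# Illustrative sketch of the pattern from steps 3-4.
# `OpenWeather` and `DATASET_CLASSES` below are stand-ins; check
# factgenie/loaders/practicald2t_st24.py and factgenie/loaders/__init__.py
# for the real classes and registry.

class OpenWeather:
    """Stand-in for an existing factgenie dataset loader."""

    def __init__(self, name="openweather"):
        # The name decides which factgenie/data/<name> directory is loaded.
        self.name = name


class ST24OpenWeather(OpenWeather):
    """Same loading logic, pointed at the st24 copy of the data."""

    def __init__(self):
        super().__init__(name="st24-openweather")


# Stand-in for the DATASET_CLASSES registry in factgenie/loaders/__init__.py.
DATASET_CLASSES = {
    "st24-openweather": ST24OpenWeather,
}

print(DATASET_CLASSES["st24-openweather"]().name)  # st24-openweather
```

Once a class is in the registry, the dataset shows up under its key in `factgenie list-datasets` and in the web UI.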
Assuming you have followed the installation and startup steps in the main README.md, you are ready to see the new datasets in factgenie.
How to evaluate the existing outputs?
In the section above, we loaded inputs and outputs for the `st24*` datasets. Below, you will see how to obtain the annotations by running the `factgenie run-llm-eval` command and how factgenie visualizes them. Let's look at the arguments for the command:

We will name our campaign `st24-demo-openweather-dev-llama3`. We will choose the `st24-openweather` dataset and its `dev` split. Our baseline model was `zephyr`, and we will use the llama3 config `factgenie/llm-eval/ollama-llama3.yaml`.
```
$ factgenie list-datasets  # use this command to list all registered datasets
ice_hockey
gsmarena
openweather
owid
wikidata
logicnlg
dummy
st24-ice_hockey
st24-gsmarena
st24-openweather
st24-owid
```
Alternatively, use the web browser UI to create the LLM eval campaign instead of running `factgenie run-llm-eval` from the CLI. You can start and stop the evaluation and adjust the prompt interactively in the config.
Putting it all together, we will run the `factgenie run-llm-eval` command.

To run the command, two more steps are needed:

- Set `api_url` in the config to the URL where your ollama server is running: https://github.com/kasnerz/factgenie/blob/7faf6c75ccc8e57c74a5b1b42922b431f4dacd06/factgenie/llm-eval/ollama-llama3.yaml#L3

Once the command is finished, all your examples will have annotations; you can see them in the browser. I committed the annotations, so feel free to visit https://quest.ms.mff.cuni.cz/namuddis/factgenie/browse?dataset=st24-openweather&split=dev&example_idx=0 to see them 😉
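The linked line is the `api_url` key. As a sketch of the relevant fragment, assuming a local ollama server on its default port (the exact endpoint path is an assumption; check the linked config for the authoritative value):

```yaml
# Excerpt from factgenie/llm-eval/ollama-llama3.yaml (other keys omitted).
# The URL assumes ollama's default local port 11434; adjust to your server.
api_url: http://localhost:11434/api/generate/
```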
Note: I forgot to add `factgenie/loaders/practicald2t_st24.py` in the PR, sorry for the inconvenience.
Debugging tips

Set `DEBUG` in the config if you are developing prompts or you want to monitor annotations: https://github.com/kasnerz/factgenie/blob/7faf6c75ccc8e57c74a5b1b42922b431f4dacd06/factgenie/config.yml#L15
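The `DEBUG` flag is a single key in `factgenie/config.yml` (see the link above). A sketch of the change, where the key name and value format are assumptions taken from the linked file rather than verified here:

```yaml
# factgenie/config.yml (excerpt) -- key name per the link above
debug: True   # enables verbose logging, handy when iterating on prompts
```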