kasnerz / factgenie

A Toolkit for Annotating and Visualizing LLM Hallucinations
MIT License

add practicald2t sharedtask 2024 (st24 prefixed) datasets #35

Closed: oplatek closed this 4 days ago

oplatek commented 6 days ago

Adding the st24-gsmarena, st24-ice_hockey, st24-openweather, and st24-owid datasets, i.e. their inputs and outputs for the dev splits.


How can you add a dataset whose format is already supported? (This is what we did in this PR.)

  1. Add the input data to factgenie/data/DATASET_NAME.
  2. Add the outputs from your model_X to factgenie/outputs/DATASET_NAME/SPLIT_NAME/model_X.json.
  3. Subclass the dataset loader: we created factgenie/loaders/practicald2t_st24.py, where we subclassed four dataset loader classes.
  4. Register the classes in DATASET_CLASSES in factgenie/loaders/__init__.py (see the sketch after this list).
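
To make steps 3 and 4 concrete, here is a minimal sketch. The base class name Dataset, its import path, and the constructor arguments are assumptions made for this illustration; mirror one of the existing loaders in factgenie/loaders/ for the actual interface.

# factgenie/loaders/practicald2t_st24.py -- illustrative sketch only;
# the base class `Dataset`, its import path, and the keyword arguments
# are assumptions, so copy the pattern from an existing loader instead.
from factgenie.loaders.dataset import Dataset

class ST24OpenWeather(Dataset):
    """Loads inputs from factgenie/data/st24-openweather and outputs
    from factgenie/outputs/st24-openweather/dev/."""

    def __init__(self, **kwargs):
        super().__init__(name="st24-openweather", **kwargs)

# factgenie/loaders/__init__.py -- step 4: register the loader so that
# `factgenie list-datasets` and the web UI can find it.
DATASET_CLASSES = {
    # ... existing entries (wikidata, logicnlg, dummy, ...) ...
    "st24-openweather": ST24OpenWeather,
}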

Assuming you have followed the installation and running steps in the main README.md, you are ready to see the new datasets in factgenie.

Screenshot (2024-07-01): the new st24-* datasets listed in the factgenie web interface.

How to evaluate the existing outputs?

In the section above, we loaded the inputs and outputs for the st24-* datasets. Below, you will see how to obtain annotations by running the factgenie run-llm-eval command and how factgenie visualizes them.

Look at the arguments for the command:

factgenie run-llm-eval \
    --campaign_name $YOUR_CAMPAIGN_NAME \
    --dataset_name $DATASET \
    --split $SPLIT \
    --llm_output_name $SUMMARY_GENERATING_MODEL \
    --llm_metric_config factgenie/llm-eval/$LLM_EVALUATOR

We will name our campaign st24-demo-openweather-dev-llama3 and choose the st24-openweather dataset with its dev split. Our baseline model was zephyr, and we will use the llama3 config factgenie/llm-eval/ollama-llama3.yaml.

$ factgenie list-datasets     # use this command to list all registered datasets
ice_hockey
gsmarena
openweather
owid
wikidata
logicnlg
dummy
st24-ice_hockey
st24-gsmarena
st24-openweather
st24-owid

Putting it all together, we will run:

factgenie run-llm-eval --campaign_name st24-demo-openweather-dev-llama3 --dataset_name st24-openweather --split dev --llm_output_name zephyr --llm_metric_config factgenie/llm-eval/ollama-llama3.yaml

To run the command, two more steps are needed:

  1. Set up an ollama server with the llama3 model.
  2. Point the config at your ollama server: change api_url in the config to the URL where your ollama server is running (https://github.com/kasnerz/factgenie/blob/7faf6c75ccc8e57c74a5b1b42922b431f4dacd06/factgenie/llm-eval/ollama-llama3.yaml#L3); see the sketch after this list.
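
For example, with the stock ollama CLI and its default port, the two steps could look like this. The api_url value below is an assumption for illustration; check the linked line of the config for the exact format.

# step 1: start the ollama server and pull the llama3 model
ollama serve &       # listens on port 11434 by default
ollama pull llama3

# step 2: factgenie/llm-eval/ollama-llama3.yaml (excerpt; the value below
# assumes a local server -- see the linked #L3 for the actual format)
api_url: http://localhost:11434/api/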

Once the command is finished, all your examples will have annotations; you can see them in the browser. I committed the annotations, so feel free to visit https://quest.ms.mff.cuni.cz/namuddis/factgenie/browse?dataset=st24-openweather&split=dev&example_idx=0 to see them 😉

Note: I forgot to add factgenie/loaders/practicald2t_st24.py in the PR, sorry for the inconvenience.

Debugging tips

  1. Change the logging level to DEBUG if you are developing prompts or want to monitor annotations (https://github.com/kasnerz/factgenie/blob/7faf6c75ccc8e57c74a5b1b42922b431f4dacd06/factgenie/config.yml#L15; see the config excerpt after this list).
  2. Use the web UI and create the LLM eval campaign in the browser instead of running factgenie run-llm-eval from the CLI. You can start and stop the evaluation and adjust the prompt in the config interactively.
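
For tip 1, the change could look like the excerpt below; the key name is hypothetical, so check the linked line of factgenie/config.yml for the actual field.

# factgenie/config.yml (excerpt; `logging_level` is a hypothetical key
# name -- see the linked #L15 for the actual field)
logging_level: DEBUG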