shmsw25 / FActScore

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
https://arxiv.org/abs/2305.14251
MIT License

About the enwiki-20230401 #44

Open Toblame opened 7 months ago

Toblame commented 7 months ago

After downloading the data and setting up the environment, I ran this command:

python -m factscore.factscorer --input_path "/root/FNDLLM/test.jsonl" --model_name "retrieval+llama+npm" --use_atomic_facts --data_dir '/root/.cache/factscore/'

and got this error:

File "/root/anaconda3/envs/factstore/lib/python3.7/site-packages/factscore/retrieval.py", line 57, in build_db
    with open(data_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/factscore/enwiki-20230401.jsonl'

I didn't find enwiki-20230401.jsonl in the downloaded data. Where is it?

martiansideofthemoon commented 7 months ago

Hi @Toblame, thanks for your interest in our work. What command did you use to download the data?

The cache is stored by default in the folder where you ran the download command, see https://github.com/shmsw25/FActScore/blob/main/factscore/download_data.py#L119

Can you confirm that the other cache files are present in /root/.cache for you?
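One quick way to check which cache files are present is to list the directory from Python. This is only a sketch: the helper name `list_cache_files` is made up for illustration, and the path is the one from the command above, so adjust it to wherever the download command actually wrote its files.

```python
from pathlib import Path


def list_cache_files(cache_dir):
    """Return sorted (name, size_in_bytes) pairs for regular files directly under cache_dir.

    Hypothetical helper for illustration; it is not part of the factscore package.
    """
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        # The directory does not exist, so there is nothing to list.
        return []
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    # Path taken from the command in this thread; change it for your setup.
    for name, size in list_cache_files("/root/.cache/factscore"):
        print(f"{size:>14,d}  {name}")
```

If the listing comes back empty, or shows no enwiki-20230401 file at all, the download most likely went to a different folder than the one passed to --data_dir.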

Toblame commented 7 months ago

Thank you, I have solved this problem. However, I ran into another problem: 'AssertionError: topic in your data (topic) is likely to be not a valid title in the DB.' This happens with both my own data and the FActScore labeled data.

tanay2001 commented 7 months ago

Hi @Toblame ,

How did you solve this problem? The download_data.py file only downloads an enwiki-20230401.db file; I cannot find a .jsonl file in the cache. TIA

Toblame commented 7 months ago

> Hi @Toblame,
>
> How did you solve this problem? The download_data.py file only downloads an enwiki-20230401.db file; I cannot find a .jsonl file in the cache. TIA

I just checked the cache file's location and then ran the command again. However, I still hit the other problem I mentioned above.

martiansideofthemoon commented 6 months ago

Hi @Toblame,

> Thank you, I have solved this problem. However, I ran into another problem: 'AssertionError: topic in your data (topic) is likely to be not a valid title in the DB.'

You are likely getting this error because you have set topic in some rows of the input JSONL file to the string "topic". For this to work, topic must be equal to some article title (like "Billy Conigliaro") which is present in the database.
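To spot such rows before running the scorer, a small check along these lines may help. It is a sketch under assumptions: `find_bad_topics` is a hypothetical helper, not part of the factscore package, and the "output" field in the sample rows is assumed for illustration. It only flags rows whose topic is missing, empty, or the literal placeholder; it does not verify that a title actually exists in the database.

```python
import json


def find_bad_topics(lines, placeholder="topic"):
    """Return (line_number, reason) pairs for rows with an unusable topic field.

    Hypothetical helper: flags a missing, empty, or placeholder topic only.
    It does NOT look the title up in the Wikipedia database.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL input
        row = json.loads(line)
        topic = row.get("topic")
        if not isinstance(topic, str) or not topic.strip():
            problems.append((lineno, "missing or empty topic"))
        elif topic.strip() == placeholder:
            problems.append((lineno, "topic is the literal placeholder string"))
    return problems


if __name__ == "__main__":
    # The "output" field name is an assumption made for this example.
    sample = [
        json.dumps({"topic": "Billy Conigliaro", "output": "..."}),
        json.dumps({"topic": "topic", "output": "..."}),
    ]
    for lineno, reason in find_bad_topics(sample):
        print(f"line {lineno}: {reason}")
```

To check a real input file, pass the open file object, for example `find_bad_topics(open("/root/FNDLLM/test.jsonl"))`, since iterating over a file yields one line at a time.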