hitz-zentroa / GoLLIE

Guideline following Large Language Model for Information Extraction
https://hitz-zentroa.github.io/GoLLIE/
Apache License 2.0
263 stars 18 forks

How to construct the DIANN corpus #22

Closed TuRan-sino closed 1 month ago

TuRan-sino commented 1 month ago

I'm trying to download the DIANN corpus from the DIANN link you provided. However, I couldn't find any download link on that website, so I searched GitHub and downloaded the corpus from gildofabregat/DIANN-IBEREVAL-2018. After unzipping the archive, I found that the files were all .txt files instead of .tsv files. Did I download the wrong dataset, or is there a preprocessing script that I haven't run?

ikergarcia1996 commented 1 month ago

Hello @TuRan-sino

You can obtain the .tsv file by processing the .txt files, which are annotated with HTML tags. To save you the hassle, here is the preprocessed dataset.

diann.zip
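For anyone who would rather regenerate the files than use the attachment, the conversion described above (inline tags in .txt to token-per-line .tsv) could be sketched roughly as follows. This is only an illustration, not the script used to build diann.zip: the tag name `dis`, the BIO label scheme, the simple regex tokenizer, and the helper names `tagged_text_to_bio` and `to_tsv` are all assumptions, and the exact column layout GoLLIE expects should be checked against the preprocessed files.

```python
import re

# Matches simple inline tags such as <dis> or </dis> (assumed tag syntax).
TAG_RE = re.compile(r"(</?[A-Za-z]+>)")


def tagged_text_to_bio(text, entity_tag="dis"):
    """Turn text with inline tags into (token, BIO label) pairs.

    Only `entity_tag` produces labels; any other tag is dropped and the
    text inside it is kept. The tag name is an assumption.
    """
    rows = []
    depth = 0       # how many entity tags are currently open
    start = False   # True until the first token of a span is emitted
    for piece in TAG_RE.split(text):
        if not piece:
            continue
        if TAG_RE.fullmatch(piece):
            name = piece.strip("</>").lower()
            if name == entity_tag:
                if piece.startswith("</"):
                    depth = max(depth - 1, 0)
                else:
                    depth += 1
                    start = True
            continue
        # Naive tokenizer: runs of word characters, or single punctuation marks.
        for token in re.findall(r"\w+|[^\w\s]", piece):
            if depth > 0:
                label = ("B-" if start else "I-") + entity_tag.upper()
                start = False
            else:
                label = "O"
            rows.append((token, label))
    return rows


def to_tsv(rows):
    """Serialize (token, label) pairs as one tab-separated pair per line."""
    return "\n".join(f"{token}\t{label}" for token, label in rows)


if __name__ == "__main__":
    sample = "He has <dis>hearing loss</dis>."
    print(to_tsv(tagged_text_to_bio(sample)))
```

Running it over each .txt file and writing the result with a .tsv extension would give a token-per-line file; sentence splitting and any extra label types would still need to be handled separately.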

TuRan-sino commented 1 month ago

Thank you for your help.