tuvllms closed this issue 8 months ago
Hi @tuvllms
I used the test set of the wiki_bio dataset, the same one you linked to.
wiki_bio_test_idx indicates the ID of the item. For example, you can get the original data with:
from datasets import load_dataset
dataset = load_dataset("wiki_bio")
item = dataset['test'][wiki_bio_test_idx]
Thanks, @potsawee!
I see what you meant. So each wiki_bio_test_idx here is actually a row index for the data frame dataset['test']. Note that the original wiki_bio dataset also included a set of test ids for their test examples, which are different from these wiki_bio_test_idx ids. If you download their data from https://huggingface.co/datasets/wiki_bio/blob/main/data/wikipedia-biography-dataset.zip you can find their test ids in wikipedia-biography-dataset/test/test.id.
Yes, that's right: wiki_bio_test_idx indicates the row index for dataset['test']. Thank you for pointing out the test ids!
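To make the distinction above concrete, here is a minimal sketch that treats wiki_bio_test_idx as a position in dataset['test'] and keeps the ids from test.id as a separate list. The helper names get_by_row_index and parse_test_ids are hypothetical, and the one-id-per-line layout of test.id is an assumption rather than something confirmed in this thread. The demo uses toy data so it runs without downloading anything.

```python
from typing import Iterable, List


def parse_test_ids(lines: Iterable[str]) -> List[str]:
    """Collect ids from test.id-style lines.

    Assumption: the file holds one id per line; blank lines are skipped.
    """
    return [line.strip() for line in lines if line.strip()]


def get_by_row_index(split, row_index: int):
    """Return the item at position row_index in a split.

    row_index is a position in the split (what wiki_bio_test_idx is),
    not one of the ids listed in test.id.
    """
    if not 0 <= row_index < len(split):
        raise IndexError(f"row index {row_index} is outside 0..{len(split) - 1}")
    return split[row_index]


if __name__ == "__main__":
    # Toy stand-ins for dataset['test'] and the contents of test.id.
    toy_split = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
    toy_ids = parse_test_ids(["1001\n", "\n", "2002\n", "3003\n"])

    # Position 1 in the split and the id stored on line 1 are unrelated values.
    print(get_by_row_index(toy_split, 1))
    print(toy_ids[1])

    # With the real data (requires the `datasets` package and a download):
    # from datasets import load_dataset
    # dataset = load_dataset("wiki_bio")
    # item = get_by_row_index(dataset["test"], wiki_bio_test_idx)
```

Searching test.id for a wiki_bio_test_idx value is therefore not expected to find a match, since the two are different kinds of identifier.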
Hello,
Which version of the wikibio dataset did you use? I can't find the wiki_bio_test_idx indices in the wikipedia-biography-dataset/test/test.id file here: https://huggingface.co/datasets/wiki_bio/blob/main/data/wikipedia-biography-dataset.zip