Open DishaJindal opened 4 years ago
Not sure what you mean by document ids!!
Here's a sample script to read the h5df file:
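import h5py
import json

filename = "../data/NYT/nyt.test.h5df"
with h5py.File(filename, "r") as f:
    # Take the first top-level key in the file
    a_group_key = list(f.keys())[0]
    # Read all of its entries into memory
    data = list(f[a_group_key])
    # Each entry is a JSON string; decode the first one
    res = json.loads(data[0])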
Run res.keys() to see the available keys; you can then extract data in the terminal like this:
res['article'][0]
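If you want every record rather than just the first, here is a minimal sketch along the same lines. It assumes that every entry under the first key is a JSON string like data[0] above, which I haven't verified for the whole file. The helper names decode_records and load_split are made up for illustration and are not part of the repo. Whether any of the keys actually holds a document id is something you'd have to check from the output.

```python
import json


def decode_records(raw_records):
    """Decode raw entries (bytes or str containing JSON) into dicts."""
    decoded = []
    for raw in raw_records:
        # h5py may hand back bytes depending on version and dtype
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        decoded.append(json.loads(raw))
    return decoded


def load_split(filename):
    """Read all entries under the first key of the file (hypothetical helper)."""
    import h5py  # imported here so decode_records can be used without h5py

    with h5py.File(filename, "r") as f:
        first_key = list(f.keys())[0]
        raw_records = list(f[first_key])
    return decode_records(raw_records)


# Example usage (path taken from the script above):
# records = load_split("../data/NYT/nyt.test.h5df")
# print(len(records), sorted(records[0].keys()))
```

Printing the keys of the first record is the quickest way to see what fields are stored for each document.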
Hi, thanks for sharing the repo and the dataset. Would it be possible to share the document ids of the documents in the test split ("nyt.test.h5df") of the NYT dataset?