Closed oplatek closed 2 weeks ago
It would be nice to walk through what the Dataset class does and reference all (the most important) places where it is used.
How inputs are loaded is entirely up to the user, which is fine since there are already enough examples 🎉
There are now examples of loading a HuggingFace dataset (LogicNLG) and of a dataset such as gsmarena, which relies heavily on the parent Dataset class to load it from JSON.
I needed to register the new data loader, name it, and provide input loading as well as conversion to HTML, which is super easy using tinyhtml.
You are expected to put the data loader at factgenie/loaders/your_dataset.py and your input data, if any, at factgenie/data/your_dataset_name/SPLIT_NAME.json
Model outputs are loaded from factgenie/outputs/your_dataset_name/model_name.json, which must have the following structure:
{
"setup": {"id": "your_model_name or ID", "name": "your_model_name_for_users"},
"generated": [
{"out": "generation of your model for input example 1 which will be visualized"},
{"out": "generation of your model for input example 2 which will be visible in Factgenie"},
...
{"out": "Number of examples should match number of input data in your_dataset!"}
]
}
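Before dropping an outputs file into factgenie/outputs/, it may help to check it against the structure above. The following is a minimal sketch, not part of factgenie: the function name `validate_outputs` and its `n_inputs` parameter are made up for illustration, and the checks only cover the fields shown in this issue ("setup" with "id" and "name", and a "generated" list of objects with an "out" field).

```python
import json


def validate_outputs(outputs, n_inputs):
    """Return a list of problems found in a model-outputs dict (empty if none).

    Hypothetical helper: it checks only the fields described in this issue.
    """
    problems = []

    # "setup" must be an object carrying the model "id" and display "name".
    setup = outputs.get("setup")
    if not isinstance(setup, dict):
        problems.append('missing "setup" object')
    else:
        for key in ("id", "name"):
            if key not in setup:
                problems.append(f'"setup" is missing "{key}"')

    # "generated" must be a list with one {"out": ...} entry per input example.
    generated = outputs.get("generated")
    if not isinstance(generated, list):
        problems.append('missing "generated" list')
        return problems
    for i, item in enumerate(generated):
        if not isinstance(item, dict) or "out" not in item:
            problems.append(f'"generated"[{i}] has no "out" field')
    if len(generated) != n_inputs:
        problems.append(f"expected {n_inputs} generations, found {len(generated)}")
    return problems


if __name__ == "__main__":
    sample = json.loads(
        '{"setup": {"id": "m1", "name": "Model 1"},'
        ' "generated": [{"out": "first"}, {"out": "second"}]}'
    )
    print(validate_outputs(sample, n_inputs=2))  # prints []
```

Passing the number of examples in the matching input split as `n_inputs` catches the most common mismatch, a different number of generations than inputs.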
# factgenie/loaders/logicnlg.py
if __name__ == "__main__":
    # Testing the loader
    d = LogicnlgTest100Tables()
    print(len(d.examples['test']))
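The self-test above relies on the loader exposing an `examples` dict keyed by split. To illustrate that shape for a JSON-backed dataset, here is a rough sketch under stated assumptions: `JsonSplitDataset` and `get_example` are hypothetical names and this is not factgenie's actual Dataset class. It only assumes that each split lives in a SPLIT_NAME.json file containing a list of examples.

```python
import json
import tempfile
from pathlib import Path


class JsonSplitDataset:
    """Hypothetical stand-in for a JSON-backed dataset (not factgenie's class)."""

    def __init__(self, data_dir, splits=("test",)):
        # Map each split name to the list of examples stored in <split>.json.
        self.examples = {}
        for split in splits:
            path = Path(data_dir) / f"{split}.json"
            with open(path, encoding="utf-8") as f:
                self.examples[split] = json.load(f)

    def get_example(self, split, example_idx):
        # Return a single input example by split name and position.
        return self.examples[split][example_idx]


if __name__ == "__main__":
    # Write a tiny throwaway split to a temp dir and load it back.
    with tempfile.TemporaryDirectory() as tmp:
        data = [{"name": "phone A"}, {"name": "phone B"}]
        Path(tmp, "test.json").write_text(json.dumps(data), encoding="utf-8")
        d = JsonSplitDataset(tmp)
        print(len(d.examples["test"]))  # prints 2
```

Running the loader module directly like this, before starting the web app, makes it easier to tell loading errors apart from rendering errors.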
For debugging, I ran
factgenie run --host=127.0.0.1 --debugger --reload
and placed breakpoints with __import__("ipdb").set_trace() at the places where it crashed. I then used a permalink to preview exactly the same example each time: http://127.0.0.1:5000/browse?dataset=logicnlg&split=test&example_idx=0
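When stepping through several failing examples, building the permalink by hand gets tedious. A small sketch follows, assuming only the /browse path and the `dataset`, `split` and `example_idx` query parameters seen in the URL above. The helper name `browse_permalink` and its `base_url` default are illustrative, not part of factgenie.

```python
from urllib.parse import urlencode


def browse_permalink(dataset, split, example_idx, base_url="http://127.0.0.1:5000"):
    """Build a /browse URL for one example (hypothetical helper)."""
    # urlencode keeps the dict's insertion order and escapes values as needed.
    query = urlencode({"dataset": dataset, "split": split, "example_idx": example_idx})
    return f"{base_url}/browse?{query}"


if __name__ == "__main__":
    print(browse_permalink("logicnlg", "test", 0))
    # prints http://127.0.0.1:5000/browse?dataset=logicnlg&split=test&example_idx=0
```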
The custom annotations are:
Instance of #4