thefirebanks closed this issue 3 years ago
You have done magnificent work!! Some things I need to clarify:
- If we had the example file "input/sample_model_output.json", it would be easier to execute the code in the notebook.
- If I understand correctly, you assume that for each document we will have two files of labelled sentences: the sample_dataset and the sample_model_output. They will contain the same sentences in the same order, but with different labels. I'm afraid things may get messy if we do not verify that the sentences are actually identical and at the same positions.
- When we go through sample_dataset.json, as we do in the function "labeled_sentences_from_dataset", we assume that every sentence falls into one of the categories 0 to 5, while most of them will actually fall into a -1 category, which is "no_incentive". We will talk about it.
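A minimal sketch of the alignment check suggested above, assuming each JSON file holds a list of sentence objects with "text" and "label" keys (the real schema may differ), and defaulting unlabelled sentences to the -1 "no_incentive" category:

```python
def check_alignment(gold_sentences, pred_sentences):
    """Raise if the two sentence lists differ in length or sentence order."""
    if len(gold_sentences) != len(pred_sentences):
        raise ValueError(
            f"Sentence counts differ: {len(gold_sentences)} vs {len(pred_sentences)}"
        )
    for i, (g, p) in enumerate(zip(gold_sentences, pred_sentences)):
        if g["text"] != p["text"]:
            raise ValueError(
                f"Sentence mismatch at position {i}: {g['text']!r} vs {p['text']!r}"
            )
    return True

def labels_with_default(sentences):
    """Collect labels, mapping sentences without a label to -1 ('no_incentive')."""
    return [s.get("label", -1) for s in sentences]
```

A check like this would run right after loading the two JSON files, before any labels are compared, so a silent off-by-one between the files fails loudly instead of corrupting the metrics.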
Hi Jordi, thank you for the feedback! Here are my responses:
Updated the input files in the Google Drive folder; will update the data loader tomorrow before midnight EST!
Tried loading sentences from ElSalvador.json and they loaded successfully!
Main changes:
- New code in `tasks/data_loader/src/` to load data from the dataset and model output JSON files.
- New code in `tasks/evaluate_model/src` to evaluate classification models and easily visualize the results. A documented notebook is in `tasks/evaluate_model/notebooks`.
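As a rough illustration of the evaluation step, here is a sketch of comparing two aligned label sequences; the function and metric names are illustrative, not the actual API in tasks/evaluate_model/src:

```python
from collections import Counter

def evaluate_labels(gold, pred):
    """Compute accuracy and a (gold, pred) confusion counter for parallel label lists."""
    if len(gold) != len(pred):
        raise ValueError("Label lists must be aligned and of equal length")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    confusion = Counter(zip(gold, pred))
    return accuracy, confusion
```

For example, `evaluate_labels([0, 1, -1, 2], [0, 1, -1, 3])` returns an accuracy of 0.75 and a confusion counter showing one sentence of gold category 2 predicted as 3.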
Bonus:
- A helper in the `tasks/` folder so that whenever we need to create a new task, we can use it and it will automatically create the folder structure (including the input/output/src folders). More details in the README.md of the `tasks/` folder.
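The task-creation helper described above could look roughly like this; `create_task` is a hypothetical name and the actual script in `tasks/` may differ:

```python
from pathlib import Path

def create_task(name, root="tasks"):
    """Create tasks/<name>/ with the standard input/output/src subfolders."""
    task_dir = Path(root) / name
    for sub in ("input", "output", "src"):
        # parents=True builds tasks/<name>/ as needed; exist_ok makes reruns safe
        (task_dir / sub).mkdir(parents=True, exist_ok=True)
    return task_dir
```

Using `exist_ok=True` means running the helper twice for the same task name is harmless, which keeps the scaffolding step idempotent.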