nyu-mll / jiant

jiant is an nlp toolkit
https://jiant.info
MIT License

Adding Tasks without editing the code #1161

Open djstrong opened 4 years ago

djstrong commented 4 years ago

Is it possible to add tasks without editing the library code (dynamically by using Python)?

zphang commented 4 years ago

It is theoretically possible by adding an entry to jiant.tasks.retrieval.TASK_DICT dynamically, but that is not currently a well-supported workflow.
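A minimal sketch of what such dynamic registration might look like. It assumes the registry is a plain dict mapping a task-type name to a task class; the `TASK_DICT` below is a local stand-in rather than the real `jiant.tasks.retrieval.TASK_DICT`, and `Task`, `MyTsvTask`, `register_task`, and `create_task` are hypothetical names used only for illustration.

```python
# Local stand-in for jiant.tasks.retrieval.TASK_DICT (assumption: the real
# object is a dict mapping task-type names to task classes).
TASK_DICT = {}


class Task:
    """Hypothetical minimal base class, not jiant's actual Task."""

    def __init__(self, name, path_dict):
        self.name = name
        self.path_dict = path_dict


class MyTsvTask(Task):
    """Hypothetical user-defined task that lives outside the library code."""


def register_task(task_type, task_class, registry=TASK_DICT):
    """Add a task class to the registry at runtime, refusing silent overwrites."""
    if task_type in registry:
        raise KeyError(f"task type {task_type!r} is already registered")
    registry[task_type] = task_class


def create_task(task_type, name, path_dict, registry=TASK_DICT):
    """Look up the class by its type name and instantiate it."""
    return registry[task_type](name=name, path_dict=path_dict)


register_task("my_tsv", MyTsvTask)
task = create_task("my_tsv", name="demo", path_dict={"train": "train.tsv"})
print(type(task).__name__)  # MyTsvTask
```

Refusing duplicate keys is a design choice in this sketch: it keeps a user task from silently shadowing a built-in one.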

What task do you have in mind?

djstrong commented 4 years ago

I have many datasets for text or token classification (tagging) in the same format (TSV) but with different labels. However, the labels are fixed in each task class. I guess I could create a TSVTextClassificationTask and override its label attributes: set TSVTextClassificationTask.LABELS, then TSVTextClassificationTask.LABEL_TO_ID, TSVTextClassificationTask.ID_TO_LABEL = labels_to_bimap(TSVTextClassificationTask.LABELS).

However, that doesn't solve the problem, because the labels are class members and are therefore shared by every task that uses the class. So the labels should be defined in JSON or read from a file, and the evaluation scheme should also be specified in JSON.
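To illustrate the point about class members, here is a rough sketch in which the labels are read from a file and stored on each task instance, so two tasks of the same class can carry different label sets. Everything in it is hypothetical: this `TSVTextClassificationTask` is not a jiant class, `labels_to_maps` is a plain-dict stand-in for `labels_to_bimap`, and the one-label-per-line file format is an assumption.

```python
import os
import tempfile


def labels_to_maps(labels):
    """Build label->id and id->label dicts (stand-in for labels_to_bimap)."""
    label_to_id = {label: i for i, label in enumerate(labels)}
    id_to_label = {i: label for label, i in label_to_id.items()}
    return label_to_id, id_to_label


def read_labels(labels_path):
    """Read one label per line, skipping blank lines."""
    with open(labels_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


class TSVTextClassificationTask:
    """Hypothetical task whose labels are instance attributes, not class members."""

    def __init__(self, name, labels_path):
        self.name = name
        self.labels = read_labels(labels_path)
        self.label_to_id, self.id_to_label = labels_to_maps(self.labels)


def _write_labels(directory, filename, labels):
    """Helper for the demo below: write a labels file and return its path."""
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(labels) + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    sentiment = TSVTextClassificationTask(
        "sentiment", _write_labels(tmp, "a.txt", ["neg", "pos"])
    )
    topic = TSVTextClassificationTask(
        "topic", _write_labels(tmp, "b.txt", ["sport", "tech", "art"])
    )

print(sentiment.label_to_id)  # {'neg': 0, 'pos': 1}
print(len(topic.labels))  # 3
```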

zphang commented 4 years ago

Thanks for your input, it helps us think about how we can refine our API. (Part of the difficulty arises from the distinction between tying tasks to datasets vs. tying tasks to formats.)

For your use-case, I think a good approach would be:

djstrong commented 4 years ago

Thank you. I have implemented it and it is working: https://github.com/djstrong/jiant/tree/tsv

I am providing the labels and the evaluation scheme in the task config:

            "kwargs": {
                "labels_path": f"{path}/labels.txt",
                "evaluation_scheme": "SimpleAccuracyEvaluationScheme"
            }
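
As a rough illustration of how a string such as "SimpleAccuracyEvaluationScheme" in the kwargs above could be turned into an object, the sketch below looks the name up in a small table. It is not taken from the linked branch: the `EVALUATION_SCHEMES` table, `build_evaluation_scheme`, and the `compute` method are hypothetical, the scheme class is a simplified stand-in that only shares its name with the config value, and `path` is a placeholder directory.

```python
class SimpleAccuracyEvaluationScheme:
    """Simplified stand-in: fraction of predictions that match the gold labels."""

    def compute(self, preds, golds):
        correct = sum(p == g for p, g in zip(preds, golds))
        return correct / len(golds)


# Hypothetical lookup table from config strings to scheme classes.
EVALUATION_SCHEMES = {
    "SimpleAccuracyEvaluationScheme": SimpleAccuracyEvaluationScheme,
}


def build_evaluation_scheme(kwargs):
    """Instantiate the scheme named under kwargs["evaluation_scheme"]."""
    name = kwargs["evaluation_scheme"]
    try:
        scheme_class = EVALUATION_SCHEMES[name]
    except KeyError:
        raise ValueError(f"unknown evaluation scheme: {name!r}") from None
    return scheme_class()


path = "/data/my_task"  # placeholder directory, not a real location
task_config = {
    "kwargs": {
        "labels_path": f"{path}/labels.txt",
        "evaluation_scheme": "SimpleAccuracyEvaluationScheme",
    }
}

scheme = build_evaluation_scheme(task_config["kwargs"])
print(scheme.compute(preds=[0, 1, 1, 0], golds=[0, 1, 0, 0]))  # 0.75
```

An explicit table is used here instead of resolving arbitrary names with `getattr` or `eval`, so a typo in the config fails with a clear error.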