Adding Tasks without editing the code

nyu-mll / jiant

jiant is an NLP toolkit

https://jiant.info

MIT License


Adding Tasks without editing the code #1161

Open djstrong opened 4 years ago

djstrong commented 4 years ago

Is it possible to add tasks without editing the library code (dynamically by using Python)?

zphang commented 4 years ago

It is theoretically possible by dynamically adding an entry to jiant.tasks.retrieval.TASK_DICT, but this is not currently a well-supported workflow.

What task do you have in mind?

djstrong commented 4 years ago

I have many datasets for text or token classification (tagging) in the same TSV format but with different label sets. However, the labels are fixed in each task class. I guess I could create a TSVTextClassificationTask and override TSVTextClassificationTask.LABELS, then set TSVTextClassificationTask.LABEL_TO_ID, TSVTextClassificationTask.ID_TO_LABEL = labels_to_bimap(TSVTextClassificationTask.LABELS).

However, that doesn't solve the problem, because the labels are class members and are therefore shared by every dataset that uses the class. The labels should instead be defined in the task-config JSON or read from a file, and the evaluation scheme should be specified in the JSON as well.
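
To illustrate why class-level labels get in the way, here is a minimal sketch. The class below is a hypothetical stand-in that only mimics the shape described above; it is not the real jiant task class, and the label values are made up for the example.

```python
# Hypothetical stand-in for a task class; it does not import or use jiant.
class TSVTextClassificationTask:
    LABELS = []  # class attribute: a single value shared by every instance


sentiment = TSVTextClassificationTask()
topic = TSVTextClassificationTask()

# Overriding the labels on the class changes them for every dataset at once.
TSVTextClassificationTask.LABELS = ["neg", "pos"]
assert sentiment.LABELS == ["neg", "pos"]
assert topic.LABELS == ["neg", "pos"]  # wrong labels for the topic dataset

# Re-assigning for the second dataset silently changes the first one too.
TSVTextClassificationTask.LABELS = ["sports", "politics", "tech"]
assert sentiment.LABELS == ["sports", "politics", "tech"]
```

With one class per format, two datasets cannot hold different label sets at the same time unless the labels move onto the instance.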

zphang commented 4 years ago

Thanks for your input; it helps us think about how we can refine our API. (Part of the difficulty arises from the distinction between tying tasks to datasets and tying tasks to formats.)

For your use-case, I think a good approach would be:

1. Implement a generic TSVTextClassificationTask task.
2. Expose LABELS and/or LABEL_TO_ID and ID_TO_LABEL as instance properties, so they can differ across instances/task configs while using the same task class implementation.
3. Dynamically insert an entry into jiant.tasks.retrieval.TASK_DICT.
4. Write a separate task-config JSON for each tagging task. Be sure to give each config a different name but the same task, based on the TASK_DICT key used above.
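
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: TASK_DICT here is a plain dict standing in for jiant.tasks.retrieval.TASK_DICT, labels_to_bimap is a local stand-in for the helper mentioned earlier in the thread, and the config keys ("name", "task", "kwargs"), the registry key, and create_task are hypothetical. The real jiant task classes and config loader may look quite different.

```python
# All names below are hypothetical stand-ins; nothing here imports jiant.


def labels_to_bimap(labels):
    """Stand-in helper: build label->id and id->label mappings."""
    label_to_id = {label: i for i, label in enumerate(labels)}
    id_to_label = {i: label for label, i in label_to_id.items()}
    return label_to_id, id_to_label


class TSVTextClassificationTask:
    """Steps 1 and 2: one generic class, with labels stored per instance."""

    def __init__(self, name, labels):
        self.name = name
        self.labels = list(labels)
        self.label_to_id, self.id_to_label = labels_to_bimap(self.labels)


# Step 3: register the class under a key at runtime.
# This dict only stands in for jiant.tasks.retrieval.TASK_DICT.
TASK_DICT = {}
TASK_DICT["tsv_text_classification"] = TSVTextClassificationTask

# Step 4: one config per dataset, different names, same task key.
task_configs = [
    {
        "name": "sentiment",
        "task": "tsv_text_classification",
        "kwargs": {"labels": ["neg", "pos"]},
    },
    {
        "name": "topic",
        "task": "tsv_text_classification",
        "kwargs": {"labels": ["sports", "politics", "tech"]},
    },
]


def create_task(config):
    """Look up the task class by key and instantiate it from the config."""
    task_class = TASK_DICT[config["task"]]
    return task_class(name=config["name"], **config["kwargs"])


tasks = {config["name"]: create_task(config) for config in task_configs}
```

Because the label mappings live on each instance, the two tasks share one implementation without interfering with each other's labels.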

djstrong commented 4 years ago

Thank you. I have implemented it and it is working: https://github.com/djstrong/jiant/tree/tsv

I provide the labels and the evaluation scheme in the task config:

            "kwargs": {
                "labels_path": f"{path}/labels.txt",
                "evaluation_scheme": "SimpleAccuracyEvaluationScheme"
            }
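
A task could consume kwargs like these along the following lines. This is a hedged sketch, not the code from the linked branch: it assumes labels.txt holds one label per line, and read_labels, resolve_kwargs, EVALUATION_SCHEMES, and the SimpleAccuracyEvaluationScheme class defined here are hypothetical stand-ins written for the example.

```python
import tempfile
from pathlib import Path


class SimpleAccuracyEvaluationScheme:
    """Hypothetical stand-in: fraction of predictions that match the gold label."""

    def compute(self, preds, golds):
        correct = sum(p == g for p, g in zip(preds, golds))
        return correct / len(golds)


# Hypothetical lookup from the name given in the config to a scheme class.
EVALUATION_SCHEMES = {
    "SimpleAccuracyEvaluationScheme": SimpleAccuracyEvaluationScheme,
}


def read_labels(labels_path):
    """Read labels from a file, assuming one label per line; skip blank lines."""
    text = Path(labels_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def resolve_kwargs(kwargs):
    """Turn the config kwargs into a label list and an evaluation scheme object."""
    labels = read_labels(kwargs["labels_path"])
    scheme = EVALUATION_SCHEMES[kwargs["evaluation_scheme"]]()
    return labels, scheme


# Usage with a temporary directory playing the role of `path` in the config.
with tempfile.TemporaryDirectory() as path:
    Path(path, "labels.txt").write_text("neg\npos\n", encoding="utf-8")
    kwargs = {
        "labels_path": f"{path}/labels.txt",
        "evaluation_scheme": "SimpleAccuracyEvaluationScheme",
    }
    labels, scheme = resolve_kwargs(kwargs)

accuracy = scheme.compute(preds=["pos", "neg", "pos"], golds=["pos", "pos", "pos"])
```

Resolving the scheme by name through an explicit dict keeps the config as plain strings while limiting which classes a config can select.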