Closed molbap closed 1 year ago
This adds a breaking change where we now have to specify which task we are launching in the main interface.
I think this is up for merge. @rwightman the main change is that you have to specify which task you want to launch. For training it's
python -m pixparse.app.train \
--eval.task-name cruller_pretrain \
And for instance to launch an eval
python -m pixparse.app.eval \
--eval.task-name cruller_eval_ocr \
amending that... refactoring the task_factory method. Not ready to merge yet.
This PR has drifted a bit. Added a common task-name argument and a TaskFactory.
class TaskFactory:
"""
This class registers existing tasks and propagates corresponding configurations.
"""
TASK_CONFIG_REGISTRY = {
'cruller_eval_ocr': TaskCrullerEvalOCRCfg,
'donut_eval_ocr': TaskDonutEvalOCRCfg,
'cruller_pretrain': TaskCrullerPretrainCfg
}
TASK_CLASS_REGISTRY = {
'cruller_eval_ocr': TaskCrullerEvalOCR,
'donut_eval_ocr': TaskDonutEvalOCR,
'cruller_pretrain': TaskCrullerPretrain
}
Makes the interfaces
app/train.py
app/eval.py
more generic and less specific to cruller_pretrain.
So that for pretraining cruller one has to now add
python -m pixparse.app.train \
--task-name cruller_pretrain \
tested on the cluster.
This PR has drifted a bit. Added a common task-name argument and a TaskFactory.
class TaskFactory: """ This class registers existing tasks and propagates corresponding configurations. """ TASK_CONFIG_REGISTRY = { 'cruller_eval_ocr': TaskCrullerEvalOCRCfg, 'donut_eval_ocr': TaskDonutEvalOCRCfg, 'cruller_pretrain': TaskCrullerPretrainCfg } TASK_CLASS_REGISTRY = { 'cruller_eval_ocr': TaskCrullerEvalOCR, 'donut_eval_ocr': TaskDonutEvalOCR, 'cruller_pretrain': TaskCrullerPretrain }
- 2 class methods to generate a task configuration, and a task. New tasks (finetuning, warmup, etc) have to be added to the factory, then, specific methods should be defined within the new task specific classes.
Makes the interfaces
app/train.py app/eval.py
more generic and less specific to cruller_pretrain.
So that for pretraining cruller one has to now add
python -m pixparse.app.train \ --task-name cruller_pretrain \
tested on the cluster.
Cool, that's looking good so far, looks like we may end up merging the task config & class registry? Or perhaps the task class should have a classmethod to get the config class type?
For now it looks like TaskCfg and Task have always a 1:1 match. The two classes have different roles and their merit in existing independently, but the factory + create_task can instantiate one task and its config.
Now the registry is
TASK_CLASS_REGISTRY = {
'cruller_eval_ocr': (TaskCrullerEvalOCR, TaskCrullerEvalOCRCfg),
'donut_eval_ocr': (TaskDonutEvalOCR, TaskDonutEvalOCRCfg),
'cruller_pretrain': (TaskCrullerPretrain, TaskCrullerPretrainCfg)
}
tested on the cluster for cruller_pretrain and cruller_eval_ocr.
What does this PR do?