Modular NLP pipeline manager.
OpusPocus aims to simplify the description and execution of popular and custom NLP pipelines, including dataset preprocessing, model training, and evaluation. The pipeline manager supports execution through a simple CLI (Bash) or common HPC schedulers (Slurm, HyperQueue).
It uses OpusCleaner for data preparation and OpusTrainer for training scheduling (development in progress).
go.py - pipeline manager entry script
opuspocus/ - OpusPocus modules
config/ - default configuration files (pipeline config, marian training config, ...)
examples/ - pipeline manager usage examples
scripts/ - helper scripts, at this moment not directly implemented in OpusPocus
tests/ - unit tests

Install MarianNMT.
Prepare the OpusCleaner and OpusTrainer Python virtual environments.
Install the OpusPocus requirements.
$ pip install -r requirements.txt
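The requirements can also be installed into an isolated environment. The following is a minimal sketch, assuming a virtual environment is wanted for OpusPocus itself; the function name `setup_opuspocus_env` and the default `.venv` location are hypothetical and are not part of OpusPocus.

```shell
# Sketch only: setup_opuspocus_env and the .venv default are illustrative
# assumptions, not an OpusPocus convention.
setup_opuspocus_env() {
    venv_dir="${1:-.venv}"

    # Create the virtual environment only if it does not exist yet.
    if [ ! -d "$venv_dir" ]; then
        python3 -m venv "$venv_dir" || return 1
    fi

    # Activate the environment in the current shell.
    . "$venv_dir/bin/activate" || return 1

    # Install the dependencies listed in the repository.
    pip install -r requirements.txt
}
```

Run `setup_opuspocus_env` from the repository root so that `requirements.txt` is found; pass a directory name as the first argument to place the environment somewhere else.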
See the examples/ directory for example execution.
Initialize the pipeline.
$ ./go.py init \
    --pipeline-config path/to/pipeline/config/file \
    --pipeline-dir pipeline/destination/directory
Execute the pipeline.
$ ./go.py run \
    --pipeline-dir pipeline/destination/directory \
    --runner bash
Check the pipeline status.
$ ./go.py traceback --pipeline-dir pipeline/destination/directory
OR
$ ./go.py status --pipeline-dir pipeline/destination/directory
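The three steps above can be chained in a small wrapper. This is a sketch under stated assumptions: `run_cmd`, `run_pipeline`, and the `DRY_RUN`, `PIPELINE_CONFIG`, and `PIPELINE_DIR` variables are hypothetical helpers introduced here, and only the `go.py` subcommands and flags shown above are used. Whether `run` returns before the pipeline finishes may depend on the runner, so the final status call can report steps that are still in progress.

```shell
# Placeholder paths taken from the commands above; replace with your own.
PIPELINE_CONFIG="${PIPELINE_CONFIG:-path/to/pipeline/config/file}"
PIPELINE_DIR="${PIPELINE_DIR:-pipeline/destination/directory}"

# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Initialize, execute, and check the pipeline, stopping at the first failure.
run_pipeline() {
    run_cmd ./go.py init \
        --pipeline-config "$PIPELINE_CONFIG" \
        --pipeline-dir "$PIPELINE_DIR" || return 1
    run_cmd ./go.py run \
        --pipeline-dir "$PIPELINE_DIR" \
        --runner bash || return 1
    run_cmd ./go.py status --pipeline-dir "$PIPELINE_DIR"
}
```

Setting `DRY_RUN=1` before calling `run_pipeline` prints the three commands without touching the pipeline directory, which is a cheap way to verify the paths first.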