hplt-project / OpusPocus

Marian machine translation training pipeline for thousands of models

Improve documentation #35

Open bhavitvyamalik opened 4 months ago

bhavitvyamalik commented 4 months ago

We need detailed documentation for running OpusPocus on both Slurm and bash. The current documentation is good for setting up OpusPocus, but there are a few gaps when it comes to running the pipeline.

bhaddow commented 4 months ago

Could you give step-by-step instructions for "preparing the virtual environments for OpusCleaner and OpusPocus"? Also, what should you do if you want to use conda instead?

varisd commented 4 months ago

I use the following:

$ /path/to/virtualenv -p /path/to/python-3.10.X/bin/python3 /virtual/environment/destination/opuspocus
$ source /virtual/environment/destination/opuspocus/bin/activate
$ pip install --upgrade pip setuptools
$ pip install -r requirements.txt

$ /path/to/virtualenv -p /path/to/python-3.10.X/bin/python3 /virtual/environment/destination/opuscleaner
$ source /virtual/environment/destination/opuscleaner/bin/activate
$ pip install --upgrade pip setuptools
$ pip install -r requirements-opuscleaner.txt 

Get the requirements-opuscleaner.txt from the opuscleaner repository. Alternatively, you can try replacing pip install -r requirements-opuscleaner.txt with pip install opuscleaner in the opuscleaner virtualenv. If I remember correctly, the approach should be similar with conda; only the environment activation differs. For execution, you pass the root directory of the virtual environment, and OpusPocus calls the respective Python executable (it should not matter whether it's virtualenv or conda, but this needs testing to confirm).
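For anyone who wants to try the conda route, here is a minimal, untested sketch of what the equivalent of the virtualenv steps above might look like. The helper name make_conda_env is made up for illustration, and pinning python=3.10 simply mirrors the python-3.10.X used above. It uses conda run instead of activating the environment, so it can be called from a script.

```shell
# Hypothetical helper (not part of OpusPocus): create a conda environment at
# the given prefix and install a requirements file into it.
#   $1 = environment root directory (the path you would later pass to OpusPocus)
#   $2 = requirements file (e.g. requirements.txt or requirements-opuscleaner.txt)
make_conda_env() {
    env_prefix=$1
    requirements=$2

    # Create the environment with Python 3.10 and pip available inside it.
    conda create --yes --prefix "$env_prefix" python=3.10 pip || return 1

    # Same two pip steps as in the virtualenv recipe, run inside the new env.
    conda run --prefix "$env_prefix" python -m pip install --upgrade pip setuptools || return 1
    conda run --prefix "$env_prefix" python -m pip install -r "$requirements"
}
```

Usage would be along the lines of make_conda_env /virtual/environment/destination/opuspocus requirements.txt, and the same again with the opuscleaner destination and requirements-opuscleaner.txt. Whether OpusPocus picks up the Python executable from a conda prefix in the same way as from a virtualenv still needs to be confirmed.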

bhaddow commented 4 months ago

I meant: should this be in the documentation? It's maybe worth spelling out.

varisd commented 4 months ago

We can put it in README.txt in the installation section. Ideally, though, OpusCleaner and OpusTrainer (the PyPI versions) should not have conflicting dependencies, so that OpusPocus (which has OpusTrainer as a PyPI dependency) would need only a single virtual env.

bhaddow commented 4 months ago

Yes, it would be ideal if we could remove the conflicts. I think then we would need some automated check (CI pipeline) to ensure that conflicts were not re-introduced, assuming both projects will be actively developed.
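As a rough idea of what such a check could look like, the sketch below installs a set of packages into the currently active (ideally fresh) environment and then runs pip check, which reports installed packages with incompatible requirements. The function name check_no_conflicts is hypothetical, and this is only one possible way to set up the CI step.

```shell
# Hypothetical CI helper: install the given packages into the active
# environment, then fail if pip reports any dependency conflicts.
check_no_conflicts() {
    # Installation itself may already fail if the resolver finds a conflict.
    pip install "$@" || return 1

    # pip check exits non-zero when installed packages have
    # missing or incompatible requirements.
    pip check
}
```

In a CI job this might be called as check_no_conflicts opuscleaner opustrainer after activating a clean virtual env; the package names here are assumed to match the PyPI names and should be verified.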

bhaddow commented 4 months ago

This is incomplete:

$ /path/to/virtualenv -p /path/to/python-3.10.X/bin/python3 /virtual/environment/destination/opuspocus
$ source /virtual/environment/destination/opuspocus/bin/activate
$ pip install --upgrade pip setuptools
$ pip install -r requirements.txt

You also need to run pip install . in the OpusPocus directory (which fails). See #42.