Open bhaddow opened 3 months ago
This is the result of the local paths I used on LUMI and needs to be fixed. We should use the build/
directory (and ideally provide a way of installing marian).
Please check PR #49 for the proposed changes to the codebase and let me know whether they look like a feasible improvement on the current situation.
I would have suggested a documentation update instead, but this also works. Is the sample pipeline config compatible with the marian installer?
Right now, I am suggesting doing the marian installation outside of the pipeline run, as part of the OpusPocus installation steps.
Technically, we could also update GenerateVocabStep, TrainModelStep and TranslateStep to detect during execution whether a Marian installation is present and, if not, run the installation scripts themselves. However, I am worried that if the marian installation failed during step execution, a user might get confused.
Another approach would be to implement separate "3rd party software installation" steps that would run at the beginning of the pipeline (or in a separate pipeline). This would be similar to the software installation targets in Mozilla's Snakemake pipeline (which also take care of software installation).
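A minimal sketch of the "detect and install during the step" idea, mainly to show how a failed installation could be reported clearly instead of confusing the user. The names `ensure_marian`, `MarianInstallError` and the `installer` callback are hypothetical and not part of OpusPocus; the check for a `marian` executable under `bin/` is also an assumption about the install layout.

```python
from pathlib import Path
from typing import Callable


class MarianInstallError(RuntimeError):
    """Raised when Marian is missing and the automatic install fails."""


def ensure_marian(marian_dir: Path, installer: Callable[[Path], None]) -> bool:
    """Make sure Marian is present, installing it if necessary.

    Returns True if the installer was run, False if Marian was already there.
    Assumption: a usable install has a `marian` executable in `<marian_dir>/bin`.
    """
    binary = marian_dir / "bin" / "marian"
    if binary.is_file():
        return False

    try:
        # Hypothetical hook: in practice this would call the install script.
        installer(marian_dir)
    except Exception as err:
        # Wrap the low-level failure in a message that says what to do next.
        raise MarianInstallError(
            f"Marian was not found in {marian_dir} and the automatic "
            f"installation failed: {err}. Install Marian manually and rerun."
        ) from err

    if not binary.is_file():
        raise MarianInstallError(
            f"The installer finished but {binary} is still missing."
        )
    return True
```

A step could call `ensure_marian(...)` before launching any Marian command, so the user sees one explicit installation error rather than a failure deep inside the step.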
To get this script to work on our server, I have to disable CUDNN (setting -DUSE_CUDNN=OFF).
And if I don't set the cudnn version, I cannot set the number of threads to all
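One way the install script could expose this as an option rather than a hard-coded flag is sketched below. This only builds the CMake argument list; `marian_cmake_args` is a hypothetical helper, and the choice of `COMPILE_CUDA` and `USE_SENTENCEPIECE` alongside `USE_CUDNN` is an assumption about which Marian CMake options the script would set.

```python
from typing import List


def marian_cmake_args(
    source_dir: str, use_cuda: bool = True, use_cudnn: bool = False
) -> List[str]:
    """Build a CMake command line for Marian with switchable GPU options."""

    def flag(name: str, enabled: bool) -> str:
        return f"-D{name}={'ON' if enabled else 'OFF'}"

    return [
        "cmake",
        source_dir,
        flag("COMPILE_CUDA", use_cuda),
        # cuDNN only makes sense for a CUDA build, so force it OFF otherwise.
        flag("USE_CUDNN", use_cuda and use_cudnn),
        # Assumed here so that the SentencePiece tools get built as well.
        flag("USE_SENTENCEPIECE", True),
    ]


if __name__ == "__main__":
    # Print the command for a GPU build without cuDNN.
    print(" ".join(marian_cmake_args("..")))
```

With the defaults above, cuDNN stays off unless someone explicitly asks for it, which matches the workaround described in this comment.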
Why do we need a script for cpu install? Would anyone train on cpu?
I think it is not a bad idea to have the option there, for example if someone just wants to try out OpusPocus locally. It could also be useful for translation (there, the CPU version should be usable).
It looks like marian needs to be built in a specific way, different from the instructions in https://marian-nmt.github.io/docs/: bin is appended to the path you give for marian, so you need a bin directory. spm_train is also expected in this bin directory, so you need to make sure spm is built. Maybe this should be fixed in the documentation?
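Until the documentation is updated, a small check like the following could make the expected layout explicit. It is only a sketch: `missing_marian_binaries` is a hypothetical helper, and the list of required executables is an assumption (only `spm_train` and the `bin` directory are mentioned in this thread).

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of executables; only spm_train is confirmed in the discussion.
REQUIRED_BINARIES = ("marian", "marian-decoder", "spm_train")


def missing_marian_binaries(
    marian_dir: str, required: Iterable[str] = REQUIRED_BINARIES
) -> List[str]:
    """Return the names of required executables not found in <marian_dir>/bin."""
    bin_dir = Path(marian_dir) / "bin"
    return [name for name in required if not (bin_dir / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_marian.py /path/to/marian
    missing = missing_marian_binaries(sys.argv[1] if len(sys.argv) > 1 else ".")
    if missing:
        print("Missing in bin/: " + ", ".join(missing))
    else:
        print("Marian layout looks complete.")
```

Running this against the configured marian path before starting a pipeline would catch a build without SentencePiece, since `spm_train` would show up as missing.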