SIL NLP provides a set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor and minority languages.
These are the main requirements for running the SILNLP code on a local machine. Since SILNLP depends on many Python packages with complex versioning requirements, we use a Python tool called Poetry to manage them. Here is a rough hierarchy of SILNLP and its major dependencies.
| Requirement | Reason |
|---|---|
| Git | to get the repo from GitHub |
| Python | to run the SILNLP code |
| Poetry | to manage all the Python packages and versions |
| NVIDIA GPU | required to run SILNLP on a local machine |
| NVIDIA drivers | required for the GPU |
| CUDA Toolkit | required for machine learning with the GPU |
| Environment variables | to tell SILNLP where to find the data, etc. |
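If some of these are already installed, a quick way to check is to print their versions (a sketch for a Linux/macOS shell; Windows equivalents differ slightly):

```bash
# Confirm the core prerequisites are installed and on the PATH.
git --version
python --version
poetry --version
```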
If using a local GPU, install the corresponding NVIDIA driver
On Ubuntu, the driver can alternatively be installed through the GUI by opening Software & Updates, navigating to Additional Drivers in the top menu, and selecting the newest NVIDIA driver with the labels proprietary and tested.
After installing the driver, reboot your system.
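After the reboot, you can confirm the driver is loaded by running nvidia-smi, which should report the GPU, the driver version, and the highest supported CUDA version:

```bash
# Should print a table with the GPU name, driver version, and CUDA version.
nvidia-smi
```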
Download and install Docker Desktop
sudo usermod -aG docker $USER
If using a local GPU, you'll also need to install the NVIDIA Container Toolkit and configure Docker so that it can use the NVIDIA Container Runtime.
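As a rough sketch of that setup (based on NVIDIA's documented commands; defer to the NVIDIA Container Toolkit install guide for your distribution):

```bash
# Register the NVIDIA runtime with Docker and restart the daemon.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Quick test: a container started with --gpus all should be able to run nvidia-smi.
docker run --rm --gpus all ubuntu nvidia-smi
```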
Pull Docker image
In a terminal, run:
docker pull ghcr.io/sillsdev/silnlp:latest
Create Docker container based on the image
If you're using a local GPU, then in a terminal, run:
docker create -it --gpus all --name silnlp ghcr.io/sillsdev/silnlp:latest
Otherwise, run:
docker create -it --name silnlp ghcr.io/sillsdev/silnlp:latest
A Docker container should be created. You should be able to see a container named 'silnlp' on the Containers page of Docker Desktop.
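You can also confirm this from a terminal instead of the Docker Desktop UI:

```bash
# Lists the silnlp container (including when it is stopped).
docker ps -a --filter "name=silnlp"
```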
Create file for environment variables
Create a text file with the following content and edit as necessary:
CLEARML_API_HOST="https://api.sil.hosted.allegro.ai"
CLEARML_API_ACCESS_KEY=xxxxx
CLEARML_API_SECRET_KEY=xxxxx
AWS_REGION="us-east-1"
AWS_ACCESS_KEY_ID=xxxxx
AWS_SECRET_ACCESS_KEY=xxxxx
SIL_NLP_DATA_PATH="/silnlp"
Start container
In a terminal, run:
docker start silnlp
docker exec -it --env-file path/to/env_vars_file silnlp bash
After running the second command, your terminal prompt should change to root@xxxxx:~/silnlp#, where xxxxx is a string of letters and numbers, instead of your current working directory. This is the command line for the Docker container, and you're able to run SILNLP scripts from here.
To exit the container, run exit, and to stop it, run docker stop silnlp. It can be started again by repeating step 6. Stopping the container will not erase any changes made in the container environment, but removing it will.

If using a local GPU, install the corresponding NVIDIA driver
On Ubuntu, the driver can alternatively be installed through the GUI by opening Software & Updates, navigating to Additional Drivers in the top menu, and selecting the newest NVIDIA driver with the labels proprietary and tested.
After installing the driver, reboot your system.
Clone the silnlp repo
Install and initialize Miniconda
Create the silnlp conda environment
conda env create --file "environment.yml"
Activate the silnlp conda environment
conda activate silnlp
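To double-check that the environment is active (optional; shown for a Linux/macOS shell):

```bash
# The active environment is marked with an asterisk.
conda env list
# Should resolve to the silnlp environment's interpreter.
which python && python --version
```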
Install Poetry with the official installer
Linux / macOS:
curl -sSL https://install.python-poetry.org | python3 - --version 1.7.1
Windows (PowerShell):
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py - --version 1.7.1
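Assuming the installer's bin directory is on your PATH, you can verify the installation with:

```bash
# Should report Poetry (version 1.7.1).
poetry --version
```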
Configure Poetry to use the active Python
poetry config virtualenvs.prefer-active-python true
Install the Python packages for the silnlp repo
poetry install
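To see which virtual environment Poetry created and spot-check the installed packages (optional):

```bash
# Show the project's virtual environment details.
poetry env info
# List the installed dependencies.
poetry show
```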
If using ClearML and/or AWS, set the following environment variables:
CLEARML_API_HOST="https://api.sil.hosted.allegro.ai"
CLEARML_API_ACCESS_KEY=xxxxx
CLEARML_API_SECRET_KEY=xxxxx
AWS_REGION="us-east-1"
AWS_ACCESS_KEY_ID=xxxxx
AWS_SECRET_ACCESS_KEY=xxxxx
SIL_NLP_DATA_PATH="/silnlp"
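On Linux or macOS, one way to set these permanently is to add export lines to your shell startup file; this is a sketch with placeholder values, so substitute your own credentials and data path:

```bash
# Example ~/.bashrc (Linux) or ~/.profile (macOS) entries; the xxxxx values are placeholders.
export CLEARML_API_HOST="https://api.sil.hosted.allegro.ai"
export CLEARML_API_ACCESS_KEY=xxxxx
export CLEARML_API_SECRET_KEY=xxxxx
export AWS_REGION="us-east-1"
export AWS_ACCESS_KEY_ID=xxxxx
export AWS_SECRET_ACCESS_KEY=xxxxx
export SIL_NLP_DATA_PATH="/silnlp"
```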
If using AWS, there are two options:
Follow the instructions below to set up a Dev Container in VS Code. This is the recommended way to develop in SILNLP. For manual setup, see Manual Setup.
If using a local GPU, install the corresponding NVIDIA driver.
Download and install Docker Desktop.
Windows (non-WSL) and macOS:
WSL:
Linux:
sudo usermod -aG docker $USER
Set up ClearML.
Define environment variables.
Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY. Additionally, set AWS_REGION. The typical value is "us-east-1".
Linux / macOS users: To set environment variables permanently, add each variable as a new line to the .bashrc file (Linux) or .profile file (macOS) in your home directory, using the format export VAR="VAL". Close and reopen any open terminals for the changes to take effect.
Windows:
Install Visual Studio Code.
Clone the silnlp repo.
Open up silnlp folder in VS Code.
Install the Dev Containers extension for VS Code.
Build the dev container and open the silnlp folder in the container.
If you do not have a local GPU, remove the --gpus all part of the runArgs field of the .devcontainer/devcontainer.json file.
Install and activate Poetry environment.
Run poetry install to install the necessary Python libraries, and then run poetry shell to enter the environment in the terminal.
(Optional) Locally mount the S3 bucket. This will allow you to interact directly with the S3 bucket from your local terminal (outside of the dev container). See instructions here.
To get back into the dev container and poetry environment each subsequent time, open the silnlp folder in VS Code, select the "Reopen in Container" option from the Remote Connection menu (bottom left corner), and use the poetry shell
command in the terminal.
See the wiki for information on setting up and running experiments. The most important pages for getting started are the ones on file structure, model configuration, and running experiments. A lot of the instructions are specific to NMT, but are still helpful starting points for doing other things like alignment.
See this page for information on using the VS Code debugger.
If you need to use a tool that is supported by SILNLP but is not installable as a Python library (which is probably the case if you get an error like "RuntimeError: eflomal is not installed."), follow the appropriate instructions here.
If you need to run the .NET versions of the Machine alignment models, you will need to install the .NET 8.0 SDK. After installing, run dotnet tool restore.
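You can confirm the SDK is installed before running the restore:

```bash
# Should list an 8.0.x SDK.
dotnet --list-sdks
```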