guillaume-be / rust-bert

Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
https://docs.rs/crate/rust-bert
Apache License 2.0

Add a Dockerfile build for the converter only #423

Open jondot opened 1 year ago

jondot commented 1 year ago

Currently, converting an existing HF model requires (1) a working Rust environment, (2) a local copy of the rust-bert repo, and (3) a Python environment, all just for the conversion.

Consider two use cases:

(a) A Rust developer wants to use an HF model: they need a Python environment.
(b) A data scientist wants to experiment with different models in a Rust project that Rust developers created for them: they need a Rust environment and a checkout of the rust-bert repo.

These two groups appear to be mostly mutually exclusive.

I've created a Dockerfile, which I think is minimal, that only does the conversion. It:

  1. Builds the Rust project
  2. Sets up a Python environment with the prebuilt Rust converter
  3. Takes a conversion command

This way, developers and data scientists depend only on Docker. Assuming the built image is called rustbert-converter, they only need to run:

docker run -v "$(pwd)"/<path to model on host>:/model rustbert-converter pytorch_model.bin

The image expects a /model folder, shared between the host and the container, that contains the raw PyTorch model files.
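As a rough sketch of how that invocation could be wrapped, here is a small helper that only prints the docker command for a given host directory and weights file, so it can be checked before being run. The function name converter_cmd is hypothetical, the image name rustbert-converter is the one assumed above, and --rm is added only so the container is cleaned up after the run.

```shell
#!/bin/sh
# Hypothetical helper (not part of rust-bert): print the docker command
# that mounts a host model directory at /model and converts one file.
# Assumes the image was built and tagged "rustbert-converter".
converter_cmd() {
    model_dir=$1   # host directory holding the raw PyTorch weights
    weights=$2     # file name inside that directory
    # The host path is quoted in the printed command; /model is the
    # folder the image expects.
    printf 'docker run --rm -v "%s":/model rustbert-converter %s\n' \
        "$model_dir" "$weights"
}
```

For example, converter_cmd "$(pwd)/my-model" pytorch_model.bin prints the full command for a model stored in ./my-model (a hypothetical path), which can then be pasted into a terminal.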

guillaume-be commented 1 year ago

Thank you @jondot - this is great! The Python model conversion is currently tested in the CI. Would it be possible to add a test using Docker as well? This would ensure everything still works as expected, and the test would double as documentation showing how to run the conversion.

jondot commented 1 year ago

Sure, I can try. Do you mean we should build the Docker image in the CI, and then run a container from it to convert a sample model?

guillaume-be commented 1 year ago

Yes