Add a Dockerfile build for the converter only

jondot commented 1 year ago

Currently, converting an existing HF model requires (1) a working Rust environment, (2) a checkout of the rust-bert repo, and (3) a Python environment set up just for the conversion.

This is awkward in two use cases:

(a) A Rust developer who wants to use an HF model needs a Python environment.
(b) A data scientist who wants to experiment with different models in a Rust project that Rust devs created for them needs a Rust environment and a rust-bert checkout.

As far as I can tell, the two groups are mostly mutually exclusive.

I've created a Dockerfile, which I think is minimal, that only does the conversion. It:

Builds the Rust project
Sets up a Python environment with the prebuilt Rust converter
Takes a conversion command

So developers and data scientists only need to depend on Docker. Assuming the built image is called rustbert-converter, they only need to run:

docker run -v "$(pwd)"/<path to model on host>:/model rustbert-converter pytorch_model.bin

The image expects a /model folder, shared between the container and the host, that contains the raw PyTorch model files.
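As a rough illustration of the run step, here is a minimal wrapper sketch. Only the image name `rustbert-converter` and the `/model` mount point come from the description above. The function name `convert_model`, the `DRY_RUN` switch, and the `--rm` flag are assumptions added for illustration, and `pytorch_model.bin` is assumed to be the usual Hugging Face weights file name.

```shell
#!/bin/sh
# Hypothetical helper around the docker command described above.
# IMAGE is the name used in this thread; everything else is an assumption.
IMAGE="rustbert-converter"

convert_model() {
    model_dir="$1"   # host directory holding the raw PyTorch weights
    weights="$2"     # file name inside that directory, e.g. pytorch_model.bin

    # Fail early if the weights file is not where we expect it.
    if [ ! -f "$model_dir/$weights" ]; then
        echo "missing $model_dir/$weights" >&2
        return 1
    fi

    # Bind mounts need an absolute host path, so resolve the directory first.
    abs_dir=$(cd "$model_dir" && pwd)

    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} docker run --rm -v "$abs_dir:/model" "$IMAGE" "$weights"
}
```

Calling `DRY_RUN=1 convert_model ./my-model pytorch_model.bin` prints the command without starting a container, which is handy for checking the mount path before a real run.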

guillaume-be commented 1 year ago

Thank you @jondot - this is great! The Python model conversion is currently tested in the CI here. Would it be possible to add a test using Docker as well? This would ensure everything still works as expected, and would serve as documentation showing how to run the conversion.

jondot commented 1 year ago

Sure, I can try. Do you mean we should build the Docker image in the CI, and then run it to convert a sample model?

guillaume-be commented 1 year ago

Yes
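A CI step along those lines could look roughly like the sketch below. It is not the test that was actually added to the repository. It assumes the Dockerfile sits in the current directory, that a sample model directory containing `pytorch_model.bin` has already been downloaded, and that the converter writes a file named `rust_model.ot` next to the input. The `smoke_test` function and the `DOCKER` and `EXPECTED_OUTPUT` variables are hypothetical names.

```shell
#!/bin/sh
# Sketch of a CI smoke test: build the image, convert a sample model,
# then check that the converted file appeared on the host.
DOCKER="${DOCKER:-docker}"                          # overridable, e.g. for podman
IMAGE="rustbert-converter"                          # image name from this thread
EXPECTED_OUTPUT="${EXPECTED_OUTPUT:-rust_model.ot}" # assumed output file name

smoke_test() {
    # Resolve the sample model directory to an absolute path for the bind mount.
    model_dir=$(cd "$1" && pwd) || return 1

    # Build the image (assumes the Dockerfile is in the current directory).
    "$DOCKER" build -t "$IMAGE" . || return 1

    # Run the conversion on the sample weights.
    "$DOCKER" run --rm -v "$model_dir:/model" "$IMAGE" pytorch_model.bin || return 1

    # The conversion counts as successful only if the output file exists.
    if [ -f "$model_dir/$EXPECTED_OUTPUT" ]; then
        echo "conversion OK: $EXPECTED_OUTPUT"
    else
        echo "conversion failed: $EXPECTED_OUTPUT not found" >&2
        return 1
    fi
}
```

In a CI job this would be called as `smoke_test path/to/sample-model` after the download step, and a non-zero return status would fail the job.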

guillaume-be / rust-bert

Add a Dockerfile build for the converter only #423