guillaume-be / rust-bert

Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
https://docs.rs/crate/rust-bert
Apache License 2.0
2.65k stars 215 forks

Model bert_uncased_L-12_H-768_A-12 #401

Open edersoncorbari opened 1 year ago

edersoncorbari commented 1 year ago

Hi, congratulations on the project!

I'm trying to port my Python project to Rust. In my case I use the model bert_uncased_L-12_H-768_A-12.

Does this model work with rust-bert?

https://huggingface.co/google/bert_uncased_L-12_H-768_A-12

Thanks, EDMC.

guillaume-be commented 1 year ago

Hello @edersoncorbari ,

Yes - this should work using the Bert model in this crate. You would need to convert the weights to the .ot format with the script found in ./utils/convert_model.py (a virtual environment with PyTorch and NumPy is required for the conversion).

edersoncorbari commented 1 year ago

Hi @guillaume-be

Thanks! I tried the conversion, following these steps:

git lfs install
git -C resources clone https://huggingface.co/google/bert_uncased_L-12_H-768_A-12
python ./utils/convert_model.py resources/bert_uncased_L-12_H-768_A-12/pytorch_model.bin

The Python conversion completes without any problems, but when I run the Rust code below, this error occurs:

thread 'main' panicked at 'Could not open configuration file.: Os { code: 2, kind: NotFound, message: "No such file or directory" }', /home/edmc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rust-bert-0.21.0/src/common/config.rs:40:34
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::expect
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1046:23
   4: rust_bert::common::config::Config::from_file
             at /home/edmc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rust-bert-0.21.0/src/common/config.rs:40:17
   5: rust_bert::pipelines::sentence_embeddings::builder::SentenceEmbeddingsBuilder::create_model
             at /home/edmc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rust-bert-0.21.0/src/pipelines/sentence_embeddings/builder.rs:59:23
   6: detection::main
             at ./src/main.rs:15:17
   7: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.

Code:

let model = SentenceEmbeddingsBuilder::local("resources/bert_uncased_L-12_H-768_A12")
    .with_device(tch::Device::cuda_if_available())
    .create_model()?;

let sentences = ["this is an example sentence", "each sentence is converted"];
let embeddings = model.encode(&sentences)?;
println!("{embeddings:?}");
Ok(())

Any suggestions?

Thanks, EDMC.

guillaume-be commented 1 year ago

The error message seems to indicate the configuration file is missing -- can you check that all the configuration (*.json) files are present in the model directory? The required files are mostly the same between the Python and Rust versions, except that pytorch_model.bin is replaced by rust_model.ot.
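
One way to run that check before building the model is a small standalone program like the sketch below. It uses only the Rust standard library. The helper name `missing_files` and the file list are assumptions for illustration; the exact set of files depends on which pipeline is loaded, so adjust the list accordingly. Note also that the path in the code posted above ends in `A12`, while the directory created by `git clone` ends in `A-12`, so it may be worth confirming the directory itself exists.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the names from `expected` that are not
/// regular files inside `dir`.
fn missing_files(dir: &Path, expected: &[&str]) -> Vec<String> {
    let mut missing = Vec::new();
    for name in expected {
        if !dir.join(name).is_file() {
            missing.push(name.to_string());
        }
    }
    missing
}

fn main() {
    // Directory name as created by the clone command in this thread
    // (note the hyphen in "A-12").
    let model_dir = PathBuf::from("resources/bert_uncased_L-12_H-768_A-12");

    // Assumed file list for illustration only; the pipeline in use may
    // require additional configuration files.
    let expected = ["config.json", "vocab.txt", "rust_model.ot"];

    if !model_dir.is_dir() {
        eprintln!("model directory not found: {}", model_dir.display());
        return;
    }

    let missing = missing_files(&model_dir, &expected);
    if missing.is_empty() {
        println!("all expected files are present");
    } else {
        eprintln!("missing files: {missing:?}");
    }
}
```

Running this from the project root before calling `create_model()` narrows the "No such file or directory" error down to either a wrong directory path or a specific missing file.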