guillaume-be / rust-bert

Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
https://docs.rs/crate/rust-bert
Apache License 2.0
2.67k stars 216 forks source link

SentenceEmbeddingsModel is not Sync #389

Closed valyagolev closed 1 year ago

valyagolev commented 1 year ago

Is there any fundamental reason SentenceEmbeddingsModel would not be Sync? I'm a bit confused about this.

If there's no such reason, I will try to see what's the reason it's not Sync and fix it when I find time.

guillaume-be commented 1 year ago

Hello, As far as I know the tch::Tensor, thoroughly used in this crate, is Send but not Sync, and therefore all models and pipelines are also not Sync. They should however all be Send, this is enforced by unit test checks in the crate.

valyagolev commented 1 year ago

OK thank you -

I found this explanation here: https://github.com/LaurentMazare/tch-rs/issues/12#issuecomment-873358372 which mentions that ...

some storage may be shared between tensors and lead to unsound behavior.

In practice this is not much of an issue, but doing more operations across thread is likely to trigger more race conditions

probably having Send is already an issue here

which I don't 100% understand (is "Send" then also unsound? what are the conditions for this happening, is this even possible in case of just using downloaded embeddings?). But yeah, I guess I see why one wouldn't want to open this can of worms. Thanks, I'll try to invent some kind of a wrapper for my use