Closed: linkedlist771 closed this 2 months ago
Hello,
I would recommend checking https://github.com/epwalsh/rust-dl-webserver as an example of an application sharing a model to serve requests from multiple threads. Tensors are indeed not Sync and cannot be shared safely between threads (a limitation of the upstream tch-rs library).
I am trying to build a web service with rust-bert to provide an embedding service, which works fine in Python (I have implemented it there before). I am now using the `SentenceEmbeddingsModel` to provide the service. I want to load it once as a global resource, but here is what I encountered. I have also checked similar issues such as issue #47, but failed to apply that approach as well. I think using a BERT-like model as a global resource for inference is a very common use case, so I would appreciate it if you could support this feature.
For full code:
```rust
use actix_web::{
    body::BoxBody, get, http::header::ContentType, middleware::Logger, post, web, App,
    HttpRequest, HttpResponse, HttpServer, Responder,
};
use clap::{App as ClapApp, Arg};
use serde::{Deserialize, Serialize};
use serde_json::json;
use std::collections::HashMap;
use std::env;
use std::sync::Arc;

mod utils;
use utils::{get_model_infos, get_prompt_tokens, load_models, ModelInfo};

use rust_bert::pipelines::sentence_embeddings::SentenceEmbeddingsModel;

#[derive(Deserialize, Serialize, Debug)]
struct EmbeddingRequest {
    model: Option
```