guillaume-be / rust-bert

Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
https://docs.rs/crate/rust-bert
Apache License 2.0

Introduce in-memory resource abstraction #375

Closed · mweber15 closed this 1 year ago

mweber15 commented 1 year ago

This follows from discussion in #366.

The goal of this change is to allow weights to be loaded from a copy of rust_model.ot that is already present in memory. There are two ways in which that data might be present:

  1. As a HashMap<String, Tensor> from previous interaction with tch
  2. As a contiguous buffer of the file data

One or the other mechanism might be preferable depending on how user code is using the model data. In some sense, a provider built on the second option is mostly a convenience for the user, sparing them the direct tch::nn::VarStore::load_from_stream interaction.
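To make the buffer case concrete, the interaction being avoided amounts to wrapping the bytes in something that implements Read + Seek and handing it to tch. A minimal sketch, assuming the buffer holds the full contents of rust_model.ot (the helper name here is illustrative, not part of the PR's API):

```rust
use std::io::Cursor;
use tch::nn::VarStore;

// Hypothetical helper: load weights into a VarStore directly from an
// in-memory copy of rust_model.ot, with no round trip through the filesystem.
fn load_weights_from_buffer(vs: &mut VarStore, bytes: &[u8]) -> Result<(), tch::TchError> {
    // Cursor provides the Read + Seek implementation that
    // VarStore::load_from_stream expects.
    vs.load_from_stream(Cursor::new(bytes))
}
```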

I've changed the definition of the ResourceProvider trait to require that implementations be both Send and Sync. Certain contexts already require dyn ResourceProvider + Send, yet before this change nothing guaranteed that an implementation was Send (or Sync). The existing providers are both, and it seems reasonable (if technically unfounded) for user code to assume this to be true. I don't see a downside to making it explicit, but that part of the change might be better suited for a separate discussion; I am not trying to sneak it in.
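For reference, the bound itself is a small change on the trait definition; an abbreviated sketch (the error type is simplified here, the real trait uses RustBertError):

```rust
use std::path::PathBuf;

// Abbreviated sketch: the only change of interest is the Send + Sync
// supertrait, which lets a dyn ResourceProvider be shared across threads
// without extra bounds at the use site. Error type simplified for brevity.
pub trait ResourceProvider: Send + Sync {
    fn get_local_path(&self) -> Result<PathBuf, Box<dyn std::error::Error>>;
}
```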

The Resource enum is used here to abstract over the possible ways a ResourceProvider might represent an underlying resource. Without it, load_weights would have to either try different trait methods until one succeeded or rely on as_any and downcasting in order to keep roughly its current shape. Both options seemed less appealing than a wrapper type.
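One possible shape for such a wrapper, together with the match-based dispatch it enables; the variant names, payload types, and the load_weights signature below are illustrative rather than the exact API in this PR:

```rust
use std::collections::HashMap;
use std::io::Cursor;
use std::path::PathBuf;
use tch::nn::VarStore;
use tch::Tensor;

// Illustrative wrapper over the representations a provider might hold.
pub enum Resource {
    // A file on disk, as returned by the existing providers.
    Path(PathBuf),
    // Raw bytes of rust_model.ot already present in memory.
    Buffer(Vec<u8>),
    // Named weights from a previous interaction with tch.
    Tensors(HashMap<String, Tensor>),
}

// With the enum in place, a single load path can consume any variant
// without downcasting or trial-and-error calls on the provider.
fn load_weights(vs: &mut VarStore, resource: Resource) -> Result<(), tch::TchError> {
    match resource {
        Resource::Path(path) => vs.load(path),
        Resource::Buffer(bytes) => vs.load_from_stream(Cursor::new(bytes)),
        Resource::Tensors(named) => {
            // Copy each provided tensor into the variable of the same name,
            // without tracking gradients for the copy.
            tch::no_grad(|| {
                for (name, mut var) in vs.variables() {
                    if let Some(src) = named.get(&name) {
                        var.copy_(src);
                    }
                }
            });
            Ok(())
        }
    }
}
```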

While it would be possible to replace all calls to get_local_path with the get_resource API, removing the existing function would be a very large breaking change. This change therefore also introduces RustBertError::UnsupportedError so that the two methods can coexist. An alternative would be for the new ResourceProviders to write their resources to a temporary location on disk and return that path, but that runs counter to their whole purpose, so I chose not to implement it.
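As a concrete illustration of the coexistence, an in-memory provider could satisfy the existing trait surface along these lines (struct field and error type simplified; in the PR, get_local_path would return RustBertError::UnsupportedError rather than a boxed string):

```rust
use std::path::PathBuf;

// Same abbreviated trait as in the earlier sketch.
pub trait ResourceProvider: Send + Sync {
    fn get_local_path(&self) -> Result<PathBuf, Box<dyn std::error::Error>>;
}

// In-memory provider holding the raw bytes of rust_model.ot. The field is
// shown as a plain Vec<u8> here; the PR may wrap it for shared or mutable access.
pub struct BufferResource {
    pub data: Vec<u8>,
}

impl ResourceProvider for BufferResource {
    fn get_local_path(&self) -> Result<PathBuf, Box<dyn std::error::Error>> {
        // There is no file behind this resource, so callers that insist on a
        // path get an explicit error instead of a temporary file on disk.
        Err("get_local_path is not supported for in-memory resources".into())
    }
}
```

Existing callers that go through get_local_path keep working against the file-based providers, while the new in-memory providers are reached through get_resource.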

mweber15 commented 1 year ago

After thinking more about the potential for tch::Tensors to be mutated despite the attempt here to make that difficult, I don't really think TensorResource is a safe abstraction. I know that for my usage, BufferResource would be sufficient, and that can be implemented without safety concerns. Very interested to hear other thoughts.

Dropping the requirement I've added here for ResourceProvider to be Send + Sync would of course be fine and would put the onus back on users of TensorResource to observe its limitations. I do think it would be beneficial for all ResourceProviders to be explicitly Send + Sync if possible though.

guillaume-be commented 1 year ago

Thank you @mweber15. Could you please add a small test that validates and illustrates that the approach works for the buffer resource? You could pick any of the smaller models (e.g. DistilBART, as done so far, or a DistilBERT model).

mweber15 commented 1 year ago

Hi @guillaume-be,

The recent build change has broken things for me in a couple of ways that I'm working through. Once that is sorted, I will add an illustrative test and make sure the doc comments reflect the current state of the PR. Hopefully soon.

mweber15 commented 1 year ago

Thank you so much for your help putting this together!