guillaume-be / rust-bert

Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
https://docs.rs/crate/rust-bert
Apache License 2.0

[Feature Request] Joint Models #308

Open arctic-bunny opened 1 year ago

arctic-bunny commented 1 year ago

Hey @guillaume-be, awesome job on this.

I'm trying to have one model for both entity recognition and text classification. There are some implementations available in the Python ecosystem for this, and I'm just wondering if it's possible with this project. I want to avoid loading two different models in my application when it's possible with one.

Something like this but for Rust. What would need to be done?

guillaume-be commented 1 year ago

Hello @arctic-bunny,

This should be feasible without too much difficulty. You would have to do the following (assuming the model was trained with PyTorch):

  1. Convert the set of PyTorch weights to the CTensor format (using the /utils/convert_model.py script)
  2. Identify the model backbone that was used in Python (e.g. BERT or RoBERTa encoder)
  3. Create a custom classification head for your model, which may look something like:

    pub struct RobertaMultitaskClassificationHead {
        dense: nn::Linear,
        dropout: Dropout,
        out_proj_1: nn::Linear,
        out_proj_2: nn::Linear,
    }

    impl RobertaMultitaskClassificationHead {
        pub fn new<'p, P>(
            p: P,
            config: &BertConfig,
            obj_1_dim: i64,
            obj_2_dim: i64,
        ) -> RobertaMultitaskClassificationHead
        where
            P: Borrow<nn::Path<'p>>,
        {
            let p = p.borrow();
            // Shared dense projection applied before the task-specific outputs
            let dense = nn::linear(
                p / "dense",
                config.hidden_size,
                config.hidden_size,
                Default::default(),
            );
            // One output projection per objective (e.g. classification and NER label spaces)
            let out_proj_1 = nn::linear(
                p / "out_proj_1",
                config.hidden_size,
                obj_1_dim,
                Default::default(),
            );
            let out_proj_2 = nn::linear(
                p / "out_proj_2",
                config.hidden_size,
                obj_2_dim,
                Default::default(),
            );
            let dropout = Dropout::new(config.hidden_dropout_prob);

            RobertaMultitaskClassificationHead {
                dense,
                dropout,
                out_proj_1,
                out_proj_2,
            }
        }

        pub fn forward_t(&self, hidden_states: &Tensor, train: bool) -> (Tensor, Tensor) {
            // Take the hidden state of the first token and pass it through the shared layers
            let intermediate = hidden_states
                .select(1, 0)
                .apply_t(&self.dropout, train)
                .apply(&self.dense)
                .tanh()
                .apply_t(&self.dropout, train);

            // One set of logits per task
            (
                intermediate.apply(&self.out_proj_1),
                intermediate.apply(&self.out_proj_2),
            )
        }
    }

The names of the layers (e.g. `out_proj_x`) will have to match the names of your modules in PyTorch, or be overwritten during conversion. In any case you would need to match the architecture of your Python model and translate it (the above is just a basic example).
  4. Create the model using the encoder from the library and the multi-task head you defined (see https://github.com/guillaume-be/rust-bert/blob/a34cf9f8e4b31c2f3953c037b7df059a4246fc88/src/roberta/roberta_model.rs#L430 for an example); a rough sketch of this is given below.
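
To make step 4 more concrete, a joint model wrapping the encoder and the head above could look roughly like the sketch below. Note this is just an illustration rather than code from the library: `RobertaForMultitaskClassification` is a made-up name, the import paths assume a recent version of the crate, and the exact `BertModel::forward_t` argument list should be checked against the version you are using.

    use std::borrow::Borrow;

    use rust_bert::bert::{BertConfig, BertModel};
    use rust_bert::roberta::RobertaEmbeddings;
    use rust_bert::RustBertError;
    use tch::{nn, Tensor};

    /// Hypothetical joint model: a shared RoBERTa encoder plus the multi-task head above
    pub struct RobertaForMultitaskClassification {
        roberta: BertModel<RobertaEmbeddings>,
        classification_head: RobertaMultitaskClassificationHead,
    }

    impl RobertaForMultitaskClassification {
        pub fn new<'p, P>(
            p: P,
            config: &BertConfig,
            obj_1_dim: i64,
            obj_2_dim: i64,
        ) -> RobertaForMultitaskClassification
        where
            P: Borrow<nn::Path<'p>>,
        {
            let p = p.borrow();
            // The variable store paths ("roberta", "classifier") need to match the
            // names in the converted weights, as for the head layers above
            let roberta = BertModel::<RobertaEmbeddings>::new(p / "roberta", config);
            let classification_head = RobertaMultitaskClassificationHead::new(
                p / "classifier",
                config,
                obj_1_dim,
                obj_2_dim,
            );

            RobertaForMultitaskClassification {
                roberta,
                classification_head,
            }
        }

        pub fn forward_t(
            &self,
            input_ids: &Tensor,
            mask: &Tensor,
            train: bool,
        ) -> Result<(Tensor, Tensor), RustBertError> {
            // Run the shared encoder once (check the forward_t signature of BertModel
            // in your version of the crate, the argument list has changed over time)
            let encoder_output = self.roberta.forward_t(
                Some(input_ids),
                Some(mask),
                None,
                None,
                None,
                None,
                None,
                train,
            )?;

            // The head turns the encoder hidden states into one output tensor per task
            Ok(self
                .classification_head
                .forward_t(&encoder_output.hidden_state, train))
        }
    }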
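
Once the weights have been converted (step 1), loading and running the joint model might look like the following, again just as an illustration; the paths and the two output dimensions (4 and 9) are placeholders:

    use std::error::Error;

    use rust_bert::bert::BertConfig;
    use rust_bert::Config;
    use tch::{nn, Device, Kind, Tensor};

    fn main() -> Result<(), Box<dyn Error>> {
        let device = Device::cuda_if_available();
        let mut vs = nn::VarStore::new(device);

        // Configuration of the original Python model and the converted weights (step 1);
        // replace the paths and the two output dimensions with your own
        let config = BertConfig::from_file("path/to/config.json");
        let model = RobertaForMultitaskClassification::new(vs.root(), &config, 4, 9);
        vs.load("path/to/rust_model.ot")?;

        // Dummy batch of 1 sequence of length 8 (tokenization not shown here)
        let input_ids = Tensor::zeros(&[1, 8], (Kind::Int64, device));
        let mask = Tensor::ones(&[1, 8], (Kind::Int64, device));

        let (task_1_logits, task_2_logits) = model.forward_t(&input_ids, &mask, false)?;
        println!("{:?} {:?}", task_1_logits.size(), task_2_logits.size());
        Ok(())
    }

For inference you would typically wrap the forward pass in `tch::no_grad` and build `input_ids`/`mask` with a tokenizer from the `rust_tokenizers` crate rather than the dummy tensors used here.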

I hope this helps

arctic-bunny commented 1 year ago

Thank you for the write-up. I wasn't too confident about implementing this, but you've been a great help. I'll take it from here and see if I can make it work.