Open · arctic-bunny opened this issue 1 year ago
Hello @arctic-bunny,

This should be feasible without too much difficulty. You would have to do the following (assuming the model was trained with PyTorch):

1. Convert the model weights (using the `/utils/convert_model.py` script)
2. Define a multi-task classification head, for example:
```rust
pub struct RobertaMultitaskClassificationHead {
    dense: nn::Linear,
    dropout: Dropout,
    out_proj_1: nn::Linear,
    out_proj_2: nn::Linear,
}

impl RobertaMultitaskClassificationHead {
    pub fn new<'p, P>(
        p: P,
        config: &BertConfig,
        obj_1_dim: i64,
        obj_2_dim: i64,
    ) -> RobertaMultitaskClassificationHead
    where
        P: Borrow<nn::Path<'p>>,
    {
        let p = p.borrow();
        let dense = nn::linear(
            p / "dense",
            config.hidden_size,
            config.hidden_size,
            Default::default(),
        );
        let out_proj_1 = nn::linear(
            p / "out_proj_1",
            config.hidden_size,
            obj_1_dim,
            Default::default(),
        );
        let out_proj_2 = nn::linear(
            p / "out_proj_2",
            config.hidden_size,
            obj_2_dim,
            Default::default(),
        );
        let dropout = Dropout::new(config.hidden_dropout_prob);
        RobertaMultitaskClassificationHead {
            dense,
            dropout,
            out_proj_1,
            out_proj_2,
        }
    }
}
```
3. Implement a forward pass through the head that returns one output per task:
```rust
impl RobertaMultitaskClassificationHead {
    pub fn forward_t(&self, hidden_states: &Tensor, train: bool) -> (Tensor, Tensor) {
        let intermediate = hidden_states
            .select(1, 0)
            .apply_t(&self.dropout, train)
            .apply(&self.dense)
            .tanh()
            .apply_t(&self.dropout, train);
        (
            intermediate.apply(&self.out_proj_1),
            intermediate.apply(&self.out_proj_2),
        )
    }
}
```
The names of the layers (e.g. `out_proj_x`) will have to match the names of your modules in PyTorch, or be overwritten during conversion. In any case, you will need to match the architecture of your Python model and translate it (the above is just a basic example).
4. Create the model using the encoder from the library and the multi-task head you defined (see https://github.com/guillaume-be/rust-bert/blob/a34cf9f8e4b31c2f3953c037b7df059a4246fc88/src/roberta/roberta_model.rs#L430 for example)
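To see why a single model can serve both tasks, the pattern above (one shared encoder state feeding two independent projections) can be sketched with plain std-only Rust, independent of `tch`. The toy `linear` helper below is purely illustrative and stands in for `nn::Linear`:

```rust
// Minimal illustration of the multi-task pattern: one shared representation
// is computed once, then projected by two independent linear heads
// (mirroring `out_proj_1` / `out_proj_2` above). `linear` is a toy stand-in.
fn linear(weights: &[Vec<f64>], bias: &[f64], input: &[f64]) -> Vec<f64> {
    weights
        .iter()
        .zip(bias)
        .map(|(row, b)| row.iter().zip(input).map(|(w, x)| w * x).sum::<f64>() + b)
        .collect()
}

fn main() {
    // Shared "encoder" output (stands in for the pooled <s> token state).
    let hidden = vec![1.0, 2.0];

    // Two task-specific heads with different output dimensions.
    let head_1_w = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]]; // 3 classes
    let head_1_b = vec![0.0, 0.0, 0.0];
    let head_2_w = vec![vec![0.5, 0.5]]; // single output for the second task
    let head_2_b = vec![0.5];

    // The shared representation is computed once and reused by both heads.
    let logits_1 = linear(&head_1_w, &head_1_b, &hidden); // → [1.0, 2.0, 3.0]
    let logits_2 = linear(&head_2_w, &head_2_b, &hidden); // → [2.0]
    println!("{:?} {:?}", logits_1, logits_2);
}
```

In the real model, `hidden` would be the pooled `<s>` token representation from the RoBERTa encoder, each head's logits would feed its own loss, and the expensive encoder forward pass runs only once per batch for both tasks.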
I hope this helps!
Thank you for the write-up. I wasn't too confident about implementing this, but you've been a great help. I'll take it from here and see if I can make it work.
Hey @guillaume-be, awesome job on this.
I'm trying to have one model for entity recognition and text classification. There are some implementations available in the Python ecosystem for this, and I'm just wondering if it's possible with this project. I want to avoid loading two different models in my application when one would do.
Something like this but for Rust. What would need to be done?