guillaume-be / rust-bert

Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
https://docs.rs/crate/rust-bert
Apache License 2.0
2.51k stars, 211 forks

When label mappings aren't provided - we get a crash #416

Open jondot opened 10 months ago

jondot commented 10 months ago

As opposed to transformers, where labels are generated ad hoc:

[{'label': 'LABEL_0', 'score': 0.999602735042572}]

To resolve this, we might want to add a label mapping to SequenceClassificationConfig with some defaults, but that might be too radical a change.

Another possible fix is to do the same thing as transformers and go:

let label_string = self.label_mapping.get(&id).cloned().unwrap_or_else(|| format!("LABEL_{id}"));

instead of

let label_string = self.label_mapping.get(&id).unwrap().to_owned();
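A minimal, self-contained sketch of that fallback, assuming the mapping is a `HashMap<i64, String>` (the actual field type in rust-bert may differ) and using a hypothetical helper name `label_or_default`. Since `get` returns an `Option<&String>`, the value is cloned before the owned fallback string is supplied:

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of rust-bert): return the label for a class
/// id, or a generated `LABEL_{id}` name when the mapping has no entry for it.
fn label_or_default(label_mapping: &HashMap<i64, String>, id: i64) -> String {
    label_mapping
        .get(&id) // Option<&String>
        .cloned() // Option<String>, so both branches yield an owned String
        .unwrap_or_else(|| format!("LABEL_{id}"))
}
```

With a mapping that only contains `0 -> "NEGATIVE"`, looking up id `0` would return the stored label, while id `1` would fall back to a generated name instead of panicking.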

And then num_labels, when no mapping is specified, is... the magic number 2: https://github.com/huggingface/transformers/blob/95b374952dc27d8511541d6f5a4e22c9ec11fb24/src/transformers/configuration_utils.py#L331

Well, not so much magic if you assume that a classifier with no other information provided is always binary, which is what the Python library seems to do.
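For illustration, generating such a default mapping could look roughly like the sketch below. The function name `default_label_mapping`, the constant `DEFAULT_NUM_LABELS`, and the `i64` key type are assumptions made for this example, not existing rust-bert API:

```rust
use std::collections::HashMap;

/// Assumed default, mirroring the binary-classifier assumption described above.
const DEFAULT_NUM_LABELS: i64 = 2;

/// Hypothetical helper: build `{0: "LABEL_0", 1: "LABEL_1", ...}` for the
/// given number of labels, or for the assumed default when none is given.
fn default_label_mapping(num_labels: Option<i64>) -> HashMap<i64, String> {
    let n = num_labels.unwrap_or(DEFAULT_NUM_LABELS);
    (0..n).map(|id| (id, format!("LABEL_{id}"))).collect()
}
```

Calling it with `None` would yield two entries, and `Some(3)` would yield three, so a model with more classes would still need its label count supplied from somewhere.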

Any thoughts?

guillaume-be commented 9 months ago

Hello @jondot ,

The label mapping is loaded from the config.json file provided to initialize the model. Do you have an example of a malformed model configuration that does not contain the label information? While creating labels "on the fly" when they are missing would allow the code to compile and run, the output would not be properly formed (what does LABEL_0 mean for the downstream application?).

I'd be in favor of keeping the current setup to encourage users to provide a valid configuration. Maybe additional documentation or hints in the error thrown would be helpful?
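One way such an error hint could look is sketched below: the lookup returns a `Result` with a descriptive message instead of panicking on `unwrap()`. The error type `MissingLabelError`, the helper `label_checked`, the `HashMap<i64, String>` mapping type, and the `id2label` key named in the message are all assumptions for illustration, not the library's actual error handling:

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type carrying the class id that had no label.
#[derive(Debug, PartialEq)]
struct MissingLabelError {
    id: i64,
}

impl fmt::Display for MissingLabelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "no label found for class id {}: check that the `id2label` entry \
             in config.json covers every output class",
            self.id
        )
    }
}

impl std::error::Error for MissingLabelError {}

/// Hypothetical helper: look up a label, returning a descriptive error
/// rather than panicking when the mapping has no entry for `id`.
fn label_checked(
    label_mapping: &HashMap<i64, String>,
    id: i64,
) -> Result<String, MissingLabelError> {
    label_mapping
        .get(&id)
        .cloned()
        .ok_or(MissingLabelError { id })
}
```

This keeps the requirement that the configuration be valid, while pointing the user at the likely cause instead of an opaque panic.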