instadeepai / tunbert

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect, trained on a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI), and Reading Comprehension Question-Answering (RCQA).
MIT License
107 stars 37 forks

Poor documentation #6

Open raslenmtg opened 2 years ago

raslenmtg commented 2 years ago

How do I test it after installation?

not-lain commented 9 months ago

Leaving this reply for future lurkers: I finished registering the model on Hugging Face, so you can load the TunBERT model using the following code.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("not-lain/TunBERT")
model = AutoModelForSequenceClassification.from_pretrained("not-lain/TunBERT", trust_remote_code=True)

Note that you need to install the transformers library beforehand.
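In case the dependencies are missing, a typical install looks like this (exact package set is an assumption; the model also needs a torch backend for the `return_tensors='pt'` step below):

```shell
# install transformers plus the PyTorch backend used by the snippets here
pip install transformers torch
```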

How to use the model:

text = "[insert text here]"
inputs = tokenizer(text,return_tensors='pt') # make sure you are using the `return_tensors='pt'` parameter
output = model(**inputs)
print(output.logits)
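To go from raw logits to a prediction, here is a minimal sketch of the usual softmax-then-argmax step. The logits values and the label names are hypothetical (check the model's `config.json` for the real label mapping); in practice you would feed in `output.logits[0].tolist()` from the snippet above.

```python
import math

def softmax(logits):
    # numerically stable softmax over a flat list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical logits, e.g. output.logits[0].tolist() from the snippet above
logits = [2.1, -1.3]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
labels = ["negative", "positive"]  # assumed label order; verify against the model config
print(labels[predicted], probs)
```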

Sadly, the original implementation uses a head with 2 neurons and the bias enabled, so the model is showing 4 categories instead of 2 (or maybe I misunderstood how they labeled the data). In any case, I would love to get an update on this. (Screenshot attached for reference.)