ohmeow / blurr

A library that integrates huggingface transformers with the world of fastai, giving fastai devs everything they need to train, evaluate, and deploy transformer specific models.
https://ohmeow.github.io/blurr
Apache License 2.0
289 stars, 34 forks

BlearnerForSequenceClassification for multiclass text classification using mBert #74

Open Farhad-Uz-Zaman opened 2 years ago

Farhad-Uz-Zaman commented 2 years ago

Hi, I am trying to use Multilingual BERT for multiclass text classification. The code I am following uses BlearnerForSequenceClassification for binary classification, but I get an error when I use it for multiclass classification. The error is as follows:

"ValueError: Exception occured in Recorder when calling event after_validate: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted']."

BlearnerForSequenceClassification doesn't have parameters like num_labels. I am new to this field. Maybe I am doing something wrong from the very beginning.

Any help would be highly appreciated. Sincerely, Farhad

ohmeow commented 2 years ago

"BlearnerForSequenceClassification doesn't have parameters like num_labels. I am new to this field. Maybe I am doing something wrong from the very beginning."

You pass in num_labels as a parameter when building your Hugging Face objects ... I have a pretty thorough example that should get you up and running here: https://ohmeow.github.io/blurr/text-examples-multilabel.html
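The ValueError quoted in the original report is the message scikit-learn raises when an F1, precision, or recall score is requested with average='binary' on targets that have more than two classes. In fastai, metrics such as F1Score accept an average argument, so passing something like average='macro' is the usual remedy; where the metric is configured in the reporter's code is not shown in this thread, so treat that as an assumption. As a rough illustration of what 'macro' averaging computes, here is a minimal pure-Python sketch (the function names and the toy labels are made up for this example):

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label in turn as the positive class."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but is wrong
            fn[t] += 1   # class t was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores ('macro' averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy three-class example (hypothetical data).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

With 'binary' averaging only one positive class is scored, which is why it is undefined for three or more classes; 'macro' scores every class and averages the results.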

Farhad-Uz-Zaman commented 2 years ago

I've also tried MultiCategoryBlock instead of CategoryBlock, but it gives me the error "TypeError: 'numpy.int64' object is not iterable." at the following line: dls = dblock.dataloaders(final_df, batch_size=16)
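A likely reading of this TypeError, assuming the label column holds one integer class id per row: MultiCategoryBlock is meant for multilabel targets, where each row carries a collection of labels, and building the vocabulary iterates over each row's labels. A bare integer cannot be iterated. The sketch below mimics that behaviour in plain Python; build_multilabel_vocab is a hypothetical stand-in, not fastai's actual code.

```python
def build_multilabel_vocab(rows):
    """Collect distinct labels, assuming each row is an iterable of labels."""
    vocab = set()
    for labels in rows:
        vocab.update(labels)  # raises TypeError when `labels` is a bare int
    return sorted(vocab)


# One integer class id per example: the multiclass (single-label) layout.
single_label_rows = [0, 2, 1, 2]
# A list of labels per example: the multilabel layout.
multi_label_rows = [[0], [2], [1, 2], []]

try:
    build_multilabel_vocab(single_label_rows)
    failed = False
except TypeError as err:
    failed = True
    print(f"TypeError: {err}")

print(build_multilabel_vocab(multi_label_rows))
```

For a single-label multiclass problem, CategoryBlock is the matching block, and the metric averaging is the part that needs to change.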

ohmeow commented 2 years ago

Can you share a gist with me ... it's really hard to troubleshoot without seeing the code.

anikchowdhury1 commented 2 years ago

"BlearnerForSequenceClassification doesn't have parameters like num_labels. I am new to this field. Maybe I am doing something wrong from the very beginning."

"You pass in num_labels as a parameter when building your Hugging Face objects ... I have a pretty thorough example that should get you up and running here: https://ohmeow.github.io/blurr/text-examples-multilabel.html"

@ohmeow, the link you provided is not working. I am also facing similar issues.

learn = BlearnerForSequenceClassification.from_data(
    final_df, pretrained_model_name, text_attr="text", label_attr="label", dl_kwargs={"bs": 4}
)

BlearnerForSequenceClassification.from_data has an n_labels parameter, but I still get errors after setting it.