yya518 / FinBERT

A Pretrained BERT Model for Financial Communications. https://arxiv.org/abs/2006.08097
Apache License 2.0
560 stars 128 forks

Mismatch between labels during data prep vs. during inference #17

Closed. iattar closed this issue 3 years ago

iattar commented 3 years ago

I might be missing something, but there seems to be a mismatch between the label ordering used when preparing the training data and the ordering used for inference in the provided Jupyter notebook.

In the 'datasets.py' file, the 'transform_labels' method uses the ordering "dict_labels = {'positive': 0, 'neutral':1, 'negative':2}", but the notebook uses "labels = {0:'neutral', 1:'positive', 2:'negative'}".

In both cases, 'negative'=2, but 'positive' and 'neutral' are switched. Is there a reason I am missing? Thank you!
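To make the discrepancy concrete, here is a minimal sketch that compares the two mappings quoted above. The two dictionaries are copied from the issue text; the names `train_index_to_label` and `mismatched` are introduced here only for illustration and do not come from the repository.

```python
# Mapping quoted from transform_labels (label -> index).
dict_labels = {'positive': 0, 'neutral': 1, 'negative': 2}
# Mapping quoted from the notebook (index -> label).
labels = {0: 'neutral', 1: 'positive', 2: 'negative'}

# Invert the data-prep mapping so both dictionaries go index -> label.
# (train_index_to_label is an illustrative name, not from the repo.)
train_index_to_label = {index: name for name, index in dict_labels.items()}

# Collect every index where the two mappings name different classes,
# as (data-prep label, notebook label).
mismatched = {
    index: (train_index_to_label[index], labels[index])
    for index in sorted(labels)
    if train_index_to_label[index] != labels[index]
}

print(mismatched)
# {0: ('positive', 'neutral'), 1: ('neutral', 'positive')}
```

Indices 0 and 1 disagree and index 2 agrees, which is the swap described above.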

yya518 commented 3 years ago

Thanks for pointing it out. The transform_labels method in dataset.py is used to prepare a different dataset (financialPhraseBankDataset), and it is not used by the inference notebook code. If you use our notebook, I assume you want the model fine-tuned on the analyst-tone data, where the labels are {0:'neutral', 1:'positive', 2:'negative'}.
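As a rough illustration of why the mapping must match the checkpoint, the sketch below decodes a vector of class scores with the analyst-tone mapping given in the reply. The helper `decode_prediction` and the example scores are hypothetical and not part of the repository or the notebook; the scores stand in for whatever the fine-tuned model outputs for one sentence.

```python
# Index -> label mapping for the analyst-tone model, as stated in the reply.
ANALYST_TONE_LABELS = {0: 'neutral', 1: 'positive', 2: 'negative'}


def decode_prediction(logits, index_to_label=ANALYST_TONE_LABELS):
    """Return the label whose index has the highest score.

    Hypothetical helper: `logits` is a plain sequence of per-class scores.
    """
    if len(logits) != len(index_to_label):
        raise ValueError("expected one score per label")
    # Plain-Python argmax over the class scores.
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return index_to_label[best_index]


# Made-up scores for a single sentence, for illustration only.
example_logits = [0.1, 2.3, -1.0]
print(decode_prediction(example_logits))  # positive
```

Decoding the same scores with the inverted financialPhraseBankDataset mapping would report 'neutral' instead, so each checkpoint should be paired with the mapping it was fine-tuned with.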