psnonis / FinBERT

BERT for Finance : UC Berkeley MIDS w266 Final Project


Release of FinBERT pre-trained Model (Prime / Combo) #2

Open franz101 opened 5 years ago

franz101 commented 5 years ago

Hi,

Great project. I just read your paper and found it interesting to compare your results with those of the paper at https://arxiv.org/pdf/1908.10063.pdf

Do you plan on releasing the pre-trained models for research? Thank you very much for making the training data available.

All the best!

yuanbit commented 4 years ago

I am doing research in this area as well. Would it be possible to share the pre-trained models?

Thank you!

vr25 commented 4 years ago

This issue might be helpful.

nehatj commented 4 years ago

@vr25 I looked at your finbert & finRoberta model release on Hugging Face. I am trying to set up a sentiment analysis pipeline using your model and tokenizer, but I am running into an OSError: Can't load 'tokenizer' issue. Am I missing something? Can I use this pipeline with your model?

Below is how I am using it:

tokenizer = AutoTokenizer.from_pretrained("vr25/fin_Roberta-v1")
model = AutoModelWithLMHead.from_pretrained("vr25/fin_Roberta-v1")
pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

vr25 commented 4 years ago

Yes, it should work.

franz101 commented 4 years ago

@nehatj check the capitalisation of the letters in the model name:

from transformers import AutoTokenizer, AutoModelWithLMHead, pipeline

tokenizer = AutoTokenizer.from_pretrained("vr25/fin_RoBERTa-v1")
model = AutoModelWithLMHead.from_pretrained("vr25/fin_RoBERTa-v1")

sentiment = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
sentiment("the stock needs to sold")

This leads to a KeyError.

Also, the tokenizer gives an interesting output, with a special character in front of most tokens:

tokenizer.tokenize("the stock needs to sold")

['the', 'Ġstock', 'Ġneeds', 'Ġto', 'Ġsold']
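For anyone puzzled by the "Ġ": RoBERTa-style tokenizers use byte-level BPE, where a space preceding a word is rendered as "Ġ" (U+0120), so the first token has no marker and the following ones do. The output above is therefore expected behaviour rather than a bug. Here is a minimal pure-Python sketch of how the marker maps back to text. The names SPACE_MARKER, starts_new_word and detokenize are hypothetical helpers for illustration, not part of transformers, and the sketch only handles the space marker, not the other byte mappings a real tokenizer covers.

```python
# Byte-level BPE (GPT-2 / RoBERTa style) renders a space before a token as "Ġ".
SPACE_MARKER = "\u0120"  # the character "Ġ"


def starts_new_word(token):
    """Return True if the token was preceded by a space in the original text."""
    return token.startswith(SPACE_MARKER)


def detokenize(tokens):
    """Join tokens and turn each space marker back into a plain space.

    Illustrative only: a real tokenizer also reverses other byte mappings.
    """
    return "".join(tokens).replace(SPACE_MARKER, " ")


tokens = ['the', 'Ġstock', 'Ġneeds', 'Ġto', 'Ġsold']
text = detokenize(tokens)
print(text)  # the stock needs to sold
```

With a loaded tokenizer, tokenizer.convert_tokens_to_string(tokens) should do the same job properly.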

vr25 commented 4 years ago

@nehatj, @franz101 Thank you for pointing this out. I'll fix this in the next version. All the updates will be pushed here.

Thanks!