maxbbraun / trump2cash

A stock trading bot powered by Trump tweets
https://trump2cash.biz
MIT License

Checking the sentiment too often? #67

Closed franz101 closed 3 years ago

franz101 commented 3 years ago

Hey Max,

I love love love your projects, and I loved reading the code. I've been experimenting with it today and noticed something that I think is a mistake:

```python
sentiment = self.get_sentiment
```

This happens inside the loop where you go over each entity, and from each entity you then loop over each Wikipedia company. So it gets called 2-4 times, even though one call would be enough. Am I correct here? ...
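To make it concrete, this is roughly the shape I mean; the helper names below are placeholders, not the actual ones from analysis.py:

```python
def find_companies(text, get_entities, lookup_companies, get_sentiment):
    """Before: get_sentiment runs once per detected entity, even though
    it only depends on the tweet text (helper names are placeholders)."""
    companies = []
    for entity in get_entities(text):
        sentiment = get_sentiment(text)  # same result on every iteration
        for name in lookup_companies(entity):
            companies.append({"name": name, "sentiment": sentiment})
    return companies


def find_companies_hoisted(text, get_entities, lookup_companies, get_sentiment):
    """After: the sentiment call is hoisted out of the loop and runs once."""
    sentiment = get_sentiment(text)  # exactly one API call per tweet
    companies = []
    for entity in get_entities(text):
        for name in lookup_companies(entity):
            companies.append({"name": name, "sentiment": sentiment})
    return companies
```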

I have a few other things I was wondering about. First, I used historic tweets for experimenting, and with bulk data the entity recognition gets very expensive: $1 per 1,000 queries after the first 5,000. I therefore merged all the tweets into one huge text block and analysed that, but it was hard to match the entities back to the individual tweets, since the text position isn't saved... While doing this experiment I noticed something else: often $TSLA doesn't get recognised, so maybe it gets lost during the entity recognition or the wiki lookup.
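One way around the merging problem would be to record each tweet's start offset before concatenating, and then use the begin_offset the API returns for each mention. A rough sketch with the google-cloud-language client (not the exact code I ran, just the idea):

```python
from google.cloud import language_v1


def batch_entities(tweets):
    """Analyze many tweets in one Natural Language API call and map the
    detected entities back to the tweet each mention came from."""
    client = language_v1.LanguageServiceClient()

    # Join the tweets into one document, remembering where each one starts.
    separator = "\n\n"
    offsets = []
    cursor = 0
    for text in tweets:
        offsets.append(cursor)
        cursor += len(text) + len(separator)

    document = language_v1.Document(
        content=separator.join(tweets),
        type_=language_v1.Document.Type.PLAIN_TEXT)
    response = client.analyze_entities(
        request={"document": document,
                 "encoding_type": language_v1.EncodingType.UTF32})

    # Each mention carries a begin_offset; the tweet it belongs to is the
    # last one starting at or before that offset.
    results = [[] for _ in tweets]
    for entity in response.entities:
        for mention in entity.mentions:
            begin = mention.text.begin_offset
            index = max(i for i, start in enumerate(offsets) if start <= begin)
            results[index].append(entity.name)
    return results
```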

Also, for the historical analysis I added a cache for the Wikipedia lookups, which is good for them and for the speed :) I also retrained BERT on fintwits, which could be a cool add-on to the Google sentiment.
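The cache itself is nothing fancy, just a small on-disk wrapper around whatever function does the lookup; the file name and the wrapped function here are placeholders:

```python
import functools
import json
import os

CACHE_PATH = "wikidata_cache.json"  # placeholder location


def cached_lookup(lookup_fn):
    """Wrap a lookup function with a simple JSON cache on disk, so
    re-running a historic backfill doesn't repeat the same queries."""
    cache = {}
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            cache = json.load(f)

    @functools.wraps(lookup_fn)
    def wrapper(name):
        if name not in cache:
            cache[name] = lookup_fn(name)
            with open(CACHE_PATH, "w") as f:
                json.dump(cache, f)
        return cache[name]

    return wrapper

# Usage: lookup = cached_lookup(my_wikidata_query)
```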

Well anyway, I'm not too sure how committed you are to this hobby project. Keep on inspiring!

Have a great weekend, Franz

You will find me on Twitter @franz101 ❤️

maxbbraun commented 3 years ago

Hi Franz!

You're correct in pointing out that this line isn't very efficient. It was an artifact from initially trying to determine sentiment per company, since they could be different. (See also #3.) I just added a fix in 0d13c254bebed693eefea0ae026a16e1b215fcb5.

On the backtesting issue: the benchmark could certainly be improved in many ways. It probably also shouldn't be written from scratch at all; using an existing backtesting framework would make more sense.
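For instance, a framework like Backtrader could take over the bookkeeping. A rough sketch of how tweet-derived signals might plug in (the wiring is purely illustrative, not how the current benchmark works):

```python
import backtrader as bt


class TweetSignalStrategy(bt.Strategy):
    """Toy strategy: buy on a bullish tweet day, exit on a bearish one.
    The signals param (date -> +1/-1) would come from the sentiment step."""
    params = (("signals", {}),)

    def next(self):
        signal = self.p.signals.get(self.data.datetime.date(0))
        if signal is None:
            return
        if signal > 0 and not self.position:
            self.buy()
        elif signal < 0 and self.position:
            self.close()


cerebro = bt.Cerebro()
cerebro.adddata(bt.feeds.YahooFinanceCSVData(dataname="TSLA.csv"))  # placeholder data file
cerebro.addstrategy(TweetSignalStrategy, signals={})  # fill from tweet data
cerebro.run()
print(cerebro.broker.getvalue())
```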

Thanks! Max

franz101 commented 3 years ago

@maxbbraun that's great! For this you can use the API: https://language.googleapis.com/v1/documents:analyzeEntitySentiment
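In the Python client that corresponds to analyze_entity_sentiment, which returns the entities and a per-entity sentiment in one request; a minimal sketch:

```python
from google.cloud import language_v1


def entity_sentiment(text):
    """Call the entity-sentiment endpoint: one request returns the detected
    entities along with a sentiment score and magnitude for each of them."""
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
    response = client.analyze_entity_sentiment(
        request={"document": document,
                 "encoding_type": language_v1.EncodingType.UTF32})
    return [(e.name, e.sentiment.score, e.sentiment.magnitude)
            for e in response.entities]


print(entity_sentiment("Tesla is doing great, but Ford is struggling."))
```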

I built a prototype in Google Sheets with Google Apps Script 😀 😀

[Screenshot of the Google Sheets prototype, 2021-03-04]

maxbbraun commented 3 years ago

I've tried that API too, but the results aren't good. There are probably just not enough words in an average tweet to do reliable sentiment detection at that granular a level.