rooteco / tweetscape

the supercharged twitter feed
https://prototype.tweetscape.co
GNU Affero General Public License v3.0

Update Hive cluster scores on-demand #409

Closed nicholaschiang closed 2 years ago

nicholaschiang commented 2 years ago

Part of #397 that wasn't completed.

nicholaschiang commented 2 years ago

I'm not sure this is strictly necessary, since it adds a significant performance hit: we'd have to query an external API 20 times per cluster (Hive returns influencer scores in pages of 50, and we want the first 1000 influencers for each cluster) vs. just fetching that data in <20ms from our Postgres database.
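To make the cost concrete, here is a rough sketch of what the on-demand path would involve. The page size (50) and target (1000) come from the numbers above; everything else is an assumption for illustration. In particular, `fetch_page` is a hypothetical callable standing in for whatever request the Hive client actually makes, and the zero-based page index and row shape are placeholders, not Hive's real API.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 50             # Hive returns influencer scores in pages of 50
TARGET_INFLUENCERS = 1000  # we want the first 1000 influencers per cluster

Page = List[Dict[str, object]]


def pages_needed(target: int = TARGET_INFLUENCERS, page_size: int = PAGE_SIZE) -> int:
    """Number of paged requests needed to cover `target` rows (ceiling division)."""
    return -(-target // page_size)


def fetch_cluster_scores(
    cluster: str,
    fetch_page: Callable[[str, int], Page],  # hypothetical: (cluster, page index) -> rows
    target: int = TARGET_INFLUENCERS,
    page_size: int = PAGE_SIZE,
) -> Page:
    """Collect up to `target` score rows for one cluster, one page at a time."""
    scores: Page = []
    for page in range(pages_needed(target, page_size)):
        rows = fetch_page(cluster, page)
        scores.extend(rows)
        if len(rows) < page_size:
            # A short page means the cluster has no more influencers.
            break
    return scores[:target]
```

With the defaults, `pages_needed()` is 20, so every on-demand refresh of a single cluster means 20 sequential round trips to an external service before anything can be rendered.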

Instead, I'm thinking we'll just periodically update our Postgres database with the newest scores from Hive every week or so. Hive doesn't recalculate scores all the time either: from what I've seen, they only update scores when they update their algorithm, which happens once or twice a month at most. There's no point in adding overhead when the underlying data doesn't change.
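A minimal sketch of the staleness check such a periodic job could use, assuming we record when each cluster was last synced. The `last_synced_at` value and the exact 7-day interval are assumptions ("every week or so"), not something that exists in the codebase yet.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed interval: "every week or so".
REFRESH_INTERVAL = timedelta(days=7)


def needs_refresh(
    last_synced_at: Optional[datetime],  # hypothetical per-cluster timestamp
    now: datetime,
    interval: timedelta = REFRESH_INTERVAL,
) -> bool:
    """Return True if a cluster's scores were never synced or are older than `interval`."""
    if last_synced_at is None:
        return True
    return now - last_synced_at >= interval
```

A scheduled job would call this for each cluster and only hit Hive for the ones that return `True`, so regular page loads keep reading from Postgres.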

Ideally, Hive would add a webhook feature that notifies us when they've re-indexed a cluster's scores, so we could update our database accordingly.