Closed grantdfoster closed 1 month ago
Quick hotfix for queries has been merged here https://github.com/masa-finance/masa-bittensor/pull/286
What about scraping something like https://trends24.in/, getting the crypto-related trends with major volume, and adding them to the query list?
I found it a little hard to find a crypto-specific trends list. It seems we would need to fetch an entire trending topics list and filter by "is crypto related".
I don't feel that checking each TT against a list would be a good solution, since a lot could be missed. What about a little LLM that detects crypto topics in a list of TTs? Sounds too complex.
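To make the concern about a plain list check concrete, here is a minimal sketch of a keyword filter over trending topics. The keyword set, the function names, and the sample trends are all hypothetical and not taken from the repo; the point is only to show how easily relevant topics slip through.

```python
import re

# Hypothetical keyword list; an assumption for illustration, not from the repo.
CRYPTO_KEYWORDS = {"crypto", "bitcoin", "btc", "ethereum", "eth", "defi", "nft", "web3"}


def is_crypto_related(topic: str, keywords: set = CRYPTO_KEYWORDS) -> bool:
    """Return True if any alphanumeric token in the topic is a known keyword."""
    tokens = re.findall(r"[a-z0-9]+", topic.lower())
    return any(token in keywords for token in tokens)


def filter_trends(topics: list) -> list:
    """Keep only the trending topics that pass the keyword check."""
    return [topic for topic in topics if is_crypto_related(topic)]


# Made-up sample of trending topics.
trends = ["#Bitcoin", "Taylor Swift", "ETH ETF", "$TAO", "Solana"]
print(filter_trends(trends))
```

With this sample, "$TAO" and "Solana" are dropped even though both are crypto topics, because neither appears in the keyword set. Growing the list by hand narrows the gap but never closes it.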
Imagine we create an admin app through which we generate / update the list of queries, publish it to a public spot, same format as the current file. Validators pull this instead of the file in the repo to use for synthetic tweets.
If this isn't an automated process and we would, let's say, update the list manually, I would not build something new. A simple PR can do the same with the current list, and we don't spend time on new tooling.
I will think more about the "trending topics" today...
As it stands, our current list of queries typically has a lot of volume (2,000+ tweets a day, 100 within the first hour of the new day). While pulling from a trending list is feasible, it is also the "highest hanging" fruit, and there are other, easier ways to increase volume + miner demand. From easiest to hardest (and highest priority to lowest):
1. Queries such as "crypto pump", "crypto dump" ... to name a few.
2. The count of tweets asked for in volume checking (currently 100). We've been using this spreadsheet to calculate how much we are "stressing" the protocol node. I would aim for at least 5x capacity, meaning each node has to successfully run 5 credentials to keep up with validator demand.
3. [...] (10). This number works in conjunction with the tweet count.
4. [...] 1/100th of a tempo, which is an easy number to work with.
Being discussed in https://github.com/masa-finance/masa-bittensor/issues/295
Moving this to "in review", as MIP #2 (#295) captures this discussion!
Description
We currently keep a configuration file, config/twitter.json, that defines a list of queries the validators ask miners to mine. The list of queries is somewhat "random", and we can improve said list. Currently it focuses on crypto and web3 related topics, but this can be expanded on.
Some queries, like "crypto analysis", often don't have many tweets associated with them, and it is unfair to miners who are asked to mine said query, as they won't return as many tweets as other, broader queries. Furthermore, we need to better define what the downstream use case for the Twitter data is; currently it just sits on validator hardware.
Ideas
It was proposed that perhaps the list of queries is dynamic, pulled from Twitter itself, perhaps the trending topics section, etc. This would both ensure volume AND usefulness, as we are harvesting data with the most volume / relevance.
Imagine we create an admin app through which we generate / update the list of queries, publish it to a public spot, same format as the current file. Validators pull this instead of the file in the repo to use for synthetic tweets.
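A minimal sketch of how a validator might pull the published list, assuming the file is a JSON array of query strings. The URL, the function names, and the fallback to the copy in the repo are illustrative assumptions; the thread does not specify the file format or where the list would be hosted.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical public location; the thread does not name one.
QUERIES_URL = "https://example.com/twitter.json"
# Copy shipped in the repo, used here as a fallback (an assumption).
LOCAL_FALLBACK = Path("config/twitter.json")


def parse_queries(raw: str) -> list[str]:
    """Parse the list, assuming a JSON array of query strings."""
    data = json.loads(raw)
    if not isinstance(data, list) or not all(isinstance(q, str) for q in data):
        raise ValueError("expected a JSON array of strings")
    seen = set()
    queries = []
    for query in data:
        query = query.strip()
        # Drop empty entries and case-insensitive duplicates, keeping order.
        if query and query.lower() not in seen:
            seen.add(query.lower())
            queries.append(query)
    return queries


def load_queries(url: str = QUERIES_URL,
                 fallback: Path = LOCAL_FALLBACK,
                 timeout: float = 10.0) -> list[str]:
    """Fetch the published list; fall back to the local file on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_queries(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # Network errors are OSError subclasses; bad JSON or encoding is ValueError.
        return parse_queries(fallback.read_text(encoding="utf-8"))
```

Falling back to the file in the repo means a validator keeps working if the public copy is unreachable or malformed, at the cost of possibly using a stale list.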
Then also, using the protocol API we already created, validators post the raw synthetic tweet responses. We dedupe the tweets themselves and store them, and we keep stats on the miners and validators that returned them (identifying miners and validators by their hotkeys / coldkeys).
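The dedupe-and-stats idea can be sketched as an in-memory store. This is an illustration under stated assumptions, not the protocol API: the `TweetStore` class, the `ingest` method, and the `"id"` field on each tweet are all hypothetical names, and a real service would persist to a database.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TweetStore:
    """In-memory stand-in for the storage behind the protocol API (hypothetical)."""
    tweets: dict = field(default_factory=dict)               # tweet id -> tweet payload
    miner_returned: Counter = field(default_factory=Counter)    # miner hotkey -> tweets returned
    miner_new: Counter = field(default_factory=Counter)         # miner hotkey -> tweets first seen from it
    validator_posted: Counter = field(default_factory=Counter)  # validator hotkey -> tweets posted

    def ingest(self, validator_hotkey: str, miner_hotkey: str, tweets: list) -> int:
        """Record one posted response; return how many tweets were not seen before."""
        new_count = 0
        for tweet in tweets:
            tweet_id = tweet["id"]  # assumed field name for the unique tweet id
            self.miner_returned[miner_hotkey] += 1
            self.validator_posted[validator_hotkey] += 1
            if tweet_id not in self.tweets:
                self.tweets[tweet_id] = tweet
                self.miner_new[miner_hotkey] += 1
                new_count += 1
        return new_count


store = TweetStore()
store.ingest("validator-A", "miner-1", [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}])
store.ingest("validator-A", "miner-2", [{"id": "2", "text": "b"}, {"id": "3", "text": "c"}])
print(len(store.tweets), store.miner_returned["miner-2"], store.miner_new["miner-2"])
```

Tracking returned and first-seen counts separately lets the stats distinguish a miner that returns many tweets from one that contributes tweets nobody else has supplied.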