Open trentmc opened 8 months ago
Predictoor Q's:
- What was the total available rewards for the last x hours / x days?
- How many right/wrong predictions did I have in the last x hours / x days?
- How many tokens did I win/lose in predictions over the last x hours / x days?
- How much am I making compared to other predictoors?
- How much revenue comes in via sales?
Trader Q's:
- How much did I spend on subscriptions in the last week?
- Which assets have the most sales?
Predictoor Q's:
What I envision for rendering this data:
Slides:
Update: I've done some first cut pencil-and-paper prototypes in the 2023 10 FE Prototypes GSlides. I'm sharing early to communicate how I'm thinking about this. It's a CAD tool for predictoors and traders! :)
I don't plan to spend more time at this right now. I will only when this overall issue becomes a priority, which might be soon and might not be soon.
Hi, since we need to implement accuracy calculations for the FE (2000 samples), I recommended that @kdetry start thinking about how to build this:
High level flow of how this might get used:
When I presented the prototypes this past Thursday, I described how we can evolve from something super-simple to a high-quality webapp.
Here I flesh it out, as practical as possible, pointing to code that exists and that can be evolved.
The spectrum, from simplest first:
We have (1). We can do a "tracer bullet" starting with (2) and going all the way through (6). Then we can continually flesh out plots at the level of (1-2: simulation flow), and as they mature we pull them into (4-6: analytics service).
Update: I converted (2-5) to github issues, linked above. And all this work is part of a new issue: "[EPIC] [Simulation, bots] Easy-to-use & powerful simulation --> predictoor/trader bot flow" pdr-backend#278
@idiom-bytes wrt your comment of what @kdetry can be doing: it's really describing an architecture for step (6) in my comment above -- "remote analytics service"
Rather than directly jumping to (6), it might be wise to go through steps (2) - (5) first, tracer bullet style. This will ensure that we have a pipeline from rough prototypes (steps 1,2) all the way through to production remote analytics service (step 6).
Thoughts?
I generally agree that the analytics system could be responsible for helping to execute the data/graphics work across all 6 features. However, since some of these are already working, and we want to onboard others, it might be easier to onboard with small tasks from (6) and then bring other existing workflows (1) over to this module.
Example: (6) right now is really small. It just needs 2k sample accuracy for 2x timeframes, so:
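As a rough sketch of what that 2k-sample accuracy computation could look like (the tuple layout and field names below are hypothetical, not an existing pdr-backend API):

```python
from collections import defaultdict, deque

def accuracy_by_timeframe(predictions, window=2000):
    """Compute the hit rate over the most recent `window` predictions
    per timeframe.

    `predictions` is an iterable of (timeframe, predicted_up, actual_up)
    tuples, ordered oldest-first; this shape is illustrative only.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    for timeframe, predicted, actual in predictions:
        # deque(maxlen=window) silently drops samples older than the window
        recent[timeframe].append(predicted == actual)
    return {tf: sum(hits) / len(hits) for tf, hits in recent.items()}

# e.g. accuracy_by_timeframe(preds) -> {"5m": 0.62, "1h": 0.55}
```

A rolling `deque` keeps memory bounded regardless of how much history is streamed in, which matters once this runs continuously in a service.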
There is a script named data_factory.py which does some nice work to maintain a checkpoint of how much it has downloaded so far. I imagine that data_factory, and some of the work that Trent has done so far, would benefit from being abstracted and moved somewhere more general like /utils/ so other systems can use it.
Inside pdr-backend, you should be able to just import/instantiate/configure a data_factory and start using it.
The server:
The bot:
I also propose that data_factory gets an update to use polars + parquet for this. It's incredibly fast, and will enable us to grow.
Thanks for the thoughts @idiom-bytes .
OK to a small (6) now, for the 2K thing. (Via the 2K github issue.)
Please don't use approach3/data_factory.py for that. Its goals are completely different.
(Fyi I have a github issue to move the simulator stuff from approach3 directory to a more general place. That too is outside the scope of 2K work. And I want to do it when I get back because I know exactly what I want, and how to do it. So please don't do it in the meantime. Focus on 2K.)
For (6) it's not clear to me: when talking about serving the accuracy analytics data via a server, do you mean using pdr-backend or a different service? If it's a different service, I propose that we use pdr-websocket.
For (6) it's a different service. Not pdr-backend.
I don't rule out pdr-websocket. I defer to you (Norbert) and Mustafa and Roberto.
[WRT pdr-websockets] This is just for having a pk that can talk to the contract, w/o exposing it to the client. Which has been leading to all sorts of maintenance issues.
[WRT Websockets Forwardlooking] pdr-web + pdr-websockets should be nearly frozen for now. pdr-ws has been nightmarish to support, and a lot of code is getting duplicated/fragmented against pdr-web. Rather than building a pdr-fe-util lib to start addressing some of this problem... I think there is a solution to tech-spike that would reduce this complexity by an order of magnitude.
How?
(1) and (2) are deployed in separate environments but share the exact same stack. PK is not shown to client. We leverage more of next.js native functionality.
*** I have created Ticket oceanprotocol/pdr-web#283 in pdr-web to represent this
[WRT dApp/Predictoor Analytics (6)] Based on Trent's feedback...
- (A) I think leaderboards, epoch summaries, ecosystem metrics, and all sorts of things should be written in Python, in a clean module that is self-contained, atomic, and easy to import.
- (B) Rather than querying GQL each time, this system should dump all data from the subgraph and build summaries for everything. This will look like an ETL workflow. Only fetch what's needed, and update the data. Think parquet + dataframes.
- (C) As a pdr-trader, I'll want to query this system in addition to trained models that have obtained this data, as a way to understand other user behaviors and competitiveness across feeds, see which ones are buying, and have high-level trading agents decide which feeds to use, or which predictoor feeds to submit to.
- (D) If desired, in the future, this service could sit in front of a GQL provider.
- (E) As an app developer, I can easily query this data through remote/fetch/GET.
- (F) As a builder in pdr-backend, I can import this module, run the ETL locally, and query the local cache directly from the app during my epoch updates. Example: copy-trading from known predictoors that are incredibly accurate.
- (G) As an ML engineer in pdr-backend, I can import the module, run the ETL locally, and query the local cache directly to build my dataframes and features w/ behaviors from predictoors.
- (H) If desired, this module could easily be extended w/ a FE to take all metrics/graphs/etc. and serve them to streamlit/etc...
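To make the "dump subgraph data, then build summaries" step concrete, here is a tiny transform-stage sketch. The function name and record fields (`user`, `stake`, `payout`) are hypothetical, not an existing subgraph schema:

```python
from collections import defaultdict

def build_predictoor_summary(raw_predictions):
    """Hypothetical ETL transform: aggregate raw prediction records
    (dicts with illustrative 'user', 'stake', 'payout' fields) into
    per-user summaries suitable for a leaderboard table."""
    summary = defaultdict(lambda: {"n": 0, "stake": 0.0, "payout": 0.0})
    for rec in raw_predictions:
        s = summary[rec["user"]]
        s["n"] += 1
        s["stake"] += rec["stake"]
        s["payout"] += rec["payout"]
    # Derived column: net earnings per user
    for s in summary.values():
        s["net"] = s["payout"] - s["stake"]
    return dict(summary)
```

In the full design, the extract stage would fetch only new subgraph records since the last checkpoint, and the load stage would write these summaries to parquet for the FE and bots to query.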
*** I have created Ticket oceanprotocol/pdr-web#284 in pdr-web to represent this
[Final Remarks] pdr-websocket was primarily used to not expose a pk to the client. pdr-web will get bloated and this code will never be re-used if it ends up in there. This doesn't belong in pdr-websocket or pdr-web. Do not write it in JS either.
All of our data science and knowledge is being written in py. I want to be reading directly from the py stack. Please view this as a data problem, not an app problem.
Hey @trentmc, I was double-checking on 'The spectrum, from simplest first' described above. It looks like you are assigned to the first step; can I start working on the second step? There is already a first cut of the first step available that I could use to move things forward.
TBH I'd prefer to handle this myself, and the follow-up steps. I've finally got "Ship Predictoor DF" off my plate, and I intend to go through all these steps ASAP, and quickly. (Written as an EPIC in pdr-backend#278.)
FYI the "FE: backlog" column in DF/VE board has many items that could be covered.
Ok, sure, sounds good. I was kind of expecting that you were going to go through this; that's why I wanted to check. Predictoor stats look like a high priority now: since predictoors are now able to make money, showing this to the community should help incentivize people to onboard.
For example, a Predictoor leaderboard on the UI displaying the top x predictoors with their returns and accuracy. I checked your prototype and it is mainly focused on 'how much I make'. We might also want a section about 'how much others make', so users can see that they can make money before they start onboarding. Oh, NVM, I see there is a page for Predictoors where you can see information about other Predictoors.
Hey Norbert, we get all of this out-of-the-box if we have the data and streamlit setup in a certain way. Let's continue to write down questions + design dashboards, and then figure out the data pipeline + tables we need to serve all of this.
example w/ a bronze->silver->gold pipeline:
pdr_backend/data/gold/user_summary.parquet
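A toy sketch of the bronze → silver → gold layering: bronze holds raw dumps as-is, silver holds cleaned/typed rows, and gold holds query-ready summaries like the user_summary table above. Function names and fields are hypothetical:

```python
def to_silver(bronze_rows):
    """Bronze -> silver: drop malformed records, normalize types.
    The 'user'/'payout' fields are illustrative, not a real schema."""
    return [
        {"user": r["user"].lower(), "payout": float(r["payout"])}
        for r in bronze_rows
        if "user" in r and "payout" in r
    ]

def to_gold(silver_rows):
    """Silver -> gold: per-user payout totals, ready to be written to
    a summary parquet file for dashboards to read."""
    gold = {}
    for r in silver_rows:
        gold[r["user"]] = gold.get(r["user"], 0.0) + r["payout"]
    return gold
```

The key property of the layering is that each stage is recomputable from the one below it, so a bug in a summary never forces a re-download of raw data.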
On the streamlit side we can add a couple of dropdowns and a text field to serve the result:
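A hedged sketch of that dashboard wiring, assuming the gold parquet layout above. The widget labels, file path, and row fields are illustrative; the filtering is pulled out into a pure helper so the dashboard logic stays testable:

```python
def filter_summary(rows, timeframe, user_query):
    """Pure helper the dashboard calls: keep user_summary rows matching
    the chosen timeframe and a user-address substring (fields hypothetical)."""
    return [
        r for r in rows
        if r["timeframe"] == timeframe
        and user_query.lower() in r["user"].lower()
    ]

def run_dashboard():
    """Streamlit wiring; launch with `streamlit run app.py`."""
    import streamlit as st
    import polars as pl

    # Hypothetical gold-layer table; see path above.
    rows = pl.read_parquet(
        "pdr_backend/data/gold/user_summary.parquet"
    ).to_dicts()
    timeframe = st.selectbox("Timeframe", ["5m", "1h"])
    user_query = st.text_input("User address contains", "")
    st.dataframe(filter_summary(rows, timeframe, user_query))
```

Since the gold table is precomputed by the ETL, the dashboard itself does no heavy work; it only filters and renders.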
Architecture: from this Slack msg
Below is a design for analytics architecture, and its relation to pdr-backend. From a discussion among Berkay, Roberto, myself.
Have three separate repos:
Usage:
Not near term: Only once the above is stabilized and the analytics fleshed out nicely (2+ mos from now), we can...
Near-term order-of-dev-work:
One more thing: keep Mustafa's new service for the accuracy estimation in pdr-backend for now. (Avoid rocking the boat here for now. Revisit when we make pdr-analytics live on predictoor.ai)
Background / motivation
Our core users are predictoors & traders who use our pdr-backend python bots. We want to reduce friction for them.
Even though they operate largely in python-land, there are things we can do in the webapp to help them out.
Top (or near top) of the list is to help them answer the Q: "How much $ am I making / losing". There are many drill-down Q's that emerge from that.
This issue covers both traders & predictoors, because there will be overlap.
TODOs
Key reference: pdr-analytics prototypes (Gslides)
Related