oceanprotocol / pdr-web

https://predictoor-web.vercel.app
1 stars 2 forks source link

[pdr-analytics] Lite data module for serving pdr metrics #284

Closed idiom-bytes closed 1 year ago

idiom-bytes commented 1 year ago

Problem

We need a lite service that integrate across the stack in a variety of ways. As defined in pdr-web oceanprotocol/pdr-analytics#1. This is meant to be a first cut at issue #(6) as outlined by trent here and further summarized here

DoD:

trentmc commented 1 year ago

I like the sentiment here.

But the scope is too large. "For all metrics".

The reason that I gave five steps before this, is that each of those is small and evolutionary, in line with what we are organically doing. They let us explore metrics quickly, locally, without rest APIs and other heavier architectures to slow us down.

Whereas this issue reads like "build a big pipeline". Before we know any metrics.

So, please do NOT do this in its current description.

My recommendation:

Build a small thing that ONLY handles the "calculate accuracy over 2000 samples". That's all. No more.

Then we can see what it looks like, and get used to it. Meanwhile we can keep iterating on earlier steps 1/2/.. in a fast organic way.

trentmc commented 1 year ago

I agree that we stop adding more to pdr-ws and instead focus on simpler python-centric data flows.

idiom-bytes commented 1 year ago

@trentmc I agree

Let's talk about last_2000 accuracy and organic. Please refer to this PR as for "this existing inside pdr-backend" and "small" https://github.com/oceanprotocol/pdr-backend/pull/290/files === Item A (1) This script could just be extended to provide a more flexible interface. Or a similar one that, that may share common, structure, and output. === Item B (2) Another module inside pdr-backend/pdr_backend/analytics could exist that serves this through a lite flask app (3) Or. another repo could exists that imports pdr-backend and serves this through a lite flask app === Item C (4) I don't see the benefits for (3) being a different repo. Analytics (as described in ticket oceanprotocol/pdr-analytics#1) is a backend item. Further, pdr-backend is a monorepo of backend services. (5) This might help backend coordinate better, as @trizin just finished doing a bunch of python work that would help @kdetry

trentmc commented 1 year ago

Let's talk about last_2000 accuracy and organic. Please refer to this PR as for "this existing inside pdr-backend" and "small" https://github.com/oceanprotocol/pdr-backend/pull/290/files === Item A (1) This script could just be extended to provide a more flexible interface. Or a similar one that, that may share common, structure, and output.

Agreed. It needs refactoring though. I just created an issue for that: pdr-backend#296 "get_predictoor_info.py needs to be more modular, and tested".

=== Item B (2) Another module inside pdr-backend/pdr_backend/analytics could exist that serves this through a lite flask app (3) Or. another repo could exists that imports pdr-backend and serves this through a lite flask app === Item C (4) I don't see the benefits for (3) being a different repo. Analytics (as described in ticket https://github.com/oceanprotocol/pdr-analytics/issues/1) is a backend item. Further, pdr-backend is a monorepo of backend services. (5) This might help backend coordinate better, as @trizin just finished doing a bunch of python work that would help @kdetry

Let's keep it all in pdr-backend repo. Simpler, easier to manage, easier to test.

trentmc commented 1 year ago

Agreed. It needs refactoring though. I just created an issue for that: pdr-backend#296 "get_predictoor_info.py needs to be more modular, and tested".

Therefore you have a choice:

I recommend that latter. Fewer dependencies, and it'd be good to have this 2000-samples thing in. (It's embarrassing to see accuracies <50%)

kdetry commented 1 year ago

Hello @trentmc , it is almost done, I will share the code asap. Thank you for the description

kdetry commented 1 year ago

We may need different things in future, eg. currently the get_predictor_info.py script queries based on prediction senders, but I changed the query and the flask app works with contract addresses. I copied the script and worked on it, instead of importing it, I hope it won't be an issue

trentmc commented 1 year ago

I copied the script and worked on it, instead of importing it, I hope it won't be an issue

Yes, it's an issue. You should never repeat code, it greatly adds complexity. "DRY = Don't Repeat Yourself" is the # 1 rule of managing complexity.

Given your proposed direction: I recommend doing #296 first, to refactor the script into reusable tested components. Otherwise we end up with unmanageable code.

kdetry commented 1 year ago

It is merged and deployed, so we can close the issue