oceanprotocol / pdr-backend

Instructions & code to run predictoors, traders, more.
Apache License 2.0

[EPIC: YAML] Add YAML & CLI. "v0.2" of Predictoor #400

Closed trentmc closed 9 months ago

trentmc commented 11 months ago

Background / motivation

Goal: a single YAML file + CLI that unifies settings across everything.

Consider this a "v0.2" of pdr-backend: the changes are big, and the UX changes a lot.

TODOs

First:

Then:

Then:

Then:

Then:

Berkay to do the final merge steps:

History of this issue

  1. We did a lot of groundwork in #278 "[EPIC] [Sim, bots] Easy sim --> pdr/trader bot flow w plots".
  2. Then the first part of this issue got built in #370 "YAML & CLI". It had PR #371, which was merged into main. The merge was premature, so we backed it out.
  3. The initial aim of this issue was to test and stabilize more. That's still an aim, but the scope has increased.

idiom-bytes commented 11 months ago

The trueval bot is failing. I arrived here by a different route. I created a ticket (barge#393), and linked to it from pdr-backend#411.

idiom-bytes commented 10 months ago

[Model Factory + Model Data Factory move]

Topic 1 - Moving these files

Hi, regarding these two, I believe we should move them out of data_eng.

As far as I understand it, we're re-creating/training the model at every step, so nothing from model_data_factory actually gets saved, either on disk or in cache.

It's not a data factory in the sense of writing to the lake or serving other ETL functions; it's a consumer of this data that serves the model's requirements.

This strengthens the case that it relates more to /sim/ than to data_eng or etl/lake. I think /sim/ would be a more appropriate place.

Topic 2 - Rethinking ModelDataFactory

Based on my comments above, I believe ModelDataFactory could perhaps be reduced to functions inside of ModelFactory that prepare the data for the model to train/build/etc. But not a "DataFactory" as per the definition of a factory that generates data, that is saved to the data lake, in parquet, or dealing with the core workflows.

It simply consumes that output and provides a blob for ModelFactory.

model_data_factory
-> consumes the merged ohlcv df from the lake
-> nothing is saved on the lake / isn't a data factory
-> creates xY data locally, in-memory, for model/sim/predictoors
-> different sims/models may have different xY data creation

model_factory
-> nothing is consumed from the lake
-> nothing is saved on the lake / isn't a data factory
-> it's an abstraction for sim/predictoors

trentmc commented 10 months ago

[Model Factory + Model Data Factory move] Topic 1 - Moving these files

Agreed. Actually, they briefly had their own directory before, but I moved them into data_eng because both directories were small at the time. Now that data_eng is growing, it's a good time to give them their own directory again. Call it aimodel/. Expect this to grow a lot in coming months.

We can and should go further: rename data_eng/ to lake/, and also create a directory analytics/. Then make sure the appropriate modules are in each of {lake/, aimodel/, analytics/}. I just created #446 to handle this. Full details there.

Topic 2 - Rethinking ModelDataFactory [AiModelDataFactory]

But not a "DataFactory" as per the definition of a factory that generates data, that is saved to the data lake, in parquet, or dealing with the core workflows.

Overall, disagree. Here's why.

First: I agree that AiModelDataFactory doesn't put data into the lake. Nonetheless, AiModelDataFactory is still a data factory, since it creates data. There's nothing in the words "data factory" that implies that the output data must go into a data lake.

Second: AiModelDataFactory and AiModelFactory each have their own unique roles & responsibilities. They are cleanly separated concerns. And each will grow in coming months: AiModelDataFactory for fancier feature engineering (eg ARIMA, information ticks), and AiModelFactory for fancier modeling (eg FFX, proper dynamical models).

Finally: the rename from data_eng/ to lake/ and the move of AiModel* to aimodel/ help to clarify the roles & responsibilities of each.

Therefore, AiModelDataFactory code should not fold into AiModelFactory.
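
To make the separation of concerns concrete, here is a minimal sketch in plain Python. It is not the pdr-backend implementation: the class bodies, the `create_xy` and `build` method names, the `n_lags` parameter, and the toy `LastPlusDriftModel` are all assumptions made for illustration. It only shows the shape of the split: one class builds X/y in memory from a price series, the other turns X/y into a model, and neither writes to the lake.

```python
from statistics import mean
from typing import List, Sequence, Tuple


class AiModelDataFactory:
    """Hypothetical sketch: builds (X, y) in memory; writes nothing to a lake."""

    def __init__(self, n_lags: int):
        if n_lags < 1:
            raise ValueError("n_lags must be >= 1")
        self.n_lags = n_lags

    def create_xy(
        self, closes: Sequence[float]
    ) -> Tuple[List[List[float]], List[float]]:
        # Each row of X holds the previous n_lags close prices;
        # the matching y entry is the close price that followed them.
        X: List[List[float]] = []
        y: List[float] = []
        for t in range(self.n_lags, len(closes)):
            X.append(list(closes[t - self.n_lags : t]))
            y.append(closes[t])
        return X, y


class LastPlusDriftModel:
    """Toy stand-in model: predicts last input value plus an average drift."""

    def __init__(self, drift: float):
        self.drift = drift

    def predict(self, x: Sequence[float]) -> float:
        return x[-1] + self.drift


class AiModelFactory:
    """Hypothetical sketch: builds a model from (X, y); knows nothing of the lake."""

    def build(
        self, X: Sequence[Sequence[float]], y: Sequence[float]
    ) -> LastPlusDriftModel:
        # "Training" here is just the mean step from the last lag to the target.
        drift = mean(yi - xi[-1] for xi, yi in zip(X, y))
        return LastPlusDriftModel(drift)


# Usage: data preparation and modeling stay in separate classes.
closes = [1.0, 2.0, 3.0, 4.0, 5.0]
X, y = AiModelDataFactory(n_lags=2).create_xy(closes)
model = AiModelFactory().build(X, y)
prediction = model.predict(closes[-2:])
```

With this split, fancier feature engineering would only touch the data factory, and fancier modeling would only touch the model factory.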

idiom-bytes commented 10 months ago

Hi @trentmc if you get a chance to fix this...

************* Module pdr_backend.util.test_ganache.__init__
pdr_backend/util/test_ganache/__init__.py:1:0: R0401: Cyclic import (pdr_backend.models.base_contract -> pdr_backend.ppss.web3_pp -> pdr_backend.models.predictoor_contract -> pdr_backend.models.token) (cyclic-import)
pdr_backend/util/test_ganache/__init__.py:1:0: R0401: Cyclic import (pdr_backend.models.base_contract -> pdr_backend.ppss.web3_pp -> pdr_backend.models.predictoor_contract) (cyclic-import)
pdr_backend/util/test_ganache/__init__.py:1:0: R0401: Cyclic import (pdr_backend.models.base_contract -> pdr_backend.ppss.web3_pp -> pdr_backend.models.predictoor_contract -> pdr_backend.models.fixed_rate) (cyclic-import)

trentmc commented 10 months ago

Hi @trentmc if you get a chance to fix this ... Cyclic import ...

See https://github.com/oceanprotocol/pdr-backend/issues/455

trentmc commented 9 months ago

Done!