trentmc closed 9 months ago
The trueval bot is failing. I arrived here by a different route. I created a ticket (barge#393) and linked to it from pdr-backend#411.
[Model Factory + Model Data Factory move]
Topic 1 - Moving these files
Hi, regarding these two, I believe we should move them out of data_eng.
As far as I understand it, we're re-creating/training the model every step, so nothing from model_data_factory will actually get saved out, either on disk or in cache.
It's not a data factory in the sense of writing to the lake or serving other etl functions; it's a consumer of this data that serves the model's requirements.
This seems to strengthen that this is more related to /sim/ than data-eng or etl/lake. I'm thinking /sim/ would be a more appropriate place.
Topic 2 - Rethinking ModelDataFactory
Based on my comments above, I believe ModelDataFactory could perhaps be reduced to functions inside of ModelFactory that prepare the data for the model to train/build/etc. But not a "DataFactory" as per the definition of a factory that generates data, that is saved to the data lake, in parquet, or dealing with the core workflows.
It's simply consuming from that output, and providing a blob for ModelFactory
model_data_factory
-> consumer from merged ohlcv df from lake
-> nothing is saved on lake / isn't a data factory
-> creates xY data locally, in-memory, for model/sim/predictoors
-> different sim/models may have different xY data creation
model_factory
-> nothing is consumed from lake
-> nothing is saved on lake / isn't a data factory
-> it's an abstraction for sim/predictoors
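To make the split listed above concrete, here is a minimal Python sketch of the two roles. It is an illustration only: create_xy, NaiveDriftModel, and build_model are hypothetical names, not code from pdr-backend, and a plain list of close prices stands in for the merged ohlcv df.

```python
from typing import List, Tuple


def create_xy(
    close_prices: List[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Data-factory role (hypothetical): build X (lagged inputs) and y (next value).

    Everything stays in memory; nothing is written to the lake.
    """
    X: List[List[float]] = []
    y: List[float] = []
    for i in range(n_lags, len(close_prices)):
        X.append(close_prices[i - n_lags:i])  # the n_lags values before step i
        y.append(close_prices[i])             # the value to predict at step i
    return X, y


class NaiveDriftModel:
    """Toy stand-in for a real model: predicts last input plus the average step."""

    def __init__(self, avg_step: float):
        self.avg_step = avg_step

    def predict(self, x_row: List[float]) -> float:
        return x_row[-1] + self.avg_step


def build_model(X: List[List[float]], y: List[float]) -> NaiveDriftModel:
    """Model-factory role (hypothetical): consumes X/y only, never the lake."""
    steps = [target - row[-1] for row, target in zip(X, y)]
    return NaiveDriftModel(sum(steps) / len(steps))


# Hypothetical usage: close prices as they might come from a merged ohlcv df
closes = [1.0, 2.0, 3.0, 4.0, 5.0]
X, y = create_xy(closes, n_lags=2)
model = build_model(X, y)
print(model.predict([4.0, 5.0]))
```

In this sketch a different sim or model could swap in its own create_xy without touching build_model, which is the "different sim/models may have different xY data creation" point.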
[Model Factory + Model Data Factory move]
Topic 1 - Moving these files
Agreed. Actually, they briefly had their own directory before, but I moved them into data_eng because both dirs were small at the time. Now that data_eng is growing, it's a good time to give them their own directory again. Call it aimodel/.
Expect this to grow a lot in coming months.
We can and should go further: rename data_eng/ to lake/, and also create a directory analytics/. Then make sure the appropriate modules are in each of {lake/, ai_model, and analytics}. I just created #446 to handle this. Full details there.
Topic 2 - Rethinking ModelDataFactory [AiModelDataFactory]
But not a "DataFactory" as per the definition of a factory that generates data, that is saved to the data lake, in parquet, or dealing with the core workflows.
Overall, disagree. Here's why.
First: I agree that AiModelDataFactory doesn't put data into the lake. Nonetheless, AiModelDataFactory is still a data factory, since it creates data. There's nothing in the words "data factory" that implies that the output data must go into a data lake.
Second: AiModelDataFactory and AiModelFactory each have their own unique roles & responsibilities. They are cleanly separated concerns. And each will grow in coming months: AiModelDataFactory for fancier feature engineering (eg ARIMA, information ticks), and AiModelFactory for fancier modeling (eg FFX, proper dynamical models).
Finally: the rename from data_eng/ to lake/, and the move of AiModel* to aimodel/ helps to clarify roles & responsibility of each.
Therefore, AiModelDataFactory code should not fold into AiModelFactory.
Hi @trentmc if you get a chance to fix this...
************* Module pdr_backend.util.test_ganache.__init__
pdr_backend/util/test_ganache/__init__.py:1:0: R0401: Cyclic import (pdr_backend.models.base_contract -> pdr_backend.ppss.web3_pp -> pdr_backend.models.predictoor_contract -> pdr_backend.models.token) (cyclic-import)
pdr_backend/util/test_ganache/__init__.py:1:0: R0401: Cyclic import (pdr_backend.models.base_contract -> pdr_backend.ppss.web3_pp -> pdr_backend.models.predictoor_contract) (cyclic-import)
pdr_backend/util/test_ganache/__init__.py:1:0: R0401: Cyclic import (pdr_backend.models.base_contract -> pdr_backend.ppss.web3_pp -> pdr_backend.models.predictoor_contract -> pdr_backend.models.fixed_rate) (cyclic-import)
Hi @trentmc if you get a chance to fix this ... Cyclic import ...
Done!
Background / motivation
Goal: a single YAML file + CLI that unifies settings across everything.
Consider this a "v0.2" of pdr-backend: the changes are big, and the UX changes a lot.
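As a rough picture of the "single YAML file + CLI" idea, here is a minimal argparse sketch of a pdr <cmd> <arg1> <arg2> style dispatcher. It is a sketch under assumptions, not the actual cli_module.py: the subcommand names (sim, predictoor), the argument names (ppss_file, network), and the handler functions are hypothetical, and loading the YAML file itself is left out.

```python
import argparse
from typing import List


def do_sim(args: argparse.Namespace) -> str:
    # Hypothetical handler: a real one would load the YAML settings and run
    return f"sim using settings from {args.ppss_file}"


def do_predictoor(args: argparse.Namespace) -> str:
    # Hypothetical handler: same settings file, plus a target network
    return f"predictoor on {args.network} using settings from {args.ppss_file}"


def build_parser() -> argparse.ArgumentParser:
    """One entry point; every subcommand takes the same settings file."""
    parser = argparse.ArgumentParser(prog="pdr")
    sub = parser.add_subparsers(dest="cmd", required=True)

    p_sim = sub.add_parser("sim")
    p_sim.add_argument("ppss_file")
    p_sim.set_defaults(func=do_sim)

    p_pred = sub.add_parser("predictoor")
    p_pred.add_argument("ppss_file")
    p_pred.add_argument("network")
    p_pred.set_defaults(func=do_predictoor)
    return parser


def run(argv: List[str]) -> str:
    """Parse argv (without the program name) and dispatch to the handler."""
    args = build_parser().parse_args(argv)
    return args.func(args)


print(run(["sim", "my_ppss.yaml"]))
print(run(["predictoor", "my_ppss.yaml", "development"]))
```

The design point is that every bot goes through one parser and one settings file, so a change in UX shows up in a single place.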
TODOs
First:
- yaml-cli2
- predictoor branch merged into main. barge#391
Then:
- main branch, switch from BTC/TUSD -> BTC/USDT in pdr_backend/publisher/main.py. Commit https://github.com/oceanprotocol/pdr-backend/commit/8c0e023e9959e3dae89946f7c1090f46f934406d
- v0.1.2
- yaml-cli2 branch, switch from BTC/TUSD -> BTC/USDT in pdr_backend/publisher/main.py. Commit https://github.com/oceanprotocol/pdr-backend/commit/21a65d47f324dc0f230e7d527f3b7f2890f83c1b
- yaml-cli2 branch: test that issues 417 and 418 don't recur there. If they do, fix as needed. Testing run.
Then:
- yaml-cli2 branch, write proper unit tests for cli_module.py
- entrypoint.sh needs to call python /app/pdr $@
Then:
- yaml-cli2, with tag yaml-cli2
- predictoor2 branch.
- predictoor2, for each pdr-*.yml file: use yaml-cli2 docker image tag, update CLI call to use pdr <cmd> <arg1> <arg2> ..
- yaml-cli2, in pytest CI, use barge predictoor2 branch. Now, pdr-backend CI should pass
- predictoor2 branch in pytest configuration. Now, Trent's local pytest runs (using VPS) should pass.
- predictoor2 branch in pdr bot configuration. Then, check that the pdr bot flow in vps.md works.
Then:
Berkay to do the final merge steps:
- predictoor2, for each pdr-*.yml file: use latest docker image tag (not yaml-cli2)
- yaml-cli2, in pytest CI, remove line for barge predictoor2 branch
- yaml-cli2 into main. This will auto-do a new docker image of pdr-backend latest
- predictoor2 into main
- release-process.md
- main branch? If no, there will be havoc with trueval bot and more. A: yes :)
- yaml-cli2 branch
- main, go through all READMEs and ensure that they work.
History of this issue