[EPIC: YAML] Add YAML & CLI. "v0.2" of Predictoor

trentmc commented 11 months ago

Background / motivation

Goal: a single YAML file + CLI that unifies settings across everything.

Consider this a "v0.2" of pdr-backend: the changes are big, and the UX changes a lot.
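
A minimal sketch of the idea, assuming a call shape of `pdr <cmd> <yaml_file> <network>`. The command names, the settings layout, and the helper names below are hypothetical, not the actual pdr-backend interface. The dict stands in for a parsed YAML file. The point it illustrates: settings such as the RPC URL come from one settings structure rather than from scattered environment variables.

```python
import argparse

# Stand-in for the parsed YAML file (e.g. the result of yaml.safe_load).
# All keys and values here are hypothetical.
SETTINGS = {
    "web3_pp": {
        "development": {"rpc_url": "http://localhost:8545"},
    },
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for calls of the form: pdr <cmd> <yaml_file> <network>."""
    parser = argparse.ArgumentParser(prog="pdr")
    # Hypothetical command list, for illustration only.
    parser.add_argument("cmd", choices=["sim", "predictoor", "trader", "publisher"])
    parser.add_argument("yaml_file")
    parser.add_argument("network")
    return parser


def rpc_url_for(settings: dict, network: str) -> str:
    """Look up the RPC URL in the settings instead of an environment variable."""
    try:
        return settings["web3_pp"][network]["rpc_url"]
    except KeyError as exc:
        raise ValueError(f"no rpc_url configured for network {network!r}") from exc


def main(argv: list, settings: dict) -> str:
    """Parse the CLI arguments and describe what would be run."""
    args = build_parser().parse_args(argv)
    url = rpc_url_for(settings, args.network)
    return f"{args.cmd} on {args.network} via {url}"


if __name__ == "__main__":
    print(main(["publisher", "ppss.yaml", "development"], SETTINGS))
```

With this shape, every bot reads the same settings object, so a missing value fails in one place with one error message.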

TODOs

First:

[x] (Leading to this) Revert two big changes
[x] Write unit tests for ex-scripts #393. check_network.py, get_predictoor_info.py, topup.py, constants_opf_addresses.py
[x] Fix "Users can't claim rewards" #401
[x] Create a branch: yaml-cli2
[x] In branch, merge from main to get the fix from #401. --> no change was needed, we just needed to wait
[x] In branch, merge in both commits into the branch. (https://github.com/oceanprotocol/pdr-backend/commit/b46cd8166a9b0d4bc6eab625182a71ee410e549c and https://github.com/oceanprotocol/pdr-backend/commit/40a478b0e2c01a24d58211f57df503ff5e1dbb33)
[x] Do entrypoint.sh for pdr CLI #405
[x] Get in csv/pandas refactor
- [x] Review PR#396 for csvs/pandas refactor (Roberto's), before merging it
- [x] In branch, if prev step's review is ok, merge in PR#396. Do it here because the next steps are thorough testing
[x] Fix #408: test_sim_engine failing in yaml-cli2, because hist_df timestamps are in s, not ms. They should be in ms.
[x] [Trent] #416 No Feeds Found - data_pp.py changes pair standards
[x] The above step should fix: pdr-trueval does not work on VPS, barge runner does not provide the right args #419. If not fixed yet, fix it.
[x] [Alex, Trent] Pre-requisite: in barge repo, get predictoor branch merged into main. barge#391
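
For the s-vs-ms item (#408) above, here is a rough sketch of how a unit mismatch could be guarded against. This is not the fix that was applied. The function name and the threshold are assumptions for illustration.

```python
MS_PER_S = 1000

# Heuristic threshold (an assumption): epoch values at or above 1e11 are
# treated as milliseconds. 1e11 seconds is thousands of years past 1970,
# while 1e11 milliseconds falls in 1973, so modern timestamps separate cleanly.
MS_THRESHOLD = 100_000_000_000


def to_ms(timestamp: int) -> int:
    """Return the timestamp in milliseconds, converting from seconds if needed."""
    if timestamp >= MS_THRESHOLD:
        return timestamp  # already in ms
    return timestamp * MS_PER_S
```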

Then:

[x] [Trent] Add "github release" to existing release process pdr-backend#382
[x] [Trent] Add "docker release" to existing release process pdr-backend#423
[x] [Trent] Fix BTC/TUSD pairs in barge
- [x] In pdr-backend main branch, switch from BTC/TUSD -> BTC/USDT in pdr_backend/publisher/main.py. Commit https://github.com/oceanprotocol/pdr-backend/commit/8c0e023e9959e3dae89946f7c1090f46f934406d
- [x] Publish a new docker image for pdr-backend. How: go through the pdr-backend release process; it will auto-publish a new image :). It's release v0.1.2.
- [x] In pdr-backend yaml-cli2 branch, switch from BTC/TUSD -> BTC/USDT in pdr_backend/publisher/main.py. Commit https://github.com/oceanprotocol/pdr-backend/commit/21a65d47f324dc0f230e7d527f3b7f2890f83c1b
- [x] Then we've fixed #417 and #418! (In main branch.) So close them.
- [x] For pdr-backend yaml-cli2 branch: test that issues 417 and 418 don't recur there. If they do, fix as needed. Testing run.

Then:

[x] [Trent] test_data_ss_now is failing #427
[x] [Trent] test_get_hist_df - FileNotFoundError #428
[x] In pdr-backend yaml-cli2 branch, write proper unit tests for cli_module.py
[x] [Trent] Add "github release" to existing release process #382
[x] Fix bug: pdr-backend entrypoint.sh needs to call python /app/pdr $@
[x] [Trent] Add YAML-CLI migration guidelines #410
- [x] Inspect code: for each envvar in old envvars.md list, ensure that they aren't showing up in unwanted places in code
[x] [Roberto] Test everything more thoroughly, manually. #411
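
The envvar inspection step above can be partly automated. A minimal sketch, assuming the old envvar names are available as a list. The function name is hypothetical, and the names passed in the usage example are only ones mentioned in this issue.

```python
import re
from pathlib import Path


def find_envvar_uses(root: str, envvars: list) -> dict:
    """Map each envvar name to the .py files under root that mention it."""
    hits = {name: [] for name in envvars}
    # Word boundaries avoid matching a name inside a longer identifier.
    patterns = {name: re.compile(rf"\b{re.escape(name)}\b") for name in envvars}
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name, pattern in patterns.items():
            if pattern.search(text):
                hits[name].append(str(path.relative_to(root)))
    return hits


if __name__ == "__main__":
    for name, files in find_envvar_uses(".", ["RPC_URL", "NETWORK_OVERRIDE"]).items():
        print(name, files)
```

Any file listed is a candidate for manual review; a hit in docs or tests may be fine, while a hit in bot code probably is not.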

Then:

[x] [Trent] Get barge & YAML-CLI working on pytest & predictoor. (Incl barge#393)
- [x] In dockerhub, add docker image from branch yaml-cli2, with tag yaml-cli2
- [x] In barge repo, create new predictoor2 branch.
- [x] In barge predictoor2, for each pdr-*.yml file: use yaml-cli2 docker image tag, update CLI call to use pdr <cmd> <arg1> <arg2> ..
- [x] In pdr-backend yaml-cli2, in pytest CI, use barge predictoor2 branch. Now, pdr-backend CI should pass
- [x] Get VPS 2 running barge predictoor2 branch in pytest configuration. Now, Trent's local pytest runs (using VPS) should pass.
- [x] Bug: when barge does cli call to publisher, it says "You must set RPC_URL environment variable". Fix this (in pdr-backend publisher): it should use YAML properly. Also: publisher may have DRY violations, fix those. #437
- [x] Get VPS 1 running barge predictoor2 branch in pdr bot configuration. Then, check that the pdr bot flow in vps.md works.
[x] [Trent] Trader: Estimate gas failed #442
[x] [Trent] Refactor trueval: merge 3 agent files into one #445
[x] Rename/move files & dirs for proper separation among lake, AI models, analytics #446
[x] In CI, polars error: 'column with name "timestamp_right" already exists' #459
[x] Remove need to specify 'stake_token' and 'owner_addrs' in ppss.yaml; auto-detect instead #397

Then:

[x] [Calina] Refine ppss.yaml for closer 1:1 mapping to bots #451
[x] Filter by feeds in publisher module #490
[x] [Trent] aimodel_data_factory.py AssertionError: missing data col: binance:BTC/USDT:None #517
[x] [Trent] aimodel_data_factory.py AssertionError: missing data col: binance:ETH/USDT:close #519
[x] [Calina] Add code climate and improve test coverage #494
- [x] [Calina] plots pop up, unwanted, when running test_sim_engine.py #525
- [x] [Calina] codeclimate complexity tests are too sensitive, and at least partly redundant #526
[x] [Berkay] Replace dftool with pdr cli #521
[x] [Berkay] Avoid NETWORK_OVERRIDE while supporting remote barge, or make it more maintainable #499
[x] [Berkay] Thorough system-level tests #413

Berkay to do the final merge steps:

[x] Merge pdr-backend and barge at once:
- [x] In barge predictoor2, for each pdr-*.yml file: use latest docker image tag (not yaml-cli2)
- [x] In pdr-backend yaml-cli2, in pytest CI, remove line for barge predictoor2 branch
- [x] In pdr-backend, merge yaml-cli2 into main. This will auto-do a new docker image of pdr-backend latest
- [x] In barge, merge predictoor2 into main
[x] In pdr-backend, do a new release via release-process.md
[x] ! ! ! Check topup github action (here). Q: is it working, under the freshly-updated main branch? If no, there will be havoc with trueval bot and more. A: yes :)
[x] In dockerhub, remove image of yaml-cli2 branch
[x] In docs repo, is "claim" documentation up-to-date? Change if needed
[x] Close barge#393
[x] From pdr-backend main, go through all READMEs and ensure that they work.
[x] Close this issue

History of this issue

We did a lot of groundwork in #278 "[EPIC] [Sim, bots] Easy sim --> pdr/trader bot flow w plots".
Then the first part of this issue got built in #370 "YAML & CLI". It had PR #371, which was merged into main. The merge was premature, so we backed it out.
The initial aim of this issue was to test and stabilize more. That's still an aim, but the scope has increased.

idiom-bytes commented 11 months ago

The trueval bot is failing. I arrived here by a different route: I created a ticket (barge#393) and linked to it from pdr-backend#411.

idiom-bytes commented 10 months ago

[Model Factory + Model Data Factory move]

Topic 1 - Moving these files

Hi, regarding these two, I believe we should move them out of data_eng.

As far as I understand it, we're re-creating/training the model every step, so nothing from model_data_factory will actually get saved out, either on disk or in cache.

It's not a data factory in the sense of writing to the lake or serving other ETL functions, but a consumer of this data that serves the model's requirements.

This strengthens the case that it is more related to /sim/ than to data_eng or etl/lake. I'm thinking /sim/ would be a more appropriate place.

Topic 2 - Rethinking ModelDataFactory

Based on my comments above, I believe ModelDataFactory could perhaps be reduced to functions inside of ModelFactory that prepare the data for the model to train/build/etc... But not a "DataFactory" as per the definition of a factory that generates data, that is saved to the data lake, in parquet, or dealing with the core workflows.

It's simply consuming from that output, and providing a blob for ModelFactory.

model_data_factory
-> consumer from merged ohlcv df from lake
-> nothing is saved on lake / isn't a data factory
-> creates xY data locally, in-memory, for model/sim/predictoors
-> different sim/models may have different xY data creation

model_factory
-> nothing is consumed from lake
-> nothing is saved on lake / isn't a data factory
-> it's an abstraction for sim/predictoors
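
To make "creates xY data locally, in-memory" concrete, here is a rough sketch of the kind of transformation meant. The function name, the parameter name, and the up/down labeling are assumptions for illustration, not the actual model_data_factory code.

```python
def create_xy(closes: list, autoregressive_n: int) -> tuple:
    """Build lag features X and up/down labels y from a list of close prices.

    Each row of X holds the autoregressive_n closes that precede a target
    close; the matching y entry is True when the target close is higher
    than the close just before it.
    """
    if autoregressive_n < 1:
        raise ValueError("autoregressive_n must be >= 1")
    X, y = [], []
    for t in range(autoregressive_n, len(closes)):
        X.append(closes[t - autoregressive_n : t])
        y.append(closes[t] > closes[t - 1])
    return X, y
```

Nothing here is written to disk: the input would come from the merged ohlcv df, and the output lives only as long as the model build that uses it.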

trentmc commented 10 months ago

[Model Factory + Model Data Factory move] Topic 1 - Moving these files

Agreed. Actually they briefly had their own directory before, but I moved them into data_eng because both directories were small at the time. Now that data_eng is growing, it's a good time to give them their own directory again. Call it aimodel/. Expect this to grow a lot in coming months.

We can and should go further: rename data_eng/ to lake/, and also create a directory analytics/. Then make sure the appropriate modules are in each of {lake/, aimodel/, analytics/}. I just created #446 to handle this. Full details there.

Topic 2 - Rethinking ModelDataFactory [AiModelDataFactory] But not a "DataFactory" as per the definition of a factory that generates data, that is saved to the data lake, in parquet, or dealing with the core workflows.

Overall, disagree. Here's why.

First: I agree that AiModelDataFactory doesn't put data into the lake. Nonetheless, AiModelDataFactory is still a data factory, since it creates data. There's nothing in the words "data factory" that implies that the output data must go into a data lake.

Second: AiModelDataFactory and AiModelFactory each have their own unique roles & responsibilities. They are cleanly separated concerns. And, each will grow in coming months: AiModelDataFactory for fancier feature engineering (eg ARIMA, information ticks), and AiModelFactory for fancier modeling (eg FFX, proper dynamical models).

Finally: the rename from data_eng/ to lake/, and the move of AiModel* to aimodel/, help to clarify the roles & responsibilities of each.

Therefore, AiModelDataFactory code should not fold into AiModelFactory.
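
A toy sketch of the separation argued for above. The class names, the lag features, and the majority-label "model" are all hypothetical stand-ins, not the real AiModelDataFactory or AiModelFactory. The only contract between the two sides is the (X, y) pair, so either side can grow without touching the other.

```python
from collections import Counter


class DataFactorySketch:
    """Turns raw closes into (X, y). Feature engineering would grow here."""

    def __init__(self, n_lags: int):
        self.n_lags = n_lags

    def create_xy(self, closes: list) -> tuple:
        steps = range(self.n_lags, len(closes))
        X = [closes[t - self.n_lags : t] for t in steps]
        y = [closes[t] > closes[t - 1] for t in steps]
        return X, y


class MajorityModel:
    """Trivial baseline: always predicts the most common training label."""

    def __init__(self, label: bool):
        self.label = label

    def predict(self, X: list) -> list:
        return [self.label for _ in X]


class ModelFactorySketch:
    """Turns (X, y) into a model. Modeling approaches would grow here."""

    def build(self, X: list, y: list) -> MajorityModel:
        if not y:
            raise ValueError("need at least one training sample")
        label = Counter(y).most_common(1)[0][0]
        return MajorityModel(label)


if __name__ == "__main__":
    X, y = DataFactorySketch(n_lags=2).create_xy([1.0, 2.0, 3.0, 2.0, 4.0])
    model = ModelFactorySketch().build(X, y)
    print(model.predict(X))
```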

idiom-bytes commented 10 months ago

Hi @trentmc if you get a chance to fix this...

************* Module pdr_backend.util.test_ganache.__init__
pdr_backend/util/test_ganache/__init__.py:1:0: R0401: Cyclic import (pdr_backend.models.base_contract -> pdr_backend.ppss.web3_pp -> pdr_backend.models.predictoor_contract -> pdr_backend.models.token) (cyclic-import)
pdr_backend/util/test_ganache/__init__.py:1:0: R0401: Cyclic import (pdr_backend.models.base_contract -> pdr_backend.ppss.web3_pp -> pdr_backend.models.predictoor_contract) (cyclic-import)
pdr_backend/util/test_ganache/__init__.py:1:0: R0401: Cyclic import (pdr_backend.models.base_contract -> pdr_backend.ppss.web3_pp -> pdr_backend.models.predictoor_contract -> pdr_backend.models.fixed_rate) (cyclic-import)

trentmc commented 10 months ago

Hi @trentmc if you get a chance to fix this ... Cyclic import ...

See https://github.com/oceanprotocol/pdr-backend/issues/455

trentmc commented 9 months ago

Done!
