oceanprotocol / pdr-backend


[YAML] aimodel_data_factory.py AssertionError: missing data col: binance:BTC/USDT:None #517

Closed: trentmc closed this issue 9 months ago

trentmc commented 9 months ago

To reproduce

(In yaml-cli2 branch)

ppss.yaml had default settings. Details below.

Run sim_engine:

pdr sim ppss.yaml

Then it fails. Full traceback below.

It fails whether or not parquet_dir already contained data.
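The issue does not state the root cause, so what follows is only one possible reading of the error message at the end of the traceback below. The missing column name ends in `None`, and the `input_feeds` entry in the settings below (`binance BTC/USDT`) names no signal, unlike `predict_feed` (`binance BTC/USDT c 1h`). The sketch shows how a feed string without a signal could produce such a key. `Feed`, `parse_feed`, `column_key`, `feeds_missing_signal`, and the spelled-out signal names are hypothetical illustrations, not pdr-backend's actual code.

```python
from typing import List, NamedTuple, Optional

# Hypothetical mapping from one-letter signal codes to column suffixes.
# Assumption for illustration only; not taken from pdr-backend.
SIGNAL_CHARS = {"o": "open", "h": "high", "l": "low", "c": "close", "v": "volume"}


class Feed(NamedTuple):
    exchange: str
    pair: str
    signal: Optional[str]     # None when the feed string names no signal
    timeframe: Optional[str]  # None when the feed string names no timeframe


def parse_feed(feed_str: str) -> Feed:
    """Split 'exchange pair [signal] [timeframe]' into its parts."""
    parts = feed_str.split()
    if len(parts) < 2:
        raise ValueError(f"feed needs at least exchange and pair: {feed_str!r}")
    exchange, pair, *rest = parts
    signal: Optional[str] = None
    timeframe: Optional[str] = None
    for token in rest:
        if token in SIGNAL_CHARS:
            signal = SIGNAL_CHARS[token]
        else:
            timeframe = token
    return Feed(exchange, pair, signal, timeframe)


def column_key(feed: Feed) -> str:
    """Build an 'exchange:pair:signal' key; a None signal is formatted as 'None'."""
    return f"{feed.exchange}:{feed.pair}:{feed.signal}"


def feeds_missing_signal(feed_strs: List[str]) -> List[str]:
    """Return the feed strings that name no signal, so they can be reported early."""
    return [s for s in feed_strs if parse_feed(s).signal is None]


if __name__ == "__main__":
    print(column_key(parse_feed("binance BTC/USDT")))
    print(column_key(parse_feed("binance BTC/USDT c 1h")))
    print(feeds_missing_signal(["binance BTC/USDT", "binance BTC/USDT c 1h"]))
```

Under these assumptions, a check like `feeds_missing_signal` run when the config is loaded could flag the entry before any data is fetched, instead of surfacing as an assertion failure inside `create_xy`.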

Relevant ppss.yaml settings

Some ppss.yaml values (the defaults):
```text
lake_ss:
  parquet_dir: parquet_data
  feeds:
    - binance BTC/USDT 1h
  st_timestr: 2023-06-01_00:00 # starting date for data
  fin_timestr: now # ending date for data

...

predictoor_ss:
  predict_feed: binance BTC/USDT c 1h
  bot_only:
    s_until_epoch_end: 60 # in s. Start predicting if there's > this time left
    stake_amount: 1 # stake this amount with each prediction. In OCEAN
  approach3:
  aimodel_ss:
    input_feeds:
      - binance BTC/USDT
    max_n_train: 5000 # no. epochs to train model on
    autoregressive_n : 10 # no. epochs that model looks back, to predict next
    approach: LIN
```

Full traceback

```text
(venv) trentmc@tlm-macbook: ~/code/pdr-backend $ pdr sim ppss.yaml
dftool sim: Begin
Arguments:
PPSS_FILE=ppss.yaml
Start run
Get historical data, across many exchanges & pairs: begin.
  Data start: timestamp=1685577600000, dt=2023-06-01_00:00:00.000
  Data fin: timestamp=1704967729680, dt=2024-01-11_10:08:49.680
  Update all rawohlcv files: begin

    Update rawohlcv file at exchange=binance, pair=BTC/USDT: begin
      filename=/Users/trentmc/code/pdr-backend/parquet_data/binance_BTC-USDT_1h.parquet
      No file exists yet, so will fetch all data
      Aim to fetch data from start time: timestamp=1685577600000, dt=2023-06-01_00:00:00.000
      Fetch up to 1000 pts from timestamp=1685577600000, dt=2023-06-01_00:00:00.000
      newest_ut_value: 1689174000000
      Fetch up to 1000 pts from timestamp=1689177600000, dt=2023-07-12_16:00:00.000
      newest_ut_value: 1692774000000
      Fetch up to 1000 pts from timestamp=1692777600000, dt=2023-08-23_08:00:00.000
      newest_ut_value: 1696374000000
      Fetch up to 1000 pts from timestamp=1696377600000, dt=2023-10-04_00:00:00.000
      newest_ut_value: 1699974000000
      Fetch up to 1000 pts from timestamp=1699977600000, dt=2023-11-14_16:00:00.000
      newest_ut_value: 1703574000000
      Fetch up to 1000 pts from timestamp=1703577600000, dt=2023-12-26_08:00:00.000
      Just saved df with 5387 rows to new file /Users/trentmc/code/pdr-backend/parquet_data/binance_BTC-USDT_1h.parquet
    Update rawohlcv file at exchange=binance, pair=BTC/USDT: done

  Update all rawohlcv files: done
  Load rawohlcv file.
Get historical data, across many exchanges & pairs: done.
Traceback (most recent call last):
  File "/Users/trentmc/code/pdr-backend/./pdr", line 6, in <module>
    cli_module._do_main()
  File "/Users/trentmc/code/pdr-backend/venv/lib/python3.11/site-packages/enforce_typing/decorator.py", line 29, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/trentmc/code/pdr-backend/pdr_backend/cli/cli_module.py", line 44, in _do_main
    func(args)
  File "/Users/trentmc/code/pdr-backend/venv/lib/python3.11/site-packages/enforce_typing/decorator.py", line 29, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/trentmc/code/pdr-backend/pdr_backend/cli/cli_module.py", line 55, in do_sim
    sim_engine.run()
  File "/Users/trentmc/code/pdr-backend/venv/lib/python3.11/site-packages/enforce_typing/decorator.py", line 29, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/trentmc/code/pdr-backend/pdr_backend/sim/sim_engine.py", line 92, in run
    self.run_one_iter(test_i, mergedohlcv_df)
  File "/Users/trentmc/code/pdr-backend/venv/lib/python3.11/site-packages/enforce_typing/decorator.py", line 29, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/trentmc/code/pdr-backend/pdr_backend/sim/sim_engine.py", line 106, in run_one_iter
    X, y, _ = model_data_factory.create_xy(mergedohlcv_df, testshift)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/trentmc/code/pdr-backend/pdr_backend/aimodel/aimodel_data_factory.py", line 84, in create_xy
    assert hist_col in mergedohlcv_df.columns, f"missing data col: {hist_col}"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: missing data col: binance:BTC/USDT:None
```

trentmc commented 9 months ago

Fixed