zestai / zrp

Zest Race Predictor
Apache License 2.0

Custom column names don't work, due to too much memory of arguments #34

Open: egnor opened this issue 6 months ago

egnor commented 6 months ago

Is there an existing issue for this?

What happened?

While processing some data, I passed my custom column names into the ZRP constructor, and the call to transform() then failed with this error:

```text
####################################
Processing rows: 0:25000
####################################
Data is loaded
   [Start] Validating input data
Traceback (most recent call last):

[[ boring stack frames from my app elided for brevity ]]

  File "/home/egnor/source/rcv/ea_race/ea_zrp.py", line 58, in main
    output_df = predictor.transform(ea_df)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/egnor/source/rcv/python_venv/lib/python3.11/site-packages/zrp/zrp.py", line 172, in transform
    prepared_data_chunk = z_prepare.transform(data_chunk)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/egnor/source/rcv/python_venv/lib/python3.11/site-packages/zrp/prepare/prepare.py", line 84, in transform
    gen_process.fit(data)
  File "/home/egnor/source/rcv/python_venv/lib/python3.11/site-packages/zrp/prepare/preprocessing.py", line 392, in fit
    raise ValueError(f"     Missing required data {val_na}")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError:      Missing required data ['AddressLine1', 'First', 'Last', 'Mid', 'State/Province', 'Zip4', 'house_number']
```

Most of those columns DID exist in the original dataframe, but rename_data_columns renamed them before validation. The ZRP class, however, saves the original kwargs in self.params_dict. After the rename, that dict USED to be cleared, but the call that cleared it was commented out, because not all params are column definitions. As a result, validation still looks for the original column names, which no longer exist in the renamed dataframe.
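To make the failure pattern concrete, here is a miniature reproduction that does not use ZRP at all. It is a hypothetical, simplified model, NOT ZRP's actual code: ToyPredictor, COLUMN_PARAMS, rename_columns and validate are invented for illustration, a plain dict of lists stands in for the pandas DataFrame, and the param name is assumed to double as the standard column name. The constructor keeps every kwarg, the rename step maps the user's names to standard ones, and validation then checks the stale, pre-rename names.

```python
# Hypothetical, simplified model of the reported bug; NOT ZRP's actual code.

# Assumed params that name input columns; the param name doubles as the
# standard column name the data is renamed to.
COLUMN_PARAMS = ("first_name", "last_name", "zip_code")


class ToyPredictor:
    def __init__(self, **kwargs):
        # Keep every keyword argument, including non-column settings such as
        # `geocode`, similar to how the issue says ZRP keeps kwargs in params_dict.
        self.params_dict = dict(kwargs)

    def rename_columns(self, data):
        # Stand-in for rename_data_columns: user-supplied names -> standard names.
        mapping = {
            self.params_dict[key]: key
            for key in COLUMN_PARAMS
            if key in self.params_dict
        }
        return {mapping.get(col, col): values for col, values in data.items()}

    def validate(self, data):
        # Bug: required columns are looked up via params_dict, which still holds
        # the ORIGINAL names even though `data` has already been renamed.
        required = [self.params_dict.get(key, key) for key in COLUMN_PARAMS]
        missing = [col for col in required if col not in data]
        if missing:
            raise ValueError(f"Missing required data {missing}")

    def transform(self, data):
        renamed = self.rename_columns(data)
        self.validate(renamed)
        return renamed


predictor = ToyPredictor(first_name="First", last_name="Last", zip_code="Zip", geocode=True)
try:
    predictor.transform({"First": ["Ada"], "Last": ["Lovelace"], "Zip": ["02139"]})
except ValueError as err:
    print(err)  # Missing required data ['First', 'Last', 'Zip']
```

As in the real traceback, the reported "missing" names are exactly the custom names the caller supplied, even though every one of those columns was present in the input.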

Steps To Reproduce

Use ZRP on data with nonstandard column names, pass those names into the ZRP() constructor, and attempt to make predictions.

What browsers are you seeing the problem on?

No response

Environment

- OS: Linux (Ubuntu 23.10)
- Python: 3.11.8
- ZRP: git main as of this bug report

Anything else?

No response

Code of Conduct

egnor commented 6 months ago

(Note: I'm working on what I think is a fix for this, but I'm not sure I'm doing it right.)
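For discussion, one possible shape for a fix is sketched below. It is not the patch mentioned above and not ZRP's code; rename_columns, post_rename_params, validate and COLUMN_PARAMS are hypothetical names, and plain dicts stand in for DataFrames. Instead of clearing the saved params (which would also discard non-column settings), it derives a post-rename copy in which only the column-definition entries point at the standard names. Leaving the stored params untouched also means they can be reused to rename each chunk of input, in case the same object processes several chunks (the log above shows rows being processed in chunks).

```python
# Hypothetical sketch of one possible fix; NOT ZRP's actual code or API.

# Assumed set of params that name input columns; anything else is a setting.
COLUMN_PARAMS = frozenset({"first_name", "last_name", "zip_code"})


def rename_columns(data, params):
    """Rename user-named columns to standard names (stand-in for rename_data_columns)."""
    mapping = {params[key]: key for key in COLUMN_PARAMS if key in params}
    return {mapping.get(col, col): values for col, values in data.items()}


def post_rename_params(params):
    """Return a copy of params whose column entries point at the standard names.

    Non-column settings pass through unchanged, and the caller's dict is not
    mutated, so it can still be used to rename the next chunk of raw input.
    """
    return {key: (key if key in COLUMN_PARAMS else value) for key, value in params.items()}


def validate(data, params):
    """Raise if any column named by a column-definition param is absent."""
    missing = [
        params[key]
        for key in sorted(COLUMN_PARAMS)
        if key in params and params[key] not in data
    ]
    if missing:
        raise ValueError(f"Missing required data {missing}")


params = {"first_name": "First", "last_name": "Last", "zip_code": "Zip", "geocode": True}
chunks = (
    {"First": ["Ada"], "Last": ["Lovelace"], "Zip": ["02139"]},
    {"First": ["Alan"], "Last": ["Turing"], "Zip": ["94043"]},
)
for chunk in chunks:
    renamed = rename_columns(chunk, params)
    validate(renamed, post_rename_params(params))  # passes for every chunk
    print(sorted(renamed))  # ['first_name', 'last_name', 'zip_code']
```

The key design choice is separating "which params are column definitions" (COLUMN_PARAMS here) from everything else, which is exactly the distinction that made simply clearing params_dict unsafe.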