mmcdermott / EventStreamGPT

Dataset and modelling infrastructure for modelling "event streams": sequences of continuous time, multivariate events with complex internal dependencies.
https://eventstreamml.readthedocs.io/en/latest/
MIT License

Processing Synthetic Data with ESGPT #113

Open sujaybanerjee opened 3 months ago

sujaybanerjee commented 3 months ago

In this section, when I run this code block, I get an error.

import subprocess

command = """\
PYTHONPATH=$(pwd):$PYTHONPATH ./scripts/build_dataset.py \
    --config-path="$(pwd)/sample_data/" \
    --config-name=dataset \
    "hydra.searchpath=[$(pwd)/configs]"
"""

command_out = subprocess.run(command, shell=True, capture_output=True)
print(command_out.stdout.decode())

if command_out.returncode == 1:
    print("Command Errored!")

print(command_out.stderr.decode())

Here is the error message I get:

$ PYTHONPATH=$(pwd):$PYTHONPATH python3 ./scripts/build_dataset.py --config-path="$(pwd)/sample_data/" --config-name=dataset "hydra.searchpath=[$(pwd)/configs]"
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/user/EventStreamGPT/./scripts/build_dataset.py", line 364, in main
    ESD = Dataset(config=config, input_schema=dataset_schema)
  File "/home/user/EventStreamGPT/EventStream/data/dataset_base.py", line 550, in __init__
    events_df, dynamic_measurements_df = self.build_event_and_measurement_dfs(
  File "/home/user/EventStreamGPT/EventStream/data/dataset_base.py", line 259, in build_event_and_measurement_dfs
    cls._process_events_and_measurements_df(
  File "/home/user/EventStreamGPT/EventStream/data/dataset_polars.py", line 356, in _process_events_and_measurements_df
    if len(df.columns) > 4:
  File "/home/user/.local/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 411, in columns
    return self._ldf.columns()
polars.exceptions.ComputeError: failed to determine supertype of cat and i64

This error occurred with the following context stack:
    [1] 'select' failed
    [2] 'with_columns' input failed to resolve
    [3] 'drop' input failed to resolve
    [4] 'with_columns' input failed to resolve
    [5] 'drop' input failed to resolve
    [6] 'filter' input failed to resolve
    [7] 'filter' input failed to resolve
    [8] 'with_columns' input failed to resolve
    [9] 'drop' input failed to resolve
    [10] 'filter' input failed to resolve
    [11] 'select' input failed to resolve
    [12] 'unique' input failed to resolve
    [13] 'with row index' input failed to resolve

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I am using Python 3.10.12 and polars 0.20.26. How can I fix this?
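The error output above ends by suggesting HYDRA_FULL_ERROR=1 for a complete stack trace. A minimal sketch of rerunning the same kind of shell command with that variable set might look like the following. The helper name `run_with_full_hydra_error` is hypothetical, not part of ESGPT or Hydra, and the sketch assumes a POSIX shell as in the original snippet.

```python
import os
import subprocess


def run_with_full_hydra_error(command: str) -> subprocess.CompletedProcess:
    """Run a shell command with HYDRA_FULL_ERROR=1 added to the environment.

    Hypothetical helper for illustration only. It copies the current
    environment so PATH, PYTHONPATH, etc. are still visible to the command.
    """
    env = dict(os.environ)
    env["HYDRA_FULL_ERROR"] = "1"
    return subprocess.run(
        command,
        shell=True,           # same as the original snippet
        capture_output=True,  # collect stdout and stderr for printing
        text=True,            # decode output to str instead of bytes
        env=env,
    )
```

Passing the `command` string from the code block in this issue to the helper and printing the `stderr` attribute of the result should then show the full trace rather than the shortened one.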

mmcdermott commented 3 months ago

Hi @sujaybanerjee -- this is a polars version issue. The main branch of ESGPT is only guaranteed to work with polars up to 0.18.15, as specified in the pyproject.toml file. Can you try this on the dev branch, which supports a much more recent version of polars?
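To see quickly whether an environment falls outside the range mentioned above, a small check along these lines could help. This is only a sketch: the 0.18.15 limit is taken from the comment above, the function names are hypothetical, and the version parsing is deliberately simple (numeric components only) rather than a full version-specifier implementation.

```python
from __future__ import annotations

from importlib import metadata

# Upper bound for the main branch, as stated in the comment above.
MAX_MAIN_POLARS = (0, 18, 15)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.20.26' into (0, 20, 26), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def exceeds_main_branch_limit(version: str) -> bool:
    """Return True if the given polars version is newer than 0.18.15."""
    return parse_version(version) > MAX_MAIN_POLARS


def installed_polars_version() -> str | None:
    """Return the installed polars version, or None if polars is absent."""
    try:
        return metadata.version("polars")
    except metadata.PackageNotFoundError:
        return None
```

For the version reported in this issue, `exceeds_main_branch_limit("0.20.26")` evaluates to True, which is consistent with the diagnosis above.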