pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

Time range filter not working with timezone-naive timestamps #455

Closed: goto-loop closed this issue 10 months ago

goto-loop commented 11 months ago

Applying pm4py.filter_time_range() to an event log with timezone-naive timestamps raises an error: TypeError: Invalid comparison between dtype=datetime64[us] and Timestamp. Once the timestamp column is made timezone-aware, the filter works again.

import pandas as pd
import pm4py

df = pd.read_parquet("./tests/input_data/running-example.parquet")

# Comment out the following line to remove the error
df["time:timestamp"] = df["time:timestamp"].dt.tz_localize(None)

log = pm4py.format_dataframe(df, timest_format="%Y-%m-%d %H:%M:%S%z")
filtered_log = pm4py.filter_time_range(log, "2010-01-01 00:00:00", "2012-01-01 00:00:00", mode="traces_contained")
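
The same kind of TypeError can be reproduced with pandas alone, which may help narrow down the cause. The sketch below assumes (this is not confirmed anywhere in the thread) that the filter compares the timestamp column against timezone-aware bounds; the sample timestamps are made up for illustration.

```python
import pandas as pd

# Timezone-naive timestamps, like the column after .dt.tz_localize(None).
# The values are illustrative, not taken from running-example.parquet.
naive = pd.Series(pd.to_datetime(["2010-12-30 11:02:00", "2011-01-06 15:02:00"]))

# A timezone-aware bound (assumption: the filter builds its bounds like this).
start = pd.Timestamp("2011-01-01 00:00:00", tz="UTC")

# Ordering comparisons between naive and aware datetimes raise a TypeError.
try:
    naive >= start
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Workaround: make the column timezone-aware before comparing or filtering.
aware = naive.dt.tz_localize("UTC")
mask = aware >= start
print(mask.tolist())
```

Localizing to UTC here is only one possible choice; if the naive timestamps actually represent local times, the matching timezone should be used instead.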

fit-alessandro-berti commented 11 months ago

Thanks for reporting this. We will simplify the handling of datetimes in pm4py so that both timezone-aware and timezone-naive timestamps are supported.
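
One way to accept both kinds of timestamps is to align each filter bound with the column before comparing. The following is only a rough sketch of that idea, not pm4py's actual implementation: align_bound is a hypothetical helper, and treating naive bounds as UTC is an assumption.

```python
import pandas as pd

def align_bound(bound, column):
    """Hypothetical helper: give `bound` the same timezone-awareness as `column`."""
    ts = pd.Timestamp(bound)
    col_tz = column.dt.tz  # None for timezone-naive columns
    if col_tz is None:
        # Naive column: convert an aware bound to UTC, then drop its timezone.
        if ts.tzinfo is not None:
            ts = ts.tz_convert("UTC").tz_localize(None)
        return ts
    # Aware column: treat a naive bound as UTC (assumption), then match the column.
    if ts.tzinfo is None:
        ts = ts.tz_localize("UTC")
    return ts.tz_convert(col_tz)

# Illustrative data: the same two timestamps as naive and as UTC-aware columns.
naive = pd.Series(pd.to_datetime(["2010-12-30 11:02:00", "2011-01-06 15:02:00"]))
aware = naive.dt.tz_localize("UTC")

# The same string bound now works against either column.
naive_mask = naive >= align_bound("2011-01-01 00:00:00", naive)
aware_mask = aware >= align_bound("2011-01-01 00:00:00", aware)
print(naive_mask.tolist(), aware_mask.tolist())
```

Aligning the bound rather than the column leaves the caller's data untouched, at the cost of an implicit UTC assumption for naive values.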

fit-alessandro-berti commented 10 months ago

Dear @goto-loop,

The issue should be solved in the current release.