pm4py / pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.
https://pm4py.fit.fraunhofer.de
GNU General Public License v3.0

Compatibility with Pandas 0.24 #46

Closed: Javert899 closed this issue 5 years ago

Javert899 commented 5 years ago

Hi everyone,

Pandas 0.24.0 was released last night.

At the moment our code has some issues with that version, so the requirements will be restricted to Pandas 0.23.4.

Sorry about this.

Javert899 commented 5 years ago

A pip package of PM4Py 1.0.19 has now been published. It mitigates this problem by requiring Pandas 0.23.4.

Javert899 commented 5 years ago

Hi,

Regarding this problem: after further tests, an issue has been opened on the Pandas GitHub repository. It seems that the to_dict('records') method is now buggy when a column name contains the ":" character.

https://github.com/pandas-dev/pandas/issues/25050
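For context, here is a minimal sketch of the affected pattern. The data is made up for illustration, and the column names follow the XES-style naming with ":" used in event logs. The second half is an assumed workaround that is not confirmed anywhere in this thread: it builds the row dictionaries from plain tuples, so the result does not depend on how to_dict treats such names.

```python
import pandas as pd

# Illustrative event data; the column names contain ":".
df = pd.DataFrame({
    "case:concept:name": ["1", "1", "2"],
    "concept:name": ["register", "pay", "register"],
})

# The call discussed above: one dict per row, keyed by column name.
records = df.to_dict("records")

# Assumed workaround (not from this thread): pair the original column
# names with plain row tuples (name=None disables namedtuple rows).
columns = list(df.columns)
safe_records = [dict(zip(columns, row))
                for row in df.itertuples(index=False, name=None)]
```

On an unaffected Pandas version both lists should contain the same dictionaries.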

Javert899 commented 5 years ago

According to the Pandas developers, this is a known bug in Pandas 0.24.0 and will be fixed in Pandas 0.24.1, which is due to be released in a couple of days.

Until then, we are sticking with 0.23.4.

Javert899 commented 5 years ago

Pandas 0.24.1 has been released, but some problems remain, so this issue is not closed yet :(

======================================================================
ERROR: test_filtering_timeframe (tests.dataframe_prefilter.DataframePrefilteringTest)

Traceback (most recent call last):
  File "C:\Users\berti\pm4py-source\tests\dataframe_prefilter.py", line 79, in test_filtering_timeframe
    df1 = timestamp_filter.apply_events(df, "2011-03-09 00:00:00", "2012-01-18 23:59:59")
  File "C:\Users\berti\pm4py-source\pm4py\algo\filtering\pandas\timestamp\timestamp_filter.py", line 127, in apply_events
    df = df[df[timestamp_key] > dt1]
  File "C:\Users\berti\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1766, in wrapper
    res = na_op(values, other)
  File "C:\Users\berti\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1625, in na_op
    result = _comp_method_OBJECT_ARRAY(op, x, y)
  File "C:\Users\berti\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1603, in _comp_method_OBJECT_ARRAY
    result = libops.scalar_compare(x, y, op)
  File "pandas\_libs\ops.pyx", line 97, in pandas._libs.ops.scalar_compare
TypeError: can't compare offset-naive and offset-aware datetimes
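The error above is the standard Python complaint about mixing a timezone-aware value with a naive one. A minimal sketch of the mismatch and one possible way around it follows. The helper `to_aware_utc` is hypothetical, and treating naive values as UTC is an assumption, not necessarily what PM4Py does.

```python
from datetime import datetime, timezone


def to_aware_utc(value):
    """Return value as an aware UTC datetime; naive input is assumed to be UTC."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


# An offset-aware event timestamp and an offset-naive bound parsed from a string.
event_time = datetime(2011, 6, 1, 12, 0, tzinfo=timezone.utc)
lower_bound = datetime.strptime("2011-03-09 00:00:00", "%Y-%m-%d %H:%M:%S")

# Comparing the two directly raises the TypeError seen in the traceback.
try:
    event_time > lower_bound
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

# After normalising both sides, the comparison is well defined.
is_after = to_aware_utc(event_time) > to_aware_utc(lower_bound)
```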

======================================================================
ERROR: test_dfCasedurationPlotSemilogx (tests.graphs_forming.GraphsForming)

Traceback (most recent call last):
  File "C:\Users\berti\pm4py-source\tests\graphs_forming.py", line 20, in test_dfCasedurationPlotSemilogx
    x, y = pd_case_statistics.get_kde_caseduration(df)
  File "C:\Users\berti\pm4py-source\pm4py\statistics\traces\pandas\case_statistics.py", line 183, in get_kde_caseduration
    cases = get_cases_description(df, parameters=parameters)
  File "C:\Users\berti\pm4py-source\pm4py\statistics\traces\pandas\case_statistics.py", line 95, in get_cases_description
    stacked_df[timestamp_key + "_2"] = stacked_df[timestamp_key + "_2"].astype('int64') // 10**9
  File "C:\Users\berti\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5691, in astype
    **kwargs)
  File "C:\Users\berti\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 531, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "C:\Users\berti\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 395, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\berti\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 534, in astype
    **kwargs)
  File "C:\Users\berti\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 633, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "C:\Users\berti\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 683, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas\_libs\lib.pyx", line 546, in pandas._libs.lib.astype_intsafe
TypeError: int() argument must be a string, a bytes-like object or a number, not 'datetime.datetime'
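Here the cast to 'int64' fails because the column holds plain Python datetime objects rather than a Pandas datetime dtype. The following is a sketch of one possible approach, not the fix that was actually applied in PM4Py. It converts the column to a timezone-aware datetime dtype first and then derives epoch seconds by subtraction instead of an integer cast. The sample values are made up.

```python
from datetime import datetime, timezone

import pandas as pd

# Object-dtype column holding timezone-aware Python datetimes (illustrative data).
col = pd.Series(
    [datetime(2011, 3, 9, tzinfo=timezone.utc),
     datetime(2011, 3, 10, tzinfo=timezone.utc)],
    dtype=object,
)

# Convert to a proper datetime dtype; utc=True yields timezone-aware UTC values.
as_datetime = pd.to_datetime(col, utc=True)

# Seconds since the Unix epoch, computed from a timedelta rather than astype('int64').
epoch = pd.Timestamp("1970-01-01", tz="UTC")
seconds = (as_datetime - epoch).dt.total_seconds().astype("int64")
```

Going through a timedelta avoids relying on the internal integer representation of the datetime column.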

======================================================================
ERROR: test_dfDateAttribute (tests.graphs_forming.GraphsForming)

Traceback (most recent call last):
  File "C:\Users\berti\pm4py-source\tests\graphs_forming.py", line 60, in test_dfDateAttribute
    x, y = pd_attributes_filter.get_kde_date_attribute(df)
  File "C:\Users\berti\pm4py-source\pm4py\algo\filtering\pandas\attributes\attributes_filter.py", line 306, in get_kde_date_attribute
    return attributes_common.get_kde_date_attribute(date_values, parameters=parameters)
  File "C:\Users\berti\pm4py-source\pm4py\algo\filtering\common\attributes\attributes_common.py", line 147, in get_kde_date_attribute
    int_values = sorted([(x - datetime(1970, 1, 1)).total_seconds() for x in values])
  File "C:\Users\berti\pm4py-source\pm4py\algo\filtering\common\attributes\attributes_common.py", line 147, in <listcomp>
    int_values = sorted([(x - datetime(1970, 1, 1)).total_seconds() for x in values])
TypeError: can't subtract offset-naive and offset-aware datetimes
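In this last failure the naive epoch `datetime(1970, 1, 1)` is subtracted from timezone-aware values. A small sketch of a version of that computation that tolerates both kinds of input follows. `seconds_since_epoch` is a hypothetical helper, and treating naive values as UTC is an assumption.

```python
from datetime import datetime, timezone

# Timezone-aware epoch, so it can be subtracted from aware values.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_since_epoch(value):
    """Seconds between the Unix epoch and value; naive input is assumed to be UTC."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return (value - EPOCH_UTC).total_seconds()


# One aware and one naive value, to show that both are handled.
values = [
    datetime(1970, 1, 2, tzinfo=timezone.utc),
    datetime(1970, 1, 1, 0, 0, 30),
]
int_values = sorted(seconds_since_epoch(x) for x in values)
```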


Ran 63 tests in 51.533s

FAILED (errors=3)

Javert899 commented 5 years ago

Hi,

The PM4Py develop branch now has full support for Pandas 0.24.1 :)

Some elements, like:

are now handled differently for Pandas versions below 0.24 and for versions 0.24 and above.
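A version-dependent code path of this kind could be gated roughly as follows. This is only a sketch: the helper names are hypothetical and it does not show how PM4Py actually implements the check.

```python
import re


def version_tuple(version):
    """Extract (major, minor) as integers from a version string such as '0.24.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def is_pandas_024_or_newer(version):
    """True when the given Pandas version string is 0.24 or later."""
    return version_tuple(version) >= (0, 24)
```

In practice the string passed in would be `pandas.__version__`, and the result would select between the pre-0.24 and the 0.24+ implementation.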