labscript-suite-temp-2 / lyse

lyse is an analysis framework. It coordinates the running of Python analysis scripts on experiment data as it becomes available, updating plots in real time.
BSD 2-Clause "Simplified" License

Compatibility with pandas 0.18 #18

Closed by philipstarkey 8 years ago

philipstarkey commented 8 years ago

Original report (archived issue) by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


Since pandas 0.18, adding shots to the lyse GUI fails with:

```python
Traceback (most recent call last):
  File "C:\Anaconda\lib\threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\labscript_suite\lyse\__main__.py", line 1523, in incoming_buffer_loop
    self.shots_model.add_files(filepaths, new_row_data)
  File "C:\Anaconda\lib\site-packages\qtutils\invoke_in_main.py", line 114, in f
    return inmain(fn, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\qtutils\invoke_in_main.py", line 74, in inmain
    return get_inmain_result(in_main_later(fn,False,*args,**kwargs))
  File "C:\Anaconda\lib\site-packages\qtutils\invoke_in_main.py", line 94, in get_inmain_result
    exec('raise type, value, traceback')
  File "C:\Anaconda\lib\site-packages\qtutils\invoke_in_main.py", line 53, in event
    result = event.fn(*event.args, **event.kwargs)
  File "C:\labscript_suite\lyse\__main__.py", line 1335, in add_files
    self.dataframe = concat_with_padding(self.dataframe, new_row_data)
  File "C:\labscript_suite\lyse\dataframe_utilities.py", line 144, in concat_with_padding
    return pandas.concat(dataframes, ignore_index=True)
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 846, in concat
    return op.get_result()
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 1038, in get_result
    copy=self.copy)
  File "C:\Anaconda\lib\site-packages\pandas\core\internals.py", line 4545, in concatenate_block_managers
    for placement, join_units in concat_plan]
  File "C:\Anaconda\lib\site-packages\pandas\core\internals.py", line 4642, in concatenate_join_units
    for ju in join_units]
  File "C:\Anaconda\lib\site-packages\pandas\core\internals.py", line 4915, in get_reindexed_values
    missing_arr = np.empty(self.shape, dtype=empty_dtype)
TypeError: data type not understood
```

philipstarkey commented 8 years ago

Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


The error comes from the call to `concat_with_padding`, which concatenates the initially empty DataFrame with the first non-empty DataFrame of added shots. Specifically, it is caused by columns containing timezone-aware datetimes, such as the `run time` column.

Minimal breaking example (pandas 0.18.1, numpy 1.11.0):

```python
import pandas as pd

df1 = pd.DataFrame(columns=['filepath'])
df2 = pd.DataFrame(data=[['C:\\test.h5', pd.Timestamp('2016-08-18 16:04:59+1000', tz='Australia/Sydney')]],
                   columns=['filepath', 'run time'])
pd.concat([df1, df2], ignore_index=True)  # TypeError: data type not understood (pandas 0.18.1)
```

This fails with the same error as above, at the call to `np.empty`. Explicitly:

```python
In [151]: df2.dtypes[1]
Out[151]: datetime64[ns, Australia/Sydney]

In [152]: np.empty((0, 1), df2.dtypes[1])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\labscript_suite\lyse\dataframe_utilities.py in <module>()
----> 1 np.empty((0, 1), df2.dtypes[1])

TypeError: data type not understood
```

The error does not occur for naive datetimes.
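One way to see why naive datetimes behave differently is to compare the two column dtypes. The sketch below assumes a recent pandas (not the 0.18.1 from the report) in which `pd.DatetimeTZDtype` is exposed at the top level. It shows that a timezone-aware column uses a pandas extension dtype rather than a NumPy dtype, which is consistent with `np.empty` rejecting it, whereas a naive column has an ordinary NumPy `datetime64` dtype that `np.empty` accepts.

```python
import numpy as np
import pandas as pd

# A timezone-aware "run time" column, like the one in the example above.
aware = pd.DataFrame(
    {"run time": [pd.Timestamp("2016-08-18 16:04:59", tz="Australia/Sydney")]}
)
# The same wall-clock time as a naive (timezone-free) timestamp.
naive = pd.DataFrame({"run time": [pd.Timestamp("2016-08-18 16:04:59")]})

aware_dtype = aware.dtypes["run time"]
naive_dtype = naive.dtypes["run time"]

# The aware column uses a pandas extension dtype, not a NumPy dtype...
print(isinstance(aware_dtype, pd.DatetimeTZDtype))  # True
print(isinstance(aware_dtype, np.dtype))            # False

# ...while the naive column is a plain NumPy datetime64 dtype (kind "M"),
# so NumPy can allocate an array of it directly.
print(isinstance(naive_dtype, np.dtype))            # True
print(naive_dtype.kind)                             # M
empty_naive = np.empty((0, 1), dtype=naive_dtype)
```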

If the columns of the empty DataFrame are not specified, the problem does not arise; the following works:

```python
import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame(data=[['C:\\test.h5', pd.Timestamp('2016-08-18 16:04:59+1000', tz='Australia/Sydney')]],
                   columns=['filepath', 'run time'])
pd.concat([df1, df2], ignore_index=True)  # succeeds
```

philipstarkey commented 8 years ago

Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


This is resolved by concatenating only non-empty DataFrames, as in pull request #5.
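A minimal sketch of that approach, assuming plain (non-MultiIndex) column labels. `concat_non_empty` is a hypothetical name, not lyse's actual `concat_with_padding`, and whatever column padding the real function performs is not reproduced here. The idea is simply to drop DataFrames with no rows before calling `pd.concat`, so pandas never has to fill in missing columns (such as the timezone-aware `run time`) for an empty frame.

```python
import pandas as pd


def concat_non_empty(*dataframes):
    """Concatenate DataFrames row-wise, skipping any that have no rows.

    Hypothetical helper illustrating the fix; not lyse's concat_with_padding.
    """
    if not dataframes:
        raise ValueError("need at least one DataFrame")
    # Skip empty inputs (e.g. the initially empty DataFrame of shots), so pandas
    # never has to fill in the columns they lack.
    non_empty = [df for df in dataframes if not df.empty]
    if not non_empty:
        # Every input is empty: return a copy of the first one unchanged.
        return dataframes[0].copy()
    return pd.concat(non_empty, ignore_index=True)


# Usage, mirroring the failing example from the issue:
df1 = pd.DataFrame(columns=["filepath"])
df2 = pd.DataFrame(
    data=[["C:\\test.h5", pd.Timestamp("2016-08-18 16:04:59", tz="Australia/Sydney")]],
    columns=["filepath", "run time"],
)
result = concat_non_empty(df1, df2)
print(result.shape)  # (1, 2)
```

Returning a copy of the first input when every DataFrame is empty is one possible design choice; it avoids calling `pd.concat` at all in the case that originally triggered the error.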

philipstarkey commented 8 years ago

Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


This looks related to these pandas bugs:

https://github.com/pydata/pandas/issues/12985

https://github.com/pydata/pandas/issues/12244

philipstarkey commented 8 years ago

Original comment by Russell Anderson (Bitbucket: rpanderson, GitHub: rpanderson).


Fixes issue #18, where adding shots to lyse failed with pandas >= 0.18: `concat_with_padding` now concatenates only non-empty DataFrames.

Modified the pandas requirement accordingly, with no upper version limit. Modified the labscript_utils requirement to allow the above version specification.

→ \<\<cset f1b822e8432e3a6730a20252ce1bb263a772e551>>