lux-org / lux

Automatically visualize your pandas dataframe via a single print! 📊 💡
Apache License 2.0
5.15k stars 365 forks

Excel file error with `pd.cut` #403

Closed METTA2 closed 2 years ago

METTA2 commented 3 years ago

Describe the bug

I am excited to use Lux but am frustrated by the following error:

C:\Users\Michael\miniconda2\lib\site-packages\IPython\core\formatters.py:918: UserWarning: Unexpected error in rendering Lux widget and recommendations. Falling back to Pandas display. Please report the following issue on Github: https://github.com/lux-org/lux/issues

C:\Users\Michael\miniconda2\lib\site-packages\lux\core\frame.py:628: UserWarning:
Traceback (most recent call last):
  File "C:\Users\Michael\miniconda2\lib\site-packages\lux\core\frame.py", line 590, in _ipython_display_
    self.maintain_recs()
  File "C:\Users\Michael\miniconda2\lib\site-packages\lux\core\frame.py", line 432, in maintain_recs
    custom_action_collection = custom_actions(rec_df)
  File "C:\Users\Michael\miniconda2\lib\site-packages\lux\action\custom.py", line 76, in custom_actions
    recommendation = lux.config.actions[action_name].action(ldf)
  File "C:\Users\Michael\miniconda2\lib\site-packages\lux\action\correlation.py", line 50, in correlation
    vlist = VisList(intent, ldf)
  File "C:\Users\Michael\miniconda2\lib\site-packages\lux\vis\VisList.py", line 43, in __init__
    self.refresh_source(self._source)
  File "C:\Users\Michael\miniconda2\lib\site-packages\lux\vis\VisList.py", line 336, in refresh_source
    lux.config.executor.execute(self._collection, ldf, approx=approx)
  File "C:\Users\Michael\miniconda2\lib\site-packages\lux\executor\PandasExecutor.py", line 143, in execute
    PandasExecutor.execute_2D_binning(vis)
  File "C:\Users\Michael\miniconda2\lib\site-packages\lux\executor\PandasExecutor.py", line 422, in execute_2D_binning
    vis._vis_data["yBin"] = pd.cut(vis._vis_data[y_attr], bins=lux.config.heatmap_bin_size)
  File "C:\Users\Michael\miniconda2\lib\site-packages\pandas\core\reshape\tile.py", line 238, in cut
    rng = (nanops.nanmin(x), nanops.nanmax(x))
  File "C:\Users\Michael\miniconda2\lib\site-packages\pandas\core\nanops.py", line 135, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "C:\Users\Michael\miniconda2\lib\site-packages\pandas\core\nanops.py", line 394, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "C:\Users\Michael\miniconda2\lib\site-packages\pandas\core\nanops.py", line 977, in reduction
    result = getattr(values, meth)(axis)
  File "C:\Users\Michael\miniconda2\lib\site-packages\numpy\core\_methods.py", line 34, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
TypeError: '<=' not supported between instances of 'str' and 'float'
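
For readers who want to probe this without the confidential data: the last frames of the traceback show `pd.cut` taking a minimum over a column that holds both `str` and `float` values. The sketch below is not taken from the Lux codebase. It uses made-up values and a hypothetical helper, `try_cut`, and only reports whether `pd.cut` accepts a given input, so the exact error may differ across pandas versions.

```python
import pandas as pd


def try_cut(values, bins=3):
    """Return None if pd.cut succeeds, else the name of the exception raised.

    `try_cut` is a hypothetical helper for this illustration only.
    """
    try:
        pd.cut(pd.Series(values), bins=bins)
    except Exception as exc:  # broad on purpose: we only report what happened
        return type(exc).__name__
    return None


# A purely numeric column can be binned.
print(try_cut([1.0, 2.0, 3.0]))

# A column mixing floats and strings has no well-defined minimum,
# so binning it is expected to fail.
print(try_cut([1.0, "two", 3.0]))
```

If the second call reports an error on your pandas version, the problem sits in the data rather than in the Lux installation.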

dorisjlee commented 3 years ago

Hi @METTA2, Thanks for reporting this issue! Could you let us know what dataset and operations were performed before this issue showed up? This will help us debug the problem.

METTA2 commented 3 years ago

Hi Doris,

Loved you on the podcast!

It's a dataset from work with confidential info.

All I did was read in an Excel file as usual and then drop duplicates. These are the same files that usually work in pandas.

Should I try on a sample dataset?

I believe that I have installed Lux correctly as I did the test below.

!jupyter nbextension enable --py luxwidget
Enabling notebook extension luxwidget/extension...

Thanks

dorisjlee commented 3 years ago

Hi @METTA2,

Thanks for your kind words! I think this is the same bug that is related to #394. I haven't really been able to debug it given that I have not yet found a dataset that reproduces this error on my end.

I'd encourage you to try out the sample dataset first, just to see if your Lux installation is working. For example:

import lux
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/college.csv")
df

We also have a few other datasets here for you to play with.

I'll make a note to dig into this issue again soon and see if I can come up with a dataset that reproduces this error.

METTA2 commented 3 years ago

Hi Doris,

Lux works on the sample dataset! So it's installed correctly.

Wonder why it doesn't work on my datasets - are there any data types that are problematic? Any suggestions on what I should look out for or amend in my dataset to minimise the chances of it not working?

Thanks for your help

dorisjlee commented 3 years ago

Hi @METTA2, It's great to hear that it works on the sample dataset!

I suspect that your Excel file contains one or more columns with mixed data types: some values in the column are detected as strings, while others are numbers. If you can identify those columns in your Excel file, you could run df['name_of_your_mixed_type_column'] = df['name_of_your_mixed_type_column'].astype(str) to coerce the mixed column to str type (astype returns a new column, so the result has to be assigned back). Another possible approach is to save the Excel file as a CSV and then read it into pandas via pd.read_csv. See some of my comments here.

We understand that these temporary fixes are not ideal, but let us know if they help! We have added some fixes to handle mixed dtype columns in the past, but they don't seem to have helped in this case.
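
One way to track down the mixed-type columns mentioned above is to list the Python types found in each object column. This is a minimal sketch, not part of Lux: the helper `find_mixed_type_columns` and the sample frame are made up for illustration, and on real data `df` would come from `pd.read_excel`.

```python
import pandas as pd


def find_mixed_type_columns(df):
    """Map each object-dtype column holding more than one Python type
    to the sorted names of those types (hypothetical helper)."""
    mixed = {}
    for col in df.select_dtypes(include="object").columns:
        type_names = sorted({type(value).__name__ for value in df[col]})
        if len(type_names) > 1:
            mixed[col] = type_names
    return mixed


# Made-up stand-in for a frame loaded from Excel.
df = pd.DataFrame({
    "price": [1.5, "n/a", 3.0],       # floats and a string: mixed
    "city": ["Oslo", "Lima", "Pune"],  # strings only
    "count": [1, 2, 3],                # integers only
})

mixed = find_mixed_type_columns(df)
print(mixed)  # {'price': ['float', 'str']}

# Apply the workaround from the comment above: coerce each mixed column to str.
for col in mixed:
    df[col] = df[col].astype(str)
```

Missing cells appear as `float` in this report because NaN is a float, so a text column with empty cells is also flagged.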

dorisjlee commented 2 years ago

Resolved this issue in #448. You can access these updated changes by upgrading to the latest version of Lux:

pip install --upgrade lux-api
jupyter nbextension install --py luxwidget
jupyter nbextension enable --py luxwidget

I hope that this resolves the problem that you were seeing!
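
To confirm that the upgrade took effect, the installed version of the `lux-api` distribution can be read from Python's standard library. This is only a sketch: `installed_version` is a hypothetical helper, and the thread does not say which release contains the fix from #448.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "lux-api" is the distribution name used in the pip command above.
print(installed_version("lux-api"))
```

Run it in the same environment as the notebook kernel, otherwise it may report a different installation.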