sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
2.01k stars, 204 forks

index 0 is out of bounds for axis 0 with size 0 for dataprep==0.4.1 #803

Closed: farazk86 closed this issue 2 years ago

farazk86 commented 2 years ago

Describe the bug
I get the error in the title when calling create_report with the latest version, dataprep==0.4.1. The same CSV file worked fine with a previous version: with dataprep==0.3.0 the error does not occur and the report is generated without any problems.
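Because this is a regression between two releases (0.3.0 and 0.4.1), it can help to paste the exact installed versions into the report. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+; on Python 3.7, as in the traceback below, the `importlib_metadata` backport offers the same functions). The helper name `installed_versions` is hypothetical and not part of dataprep:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each distribution name to its installed version, or None."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Prints e.g. the dataprep and pandas versions to include in a bug report.
    print(installed_versions(["dataprep", "pandas"]))
```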

To generate a report for my CSV with the latest version, I had to load the CSV, save it with pandas, and then load the saved CSV again.

from dataprep.eda import create_report
import pandas as pd

df0 = pd.read_csv('incidents.csv', encoding='utf-8')
# only retain the columns I am interested in
df0 = df0[['Customer Department', 'Time To Resolve']]

# save it again as CSV (to_csv writes the index as an extra column by default)
df0.to_csv('incidents_eda.csv')
# now load it again
df = pd.read_csv('incidents_eda.csv')

report = create_report(df)
report.save('eda')  # save report to local disk

The workaround above worked, but the report now contains a new, unnamed variable (see the screenshots below). Because of it, the interactions and correlation graphs are not at all helpful :(

[Screenshot: image_2022-02-04_125538]

[Screenshot: image_2022-02-04_125635]
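If a CSV has already been saved with its index, the extra column can be dropped after loading and before calling create_report. pandas names an empty header cell `Unnamed: <position>`, so one option is to filter on that prefix. A minimal sketch; the helper name `drop_unnamed_columns` and the toy CSV text are made up for illustration:

```python
import io

import pandas as pd


def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: remove columns whose header pandas generated as 'Unnamed: N'."""
    # Empty header cells are named "Unnamed: <position>" by read_csv.
    mask = df.columns.astype(str).str.startswith("Unnamed")
    return df.loc[:, ~mask]


# Toy CSV as written by to_csv() with the default index=True:
# the first header cell is empty because the index has no name.
csv_text = ",Customer Department,Time To Resolve\n0,IT,3\n1,HR,5\n"
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))                        # ['Unnamed: 0', 'Customer Department', 'Time To Resolve']
print(list(drop_unnamed_columns(df).columns))  # ['Customer Department', 'Time To Resolve']
```

Note that this only removes a symptom of the save-and-reload workaround; it does not address the IndexError itself.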

When I call create_report without the workaround above, I get the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-e889d219c362> in <module>()
     11 df = pd.read_csv('incidents.csv', encoding='utf-8')
     12 
---> 13 report = create_report(df)
     15 report.save('eda')

[... 6 frames omitted ...]
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in __getitem__(self, key)
   4602         if is_scalar(key):
   4603             key = com.cast_scalar_indexer(key, warn_float=True)
-> 4604             return getitem(key)
   4605 
   4606         if isinstance(key, slice):

IndexError: index 0 is out of bounds for axis 0 with size 0
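The message itself comes from positional indexing: some code asked for position 0 of an empty pandas Index (the last frame is `Index.__getitem__`). The six collapsed frames hide which dataprep call did this, so the following is only an illustration of how an empty Index can arise. It assumes, hypothetically, a column with no non-missing values, whose `value_counts()` result is therefore empty; it is not the confirmed cause of this bug:

```python
import numpy as np
import pandas as pd

# Hypothetical trigger for illustration: a column with only missing values.
s = pd.Series([np.nan, np.nan], name="Time To Resolve")

# value_counts() drops NaN by default, so the result (and its index) is empty.
counts = s.value_counts()
print(len(counts.index))  # 0

error_message = None
try:
    counts.index[0]  # position 0 of an empty Index
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```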


jinglinpeng commented 2 years ago

Hi @farazk86, thanks a lot for the detailed bug report. When you save a DataFrame to CSV, pandas writes the index by default; this is where the unnamed column comes from. For the reported bug, is it possible to share the data with us so that we can reproduce the error? If there is a privacy issue, any other data that reproduces the error would also be helpful.
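The index behaviour described above can be checked in memory. The sketch below (toy values, an in-memory buffer instead of a file) compares the default `to_csv()` with two standard ways to avoid the extra column: not writing the index (`index=False`), or reading the first column back as the index (`index_col=0`):

```python
import io

import pandas as pd

# Toy data with the two column names from the report.
df0 = pd.DataFrame({"Customer Department": ["IT", "HR"], "Time To Resolve": [3, 5]})

# Default: to_csv() writes the index as an extra first column with an empty header.
with_index = pd.read_csv(io.StringIO(df0.to_csv()))

# Option 1: do not write the index at all.
no_index = pd.read_csv(io.StringIO(df0.to_csv(index=False)))

# Option 2: write the index, but read the first column back as the index.
as_index = pd.read_csv(io.StringIO(df0.to_csv()), index_col=0)

print(list(with_index.columns))  # ['Unnamed: 0', 'Customer Department', 'Time To Resolve']
print(list(no_index.columns))    # ['Customer Department', 'Time To Resolve']
print(list(as_index.columns))    # ['Customer Department', 'Time To Resolve']
```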

farazk86 commented 2 years ago

Hi @jinglinpeng

Thanks for the reply.

Please find a sample of the full CSV file attached. It gives the same error that I reported above with the latest version of dataprep. I have removed any identifiable information from the file for privacy reasons.
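One way to prepare a shareable sample is to replace each distinct value in a sensitive text column with an opaque code while keeping missing values missing, so the column's cardinality and NaN pattern survive. Whether such a sample still reproduces a bug depends on what triggers it, so this is not guaranteed. A minimal sketch using `pandas.factorize`; the helper name `pseudonymize`, the `Dept_` prefix, and the toy values are hypothetical:

```python
import numpy as np
import pandas as pd


def pseudonymize(series: pd.Series, prefix: str = "Dept_") -> pd.Series:
    """Hypothetical helper: map each distinct non-missing value to '<prefix><n>', keep NaN."""
    codes, _ = pd.factorize(series)  # missing values receive the code -1
    labels = [f"{prefix}{code}" if code >= 0 else np.nan for code in codes]
    return pd.Series(labels, index=series.index, name=series.name)


# Toy data standing in for the real file.
df0 = pd.DataFrame({
    "Customer Department": ["Finance", "IT", "Finance", None],
    "Time To Resolve": [3.0, 5.0, 1.0, 2.0],
})
shared = df0.copy()
shared["Customer Department"] = pseudonymize(df0["Customer Department"])
print(shared)
```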

from dataprep.eda import create_report
import pandas as pd

df0 = pd.read_csv('incidents_sample4.csv', encoding='Windows-1252')

# only retain the columns I am interested in
df0 = df0[['Customer Department', 'Time To Resolve']]

report = create_report(df0)
report  # show report in notebook
report.save('eda')  # save report to local disk

The error for the above code is:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-13-85f6484892f1> in <module>()

---> 14 report = create_report(df0)
     15 report # show report in notebook
     16 report.save('eda') # save report to local disk

[... 6 frames omitted ...]
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in __getitem__(self, key)
   4602         if is_scalar(key):
   4603             key = com.cast_scalar_indexer(key, warn_float=True)
-> 4604             return getitem(key)
   4605 
   4606         if isinstance(key, slice):

IndexError: index 0 is out of bounds for axis 0 with size 0

Just to mention again: this file works fine, and the report is generated, when using dataprep==0.3.0.

Thanks

Attachment: incidents_sample4.csv
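The two snippets in this thread read the CSV with different encodings ('utf-8' first, then 'Windows-1252'). When it is unclear which encoding an export uses, one option is to try candidates in order and keep the first that decodes. A minimal sketch; the helper name `read_csv_any_encoding` and the candidate order are assumptions. A successful decode does not prove the encoding is right, since some byte sequences are valid in both:

```python
import os
import tempfile
from typing import Optional, Sequence

import pandas as pd


def read_csv_any_encoding(path: str,
                          encodings: Sequence[str] = ("utf-8", "Windows-1252")) -> pd.DataFrame:
    """Hypothetical helper: return the DataFrame from the first encoding that decodes the file."""
    last_error: Optional[UnicodeDecodeError] = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError as exc:
            last_error = exc  # this encoding failed; try the next candidate
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error


# Demo: "é" written as the single Windows-1252 byte 0xE9, which is invalid UTF-8 here.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as fh:
    fh.write("Customer Department,Time To Resolve\nCaf\u00e9,3\n".encode("cp1252"))
    demo_path = fh.name
try:
    demo = read_csv_any_encoding(demo_path)
finally:
    os.remove(demo_path)
print(demo)
```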

jinglinpeng commented 2 years ago

Hi @farazk86, thanks for the data and code! I can reproduce the error and have found the problem. It should be fixed before the next release (2.21).

farazk86 commented 2 years ago

Great. Thanks a lot :)