sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
2.08k stars, 206 forks

LinAlgError: singular matrix #910

Open: js185529 opened this issue 2 years ago

js185529 commented 2 years ago

I'm just trying to pass the dataframe into create_report(), and I receive this error:

LinAlgError: singular matrix

C:...\Python\Python39\site-packages\scipy\stats\stats.py:4812: RuntimeWarning: overflow encountered in longlong_scalars
  (2 * xtie * ytie) / m + x0 * y0 / (9 * m * (size - 2)))
C:...\Python\Python39\site-packages\scipy\stats\stats.py:4814: RuntimeWarning: invalid value encountered in sqrt
  np.sqrt(var) / np.sqrt(2)))
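Both warnings appear to come from the variance formula in scipy's Kendall's tau (`scipy.stats.kendalltau`). As a minimal illustration, assuming `xtie` and `ytie` are int64 scalars (which the `longlong_scalars` wording suggests), the snippet below uses made-up tie counts to show how `2 * xtie * ytie` can silently wrap around, and how a negative variance then makes `np.sqrt` return `nan`. It reproduces the arithmetic pattern only; it does not establish that this is what raises the `LinAlgError`.

```python
import numpy as np

# Hypothetical tie counts (number of tied pairs) for two columns with many
# repeated values. These are illustrative, not taken from the reported data.
xtie = np.int64(4_000_000_000)
ytie = np.int64(4_000_000_000)

# The product from the first warning, once in int64 arithmetic (numpy scalars)
# and once with Python's arbitrary-precision int for comparison.
with np.errstate(over="ignore"):  # silence the overflow RuntimeWarning
    wrapped = 2 * xtie * ytie
exact = 2 * int(xtie) * int(ytie)

print(exact > np.iinfo(np.int64).max)  # the true value does not fit in int64
print(int(wrapped) != exact)           # so the int64 result has wrapped around

# If a wrapped term drives the variance below zero, the square root from the
# second warning yields nan instead of a usable p-value.
var = np.float64(-1.0)  # placeholder for a negative variance
with np.errstate(invalid="ignore"):  # silence the "invalid value" warning
    root = np.sqrt(var)
print(np.isnan(root))
```

Switching to Python ints (or floats) for the product is what avoids the wraparound in this sketch; whether a given scipy release already does that is not stated in this thread.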

It's a fairly large dataset of features. If I knew which column causes the error, I could deal with it, but there are 170 or so columns, so I imagine it's not just one column. I read that this was potentially fixed in later versions? I'm using the latest version, 0.4.3.
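With around 170 columns, one way to narrow the search is to bisect over columns rather than test them one by one. The sketch below is a hypothetical helper (`narrow_columns` and `raises_linalg_error` are illustrative names, not part of dataprep); it assumes the failure surfaces as `numpy.linalg.LinAlgError` and keeps halving the column set while one half alone still fails. Because Kendall's tau is computed between pairs of columns, the culprit may need columns from both halves, in which case the helper stops and returns the smallest subset it found.

```python
from typing import Callable, List, Sequence

import numpy as np
import pandas as pd


def raises_linalg_error(df: pd.DataFrame, cols: Sequence[str],
                        fn: Callable[[pd.DataFrame], object]) -> bool:
    """Return True if fn fails with LinAlgError on the given column subset."""
    try:
        fn(df[list(cols)])
    except np.linalg.LinAlgError:
        return True
    return False


def narrow_columns(df: pd.DataFrame,
                   fn: Callable[[pd.DataFrame], object]) -> List[str]:
    """Halve the column set while one half alone still reproduces the error.

    Returns [] if the full frame does not fail. If neither half fails on its
    own (e.g. the problem needs columns from both halves), the current,
    larger subset is returned instead of a single column.
    """
    cols = list(df.columns)
    if not raises_linalg_error(df, cols, fn):
        return []
    while len(cols) > 1:
        mid = len(cols) // 2
        left, right = cols[:mid], cols[mid:]
        if raises_linalg_error(df, left, fn):
            cols = left
        elif raises_linalg_error(df, right, fn):
            cols = right
        else:
            break  # the error needs columns from both halves
    return cols


# Stand-in for create_report: fails whenever a hypothetical column "c" is present.
def fake_report(frame: pd.DataFrame) -> None:
    if "c" in frame.columns:
        raise np.linalg.LinAlgError("singular matrix")


demo = pd.DataFrame({name: [1, 2, 3] for name in "abcde"})
print(narrow_columns(demo, fake_report))  # ['c']
```

Against real data, the call would presumably be `narrow_columns(df, create_report)` after `from dataprep.eda import create_report`. Each probe builds a full report, so this may be slow, but 170 columns need only about eight halving steps with at most two probes each.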

jinglinpeng commented 2 years ago

Hi @js185529, I'm not sure how the error happens. Could you share a dataset that reproduces the error so I can take a look?

js185529 commented 2 years ago

Sure. When I run it on a sample of the data, it works; when I run it on the full dataset, I get that error. Let me find a sample that raises the error (and anonymize the data), and I will share it.
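A sample that works while the full dataset fails is consistent with the overflow in the first warning, because tie counts grow roughly with the square of the row count. As a back-of-the-envelope check, assuming both columns hold two equally frequent values (a simplification; real columns will differ) and that `xtie` and `ytie` are int64, the sketch below searches for the smallest row count at which the `(2 * xtie * ytie)` term from that warning exceeds the int64 range. The helper names are hypothetical. Under these assumptions the threshold falls between 92,000 and 93,000 rows; this is an illustration of scale, not a confirmed diagnosis.

```python
INT64_MAX = 2**63 - 1


def tied_pairs_two_values(n: int) -> int:
    """Tied pairs in an n-row column split evenly between two distinct values."""
    a = n // 2
    b = n - a
    return a * (a - 1) // 2 + b * (b - 1) // 2


def first_overflowing_row_count(limit: int = 10**7) -> int:
    """Smallest n for which 2 * xtie * ytie exceeds INT64_MAX when xtie == ytie.

    Uses Python ints (which never overflow) and a binary search, which is
    valid because the tie count never decreases as rows are added.
    """
    lo, hi = 2, limit
    while lo < hi:
        mid = (lo + hi) // 2
        ties = tied_pairs_two_values(mid)
        if 2 * ties * ties > INT64_MAX:
            hi = mid
        else:
            lo = mid + 1
    return lo


n = first_overflowing_row_count()
print(n)
```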

js185529 commented 2 years ago

@jinglinpeng, here is a dataset that triggers the error: https://www.dropbox.com/s/ea1cii2ihqzujyi/sample.csv?dl=0