mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

ValueError: Input contains NaN, infinity or a value too large for dtype('float32'). #279

Closed ijeffking closed 3 years ago

ijeffking commented 3 years ago

could not convert string to float: "This story is about a canoe float with grandpa. We had a great float trip with my dad but were saddened to realize how bad his Alzheimer's really is."

pplonski commented 3 years ago

Thank you for reporting the issue. Could you please provide a code example that reproduces the problem? Maybe there is a problem with a column type in the input data frame. Is the column with text set as object type?

ijeffking commented 3 years ago

Appreciate the response. This is the dataset https://raw.githubusercontent.com/dphi-official/Datasets/master/hippocorpus/train_set_label.csv

Yes, the column with text is set as object type.

pplonski commented 3 years ago

@ijeffking How are you loading the data into pandas? I checked the dataset from your comment, and its columns are not all aligned in the Excel preview, which suggests a problem with the separator or an inconsistent number of columns in some rows.
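
A quick way to check how pandas parsed the columns is sketched below. The inline CSV and the column names `text` and `score` are made up for illustration and are not taken from the dataset above.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file.
csv_text = 'text,score\n"A float trip, with grandpa.",1.5\n"Another story",2.0\n'

df = pd.read_csv(io.StringIO(csv_text))

# Show the dtype pandas inferred for every column.
print(df.dtypes)

# Columns that were not parsed as numbers, i.e. candidates for text handling.
text_columns = df.select_dtypes(exclude="number").columns.tolist()
print(text_columns)
```

If a column that should be numeric appears in `text_columns`, the separator or quoting used when loading the file is a likely suspect.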

pplonski commented 3 years ago

OK, I found the cause. Some of the values in the data are outside the float32 range.
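
A minimal sketch of how such values can be detected with NumPy. The sample array is hypothetical; it only illustrates the overflow and is not the check used inside mljar-supervised.

```python
import numpy as np

# Hypothetical feature values; the second one exceeds the float32 maximum (about 3.4e38).
values = np.array([1.0, 1e40, -2.5], dtype=np.float64)

f32 = np.finfo(np.float32)
out_of_range = (values > f32.max) | (values < f32.min)
print(out_of_range)

# Casting an out-of-range value to float32 overflows to infinity, which is
# presumably what the validation error in the issue title complains about.
with np.errstate(over="ignore"):
    as_f32 = values.astype(np.float32)
print(np.isinf(as_f32))
```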

pplonski commented 3 years ago

I clipped the data to the float32 range. The fix will be in the next release, 0.7.20.
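
For anyone on an older version, clipping of this kind could look roughly like the sketch below. `clip_to_float32` is a hypothetical helper written for illustration, not the actual code of the fix.

```python
import numpy as np


def clip_to_float32(values):
    """Clip values to the finite float32 range, then cast (hypothetical helper)."""
    f32 = np.finfo(np.float32)
    as_f64 = np.asarray(values, dtype=np.float64)
    clipped = np.clip(as_f64, float(f32.min), float(f32.max))
    return clipped.astype(np.float32)


data = clip_to_float32([1.0, 1e40, -1e40])
print(np.isfinite(data).all())
```

Note that clipping leaves NaN values unchanged, so missing values still need to be handled separately.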

ijeffking commented 3 years ago

Thank you so much
