haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

IllegalArgumentException when using SimpleImputer for data sourced from a JSON file #769

Closed jamalromero closed 2 months ago

jamalromero commented 2 months ago

An exception is thrown when using SimpleImputer on data sourced from a JSON file. It occurs only when the data contains numerical columns; with the numerical columns removed, SimpleImputer fits and applies without error. The following code reads two JSON files: one has numerical data, the other has strings only. Example:

DataFrame df = Read.json(Util.getFilePath("test-json-without-numerical.json"));
System.out.println(df);
System.out.println("=========== Using SimpleImputer with non numerical data ===========");
SimpleImputer.fit(df).apply(df);
df = Read.json(Util.getFilePath("test-json-with-numerical.json"));
System.out.println(df);
System.out.println("=========== Using SimpleImputer with numerical data ===========");
SimpleImputer.fit(df).apply(df);

See the attached files. Columns nk, hc, and t are numerical. Console log:

[hh: String, ll: String, a: String, c: String, tz: String, cy: String, g: String, h: String, gr: String, al: String, l: String]
+---------+--------------------+--------------------+---+----------------+-------+------+------+---+--------------+-------+
|       hh|                  ll|                   a|  c|              tz|     cy|     g|     h| gr|            al|      l|
+---------+--------------------+--------------------+---+----------------+-------+------+------+---+--------------+-------+
|1.usa.gov| 42.576698, -70.9...|Mozilla/5.0 (Wind...| US|America/New_York|Danvers|A6qOVH|wfLQtf| MA|en-US,en;q=0.8|orofrog|
|     j.mp| 40.218102, -111....|GoogleMaps/Roches...| US|  America/Denver|  Provo|mwszkS|mwszkS| UT|          null|  bitly|
+---------+--------------------+--------------------+---+----------------+-------+------+------+---+--------------+-------+

=========== Using SimpleImputer with non numerical data ===========
[hh: String, ll: String, a: String, c: String, tz: String, g: String, h: String, gr: String, al: String, l: String, t: long, cy: String, hc: long, nk: int]
+---------+--------------------+--------------------+---+----------------+------+------+---+--------------+-------+----------+-------+----------+---+
|       hh|                  ll|                   a|  c|              tz|     g|     h| gr|            al|      l|         t|     cy|        hc| nk|
+---------+--------------------+--------------------+---+----------------+------+------+---+--------------+-------+----------+-------+----------+---+
|1.usa.gov| 42.576698, -70.9...|Mozilla/5.0 (Wind...| US|America/New_York|A6qOVH|wfLQtf| MA|en-US,en;q=0.8|orofrog|1331923247|Danvers|1331822918|  1|
|     j.mp| 40.218102, -111....|GoogleMaps/Roches...| US|  America/Denver|mwszkS|mwszkS| UT|          null|  bitly|1331923249|  Provo|1308262393|  0|
+---------+--------------------+--------------------+---+----------------+------+------+---+--------------+-------+----------+-------+----------+---+

=========== Using SimpleImputer with numerical data ===========
Exception in thread "main" java.lang.IllegalArgumentException: Impute non-floating primitive types
    at smile.feature.imputation.SimpleImputer.lambda$apply$0(SimpleImputer.java:119)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.accept(ForEachOps.java:205)
    at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
    at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:712)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
    at java.base/java.util.concurrent.ForkJoinPool.helpComplete(ForkJoinPool.java:2145)
    at java.base/java.util.concurrent.ForkJoinTask.awaitDone(ForkJoinTask.java:420)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:668)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.evaluateParallel(ForEachOps.java:189)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.base/java.util.stream.IntPipeline.forEach(IntPipeline.java:463)
    at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:620)
    at smile.feature.imputation.SimpleImputer.apply(SimpleImputer.java:98)
    at com.lixusnet.Data.getBitlyUsaGov(Data.java:32)
    at com.lixusnet.Data.main(Data.java:17)
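
For context on the message "Impute non-floating primitive types": a primitive int or long column has no NaN-like value that could mark a cell as missing, so there is nothing in such a column for an imputer to fill in. The sketch below illustrates that point in plain Java. It is not Smile's implementation and does not use the Smile API; the names ImputeSketch, imputeMean, and toDouble are hypothetical, and widening to double is only one possible way to make a missing marker available.

```java
import java.util.Arrays;

public class ImputeSketch {
    /** Returns a copy of the column with every NaN replaced by the mean of the non-missing values. */
    public static double[] imputeMean(double[] column) {
        double mean = Arrays.stream(column)
                .filter(v -> !Double.isNaN(v))
                .average()
                .orElse(Double.NaN); // all values missing: nothing to impute from
        return Arrays.stream(column)
                .map(v -> Double.isNaN(v) ? mean : v)
                .toArray();
    }

    /** Widens a long column to double so that NaN becomes available as a missing marker. */
    public static double[] toDouble(long[] column) {
        return Arrays.stream(column).asDoubleStream().toArray();
    }

    public static void main(String[] args) {
        // A floating-point column can carry a missing value as NaN.
        double[] withGap = {1.0, Double.NaN, 3.0};
        System.out.println(Arrays.toString(imputeMean(withGap))); // prints [1.0, 2.0, 3.0]

        // A long column (values taken from the hc column above) has no missing
        // marker, so imputation after widening leaves the values unchanged.
        long[] hc = {1331822918L, 1308262393L};
        System.out.println(Arrays.toString(imputeMean(toDouble(hc))));
    }
}
```

Under this view, an imputer could reasonably leave int and long columns untouched instead of rejecting them; whether that matches the actual fix is not stated in this thread.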

test-json-with-numerical.json test-json-without-numerical.json

haifengl commented 2 months ago

The fix is in the master branch now. Thanks.