haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

ClassCastException when calling DataFrame.omitNullRows() #767

Closed: jamalromero closed this issue 5 months ago.

jamalromero commented 5 months ago

Hi, I am trying to read CSV files containing some sample data. One file has no missing values and the other has some missing values (both files are attached). Here's the code:

import smile.data.DataFrame;
import smile.io.Read;

// Util.getFilePath is a helper from my own project (not shown).
System.out.println("Reading data with no missing values...");
DataFrame df = Read.csv(Util.getFilePath("sample-good.csv"));
System.out.println(df);
System.out.println("Reading same data with missing values...");
df = Read.csv(Util.getFilePath("sample-bad.csv"));
System.out.println(df);
System.out.println("Reading same data after omitting rows with missing values...");
// Expected: only the rows without any null cell remain.
df = Read.csv(Util.getFilePath("sample-bad.csv")).omitNullRows();
System.out.println(df);

And here's the output:

Reading data with no missing values...
[V1: String, V2: int, V3: boolean, V4: double, V5: int, V6: String, V7: int, V8: String]
+------------------+---+-----+------+---------+------+---+----------+
|                V1| V2|   V3|    V4|       V5|    V6| V7|        V8|
+------------------+---+-----+------+---------+------+---+----------+
|Terrell Mclaughlin|123| true|23.345|123456782|  Male|  1|29/09/2024|
|    Julianne Stark|124| true|23.346|123456783|Female|  2|30/09/2024|
|  Kristine English|125| true|23.347|123456784|  Male|  3|01/10/2024|
|       Dorthy Pena|126|false|23.348|123456785|Female|  4|02/10/2024|
| Columbus Franklin|127| true|23.349|123456786|  Male|  2|03/10/2024|
|    Aida Gallagher|128|false| 23.35|123456787|Female|  1|04/10/2024|
|       Essie Riley|129| true|23.351|123456788|  Male|  4|05/10/2024|
|    Judson Benitez|130| true|23.352|123456789|Female|  2|06/10/2024|
|   Barrett Escobar|131|false|23.353|123456790|  Male|  3|07/10/2024|
|         Gino Pugh|132|false|23.354|123456791|Female|  3|08/10/2024|
+------------------+---+-----+------+---------+------+---+----------+

Reading same data with missing values...
[V1: String, V2: Integer, V3: Boolean, V4: Double, V5: String, V6: String, V7: Integer, V8: String]
+------------------+----+-----+------+----------+------+----+----------+
|                V1|  V2|   V3|    V4|        V5|    V6|  V7|        V8|
+------------------+----+-----+------+----------+------+----+----------+
|Terrell Mclaughlin| 123| true|23.345|1331926359|  Male|   1|29/09/2024|
|    Julianne Stark|null| true|23.346| 123456783|Female|   2|30/09/2024|
|  Kristine English| 125| true|23.347| 123456784|  Male|   3|01/10/2024|
|       Dorthy Pena| 126| null|23.348| 123456785|Female|   4|02/10/2024|
| Columbus Franklin| 127| true|23.349|      null|  Male|null|03/10/2024|
|    Aida Gallagher| 128|false| 23.35| 123456787|Female|   1|04/10/2024|
|       Essie Riley| 129| true|  null| 123456788|  null|   4|      null|
|    Judson Benitez| 130| true|23.352| 123456789|Female|   2|06/10/2024|
|   Barrett Escobar| 131|false|23.353| 123456790|  Male|   3|07/10/2024|
|         Gino Pugh| 132|false|23.354| 123456791|Female|   3|08/10/2024|
+------------------+----+-----+------+----------+------+----+----------+
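Comparing the two schema lines, V2, V3, V4 and V7, which contain missing values in the second file, change from primitive types (int, boolean, double) to boxed types (Integer, Boolean, Double). V5 instead becomes String, which the output alone does not explain. The plain-Java sketch below models the primitive-to-boxed rule for the other columns. `inferType` is a hypothetical helper written for illustration, not Smile's CSV parser, and it assumes an empty field means a missing value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class InferTypeSketch {
    // Assumption: an empty or absent field is treated as a missing value.
    static boolean isMissing(String s) {
        return s == null || s.isEmpty();
    }

    static boolean isInt(String s) {
        try { Integer.parseInt(s); return true; }
        catch (NumberFormatException e) { return false; }
    }

    static boolean isDouble(String s) {
        try { Double.parseDouble(s); return true; }
        catch (NumberFormatException e) { return false; }
    }

    static boolean isBoolean(String s) {
        return s.equals("true") || s.equals("false");
    }

    // Hypothetical rule: pick the narrowest type that fits every present
    // value, then switch to the boxed type if any value is missing,
    // because a primitive column has no way to represent null.
    static String inferType(List<String> column) {
        List<String> present = column.stream()
                .filter(s -> !isMissing(s))
                .collect(Collectors.toList());
        boolean hasMissing = present.size() < column.size();
        String primitive;
        if (present.stream().allMatch(InferTypeSketch::isInt)) primitive = "int";
        else if (present.stream().allMatch(InferTypeSketch::isBoolean)) primitive = "boolean";
        else if (present.stream().allMatch(InferTypeSketch::isDouble)) primitive = "double";
        else return "String"; // already a reference type, so nulls need no boxing
        if (!hasMissing) return primitive;
        switch (primitive) {
            case "int": return "Integer";
            case "boolean": return "Boolean";
            default: return "Double";
        }
    }

    public static void main(String[] args) {
        System.out.println(inferType(Arrays.asList("123", "124", "125"))); // int
        System.out.println(inferType(Arrays.asList("123", "", "125")));    // Integer
        System.out.println(inferType(Arrays.asList("true", "", "false"))); // Boolean
        System.out.println(inferType(Arrays.asList("23.345", "")));        // Double
        System.out.println(inferType(Arrays.asList("Male", "")));          // String
    }
}
```

The boxed types matter here because a column that can hold null must store references rather than a primitive array, which is the distinction the omitNullRows() failure turns on.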

Reading same data after omitting rows with missing values...
Exception in thread "main" java.lang.ClassCastException: class smile.data.vector.VectorImpl cannot be cast to class smile.data.vector.BooleanVector (smile.data.vector.VectorImpl and smile.data.vector.BooleanVector are in unnamed module of loader 'app')
    at smile.data.DataFrameImpl$DataFrameRow.getBoolean(DataFrameImpl.java:608)
    at smile.data.DataFrameImpl.<init>(DataFrameImpl.java:278)
    at smile.data.DataFrameImpl.<init>(DataFrameImpl.java:216)
    at smile.data.DataFrame.of(DataFrame.java:1537)
    at smile.data.DataFrame.omitNullRows(DataFrame.java:105)
    at com.lixusnet.Data.main(Data.java:18)

The exception happens only when the Boolean column V3 contains null values. Attached files: sample-bad.csv, sample-good.csv
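The stack trace suggests the failure pattern: omitNullRows() rebuilds the frame through DataFrame.of, and DataFrameRow.getBoolean casts the column to BooleanVector. Given the exception message and the fact that only nulls in V3 trigger it, the nullable Boolean column appears to be held in a generic VectorImpl instead, so the cast fails even for rows whose V3 value is present. The plain-Java model below reproduces that pattern with hypothetical stand-in classes (Column, BooleanColumn, ObjectColumn); it is not Smile's code and says nothing about how the actual fix was made.

```java
import java.util.Arrays;
import java.util.List;

public class CastFailureSketch {
    // Hypothetical stand-ins: a primitive-specialized boolean column and a
    // generic column holding boxed, possibly null values.
    interface Column { Object get(int i); }

    static final class BooleanColumn implements Column {
        private final boolean[] data;
        BooleanColumn(boolean[] data) { this.data = data; }
        public Object get(int i) { return data[i]; }
        boolean getBoolean(int i) { return data[i]; }
    }

    static final class ObjectColumn implements Column {
        private final List<?> data;
        ObjectColumn(List<?> data) { this.data = data; }
        public Object get(int i) { return data.get(i); }
    }

    // Failing pattern: assume every boolean-typed column is the
    // specialized class and cast unconditionally.
    static boolean getBooleanUnchecked(Column c, int i) {
        return ((BooleanColumn) c).getBoolean(i);
    }

    // Safer pattern: take the fast path only when the column really is
    // specialized; otherwise unbox the generic value. Unboxing a null
    // still throws, so callers must filter out null rows first.
    static boolean getBooleanChecked(Column c, int i) {
        if (c instanceof BooleanColumn) {
            return ((BooleanColumn) c).getBoolean(i);
        }
        return (Boolean) c.get(i);
    }

    public static void main(String[] args) {
        Column v3 = new ObjectColumn(Arrays.asList(true, true, true, null, true));
        // Row 0 holds a non-null value, yet the unchecked cast still fails:
        // the failure depends on the column's class, not on the value.
        try {
            getBooleanUnchecked(v3, 0);
        } catch (ClassCastException e) {
            System.out.println("unchecked: ClassCastException");
        }
        System.out.println("checked: " + getBooleanChecked(v3, 0));
    }
}
```

Running main prints `unchecked: ClassCastException` followed by `checked: true`, which mirrors why filtering out the null row alone does not avoid the error when the accessor assumes a specialized column type.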

haifengl commented 5 months ago

Thanks for reporting. The fix is in the master branch now.