haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

Failure serializing the model to disk #790

Open · ipapa opened this issue 3 weeks ago

ipapa commented 3 weeks ago

Describe the bug

I am training a model with the Random Forest classification algorithm. Training works fine and produces a model, but serializing the model to disk fails. I have tried several JDK versions (11, 17, and 22), and the result is the same.

Expected behavior

I expect the model to be serialized to disk with Java serialization, so that I can load it later and run queries against it.
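
For reference, the round trip the report expects can be sketched with plain JDK serialization. `TinyModel` below is a hypothetical stand-in for the trained forest (not a Smile class), and `save`/`load` are illustrative helpers. Smile 3.x appears to offer `smile.io.Read.object(Path)` as the counterpart of `Write.object`, but this sketch uses only `java.io` so it runs without Smile.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class RoundTripDemo {
    // Hypothetical stand-in for a trained model: all fields are Serializable.
    public static class TinyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String target;
        public final int[] votes;

        public TinyModel(String target, int[] votes) {
            this.target = target;
            this.votes = votes;
        }

        // Toy "query": return the index of the class with the most votes.
        public int predict() {
            int best = 0;
            for (int i = 1; i < votes.length; i++) {
                if (votes[i] > votes[best]) {
                    best = i;
                }
            }
            return best;
        }
    }

    // Writes the object graph to a file with standard Java serialization.
    public static void save(Serializable model, Path path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(path))) {
            out.writeObject(model);
        }
    }

    // Reads the object graph back; the caller casts to the expected type.
    public static Object load(Path path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(path))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("model", ".ser");
        save(new TinyModel("target_hit", new int[] {40, 60}), file);
        TinyModel restored = (TinyModel) load(file);
        System.out.println(restored.target + " -> " + restored.predict()); // target_hit -> 1
        Files.deleteIfExists(file);
    }
}
```

The round trip only works if every object reachable from the model is `Serializable`; a single non-serializable field anywhere in the graph makes `writeObject` fail.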

Actual behavior

    Exception in thread "main" java.io.NotSerializableException: smile.data.type.ObjectType$$Lambda$18/0x00007e9cd8cef840
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1543)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1500)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1423)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1169)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1543)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1500)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1423)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1169)
        at java.base/java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1369)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1165)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1543)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1500)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1423)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1169)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1543)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1500)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1423)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1169)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1543)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1500)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1423)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1169)
        at java.base/java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1369)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1165)
        at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1543)
        at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1500)
        at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1423)
        at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1169)
        at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:345)
        at smile.io.Write.object(Write.java:58)
        at com.*.machinelearning.SmileRandomForestTrainer.main(SmileRandomForestTrainer.java:74)
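
The innermost class in the trace is a lambda (`smile.data.type.ObjectType$$Lambda$18/...`), which suggests that some object reachable from the model holds a lambda in a field. A lambda's synthetic class is only `Serializable` when its target type is serializable. The sketch below reproduces that failure mode with a hypothetical `TypeInfo` class; it illustrates the assumption and is not Smile's actual `ObjectType` code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

public class LambdaSerializationDemo {
    // Hypothetical type descriptor holding a lambda in a field
    // (an illustration only, not Smile's ObjectType).
    public static class TypeInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        public final Predicate<Object> check;

        public TypeInfo(Predicate<Object> check) {
            this.check = check;
        }
    }

    // Returns true if the object graph can be written with Java serialization.
    public static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // The message is the name of the offending class, e.g. ...$$Lambda...
            System.out.println("Not serializable: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Plain lambda: its synthetic class does not implement Serializable.
        TypeInfo plain = new TypeInfo(o -> o instanceof String);

        // Intersection cast: the compiler emits a serializable lambda.
        TypeInfo fixed = new TypeInfo((Predicate<Object> & Serializable) o -> o instanceof String);

        System.out.println(canSerialize(plain)); // false
        System.out.println(canSerialize(fixed)); // true
    }
}
```

With the plain lambda, `writeObject` throws `NotSerializableException` naming the synthetic `$$Lambda` class, just as in the report. To see which chain of fields leads to the offending object, the JDK's internal `sun.io.serialization.extendedDebugInfo` system property (for example `java -Dsun.io.serialization.extendedDebugInfo=true ...`) appends that path to the exception message; it is an unsupported flag, so treat it as a debugging aid only.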

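Code snippet

    import java.nio.file.Path;
    import java.util.Properties;

    import org.apache.commons.csv.CSVFormat;
    import smile.classification.RandomForest;
    import smile.data.DataFrame;
    import smile.data.formula.Formula;
    import smile.io.Read;
    import smile.io.Write;

    // trainingDataPath, outputModelPath and LOG are defined elsewhere in the class.

    // Load the CSV file (withRecordSeparator only affects CSV output, not parsing)
    DataFrame data = Read.csv(trainingDataPath, CSVFormat.DEFAULT.withFirstRecordAsHeader().withRecordSeparator(','));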

    // Define formula for the model
    Formula formula = Formula.lhs("target_hit");

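    // Set up properties for Random Forest.
    // NOTE: Smile appears to read its own "smile."-prefixed property keys,
    // so scikit-learn-style names like these may be silently ignored;
    // check the RandomForest.fit documentation for the exact keys.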
    Properties props = new Properties();
    props.setProperty("numTrees", "100");
    props.setProperty("maxDepth", "23");
    props.setProperty("minSamplesSplit", "2");
    props.setProperty("minSamplesLeaf", "1");
    props.setProperty("randomState", "11");

    // Train the Random Forest model
    RandomForest rf = RandomForest.fit(formula, data, props);

    // Write the model to the filesystem
    Write.object(rf, Path.of(outputModelPath));
    LOG.info("Model trained and saved to: {}", outputModelPath);
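
If `Write.object` opens the output file before calling `writeObject`, as the trace suggests, a failure like this one can leave an empty or partial file at `outputModelPath`. A defensive sketch, assuming nothing about Smile beyond plain Java serialization: serialize into memory first and only touch the disk once that succeeds. `SafeModelWriter` is a hypothetical helper name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeModelWriter {
    // Hypothetical helper: serialize into memory first, so a
    // NotSerializableException is thrown before any file is created.
    public static void writeObject(Serializable obj, Path target) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        // Only reached if serialization succeeded.
        Files.write(target, buffer.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("models");

        Path good = dir.resolve("good.ser");
        writeObject(new int[] {1, 2, 3}, good); // arrays of primitives are Serializable
        System.out.println(Files.exists(good)); // true

        Path bad = dir.resolve("bad.ser");
        try {
            // java.lang.Object is not Serializable, so this fails in memory.
            writeObject(new Object[] {new Object()}, bad);
        } catch (NotSerializableException e) {
            System.out.println(Files.exists(bad)); // false
        }
    }
}
```

This does not fix the `NotSerializableException` itself; it only makes the failure surface before any file is created.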

Input data

target_hit,device_make,device_model,device_language,device_name,carrier,total_unique_sessions,unique_sessions_with_target_hit
0,70,3177,207,24634,1,113,9
0,70,1553,207,24634,1,70,12
1,70,2361,207,24634,1,33,16
0,376,798,1,36706,910,7150,2
0,70,800,207,24634,1,11,1
0,70,3184,207,24634,1,101,3
0,376,2,1278,36706,1495,43,2
0,70,2358,207,24634,1,82,0
0,376,1554,1278,36706,1493,1426,11

Additional context

    openjdk version "22.0.2" 2024-07-16
    OpenJDK Runtime Environment Corretto-22.0.2.9.1 (build 22.0.2+9-FR)
    OpenJDK 64-Bit Server VM Corretto-22.0.2.9.1 (build 22.0.2+9-FR, mixed mode, sharing)

Running on Amazon Linux (AWS EC2) with Smile version 3.1.1.

haifengl commented 1 week ago

Thanks for reporting. I cannot reproduce it, though. Can you please try the master branch?