h2oai / sparkling-water

Sparkling Water provides H2O functionality inside Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0

Unable to save a Mojo Model using Java #5683

Closed by venkim 8 months ago

venkim commented 8 months ago

Sparkling Water Version

3.42

Issue description

Expected behavior: I should be able to save the model as a MOJO.

Observed behavior: after training completes, H2O warns that the Model Builder for algo = GBM is not registered, haveMojo() returns false, and writing the MOJO fails with water.exceptions.H2ONotFoundArgumentException: Failed to find schema for version: 3 and type: GBMModel. The full log and stack trace are under "Relevant log output".

If there are follow-up questions, I am happy to share more of the code or the arguments/parameters I use to create the GBM model.

Also, is there a way to call predict on a gbmModel directly from Java, instead of using an EZPredictorWrapper?

Programming language used

Scala

Programming language version

Java 8

What environment are you running Sparkling Water on?

Other (described above)

Environment version info

Local test cluster

Brief cluster specification

1 Node

Relevant log output

10-27 10:10:59.142 10.9.93.208:54321 51379 FJ-1-5 INFO water.default: Completing model GBM_model_1698426647655_1
10-27 10:10:59.142 10.9.93.208:54321 51379 main WARN water.default: Model Builder for algo = GBM is not registered. Unable to determine if Model has a MOJO. Please override method haveMojo().
gbmModel status of haveMojo() is false
Exception in thread "main" water.exceptions.H2ONotFoundArgumentException: Failed to find schema for version: 3 and type: GBMModel
at water.api.SchemaServer.schema(SchemaServer.java:285)
at water.api.SchemaServer.schema(SchemaServer.java:251)
at hex.ModelMojoWriter.writeModelDetails(ModelMojoWriter.java:97)
at hex.ModelMojoWriter.writeExtraInfo(ModelMojoWriter.java:89)
at hex.genmodel.AbstractMojoWriter.writeTo(AbstractMojoWriter.java:169)
at hex.genmodel.AbstractMojoWriter.writeTo(AbstractMojoWriter.java:160)
at hex.ModelMojoWriter.writeTo(ModelMojoWriter.java:77)
at org.example.TrainAModel.saveMOJOModel(TrainAModel.java:414)
at org.example.TrainAModel.main(TrainAModel.java:396)

Code to reproduce the issue

private void buildModel(){
        try {
            System.out.println("In the try block..");
            // Split into train, test and holdout sets
            Key[] keys = new Key[]{Key.make("train.hex"),Key.make("test.hex"),Key.make("hold.hex")};
            double[] ratios = new double[]{0.7,0.15,0.15};
            Frame[] frs = ShuffleSplitFrame.shuffleSplitFrame(data,keys,ratios,1234567689L);
            Frame train = frs[0];
            Frame test  = frs[1];
            //Frame hold  = frs[2];
            data.remove();
            System.out.println(train);
            System.out.println(test );

            // Create a GBM model
            GBMParameters params = new GBMParameters();
            params._train = train._key;
            params._valid = test._key;
            params._score_each_iteration = false;
            params._response_column = "YVAL";

            params._ntrees = 500;
            params._max_depth = 10;
            params._min_rows = 10;
            params._nbins = 20;

            params._learn_rate = 0.1f;      //same as default
            // Gaussian requires a numeric response; ours is Y or N, so it won't work.
            //params._distribution = DistributionFamily.gaussian; // same as default

            Job<GBMModel> job = new GBM(params).trainModel();
            this.gbmModel = job.get();
            System.out.println("gbmModel status of haveMojo() is " + this.gbmModel.haveMojo());

            String filename = JCodeGen.toJavaId(gbmModel._key.toString()) + ".java";
            StreamingSchema ss = new StreamingSchema(gbmModel.new JavaModelStreamWriter(false), filename);
            StreamWriter sw = ss.getStreamWriter();
            // try-with-resources closes the file even if writeTo throws
            try (OutputStream os = new FileOutputStream(OUT_SRC_DIR + filename)) {
                sw.writeTo(os);
            }

        } catch(Exception excp) {
            System.out.println("Catching exception.." + excp.toString());
            excp.printStackTrace();
        } finally {
            if (data != null) {
                data.remove();
            }
        }
    }

The model is actually saved to MOJO format by this method:

    private void saveMOJOModel(){
        this.mojoModelPath = OUT_MODEL_DIR + modelFileName;
        // try-with-resources closes the stream even if writeTo throws
        try (FileOutputStream mojoModelOutStream = new FileOutputStream(this.mojoModelPath)) {
            gbmModel.getMojo().writeTo(mojoModelOutStream);
            //gbmModel.exportMojo(this.mojoModelPath, true);
            System.out.println("Model written out as a MOJO to file " + mojoModelPath);
        } catch (IOException ioe){
            System.out.println("Exception while writing the model as a MOJO to file " + mojoModelPath);
            ioe.printStackTrace();
        }
    }
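
The file handling in the save step can be tried without an H2O cluster. Below is a minimal standard-library sketch, not H2O code: `StreamWritable` and `SafeModelWrite` are hypothetical names, with `StreamWritable` standing in for any object that exposes `writeTo(OutputStream)`. It only shows the stream being closed reliably; it does not address the schema lookup failure in the stack trace.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeModelWrite {
    /** Hypothetical stand-in for any object exposing writeTo(OutputStream). */
    interface StreamWritable {
        void writeTo(OutputStream os) throws IOException;
    }

    /** Writes the payload to target and returns the resulting file size in bytes. */
    static long save(StreamWritable payload, Path target) throws IOException {
        // The stream is closed when the block exits, even if writeTo throws.
        try (OutputStream os = Files.newOutputStream(target)) {
            payload.writeTo(os);
        }
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("model", ".zip");
        byte[] bytes = "placeholder model bytes".getBytes(StandardCharsets.UTF_8);
        long written = save(os -> os.write(bytes), target);
        System.out.println("Wrote " + written + " bytes to " + target);
        Files.delete(target);
    }
}
```

Any failure inside `writeTo` still propagates to the caller, so a caller that wants to log and continue has to catch it explicitly.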

krasinski commented 8 months ago

We currently do not support Java in Sparkling Water, and H2O-3 doesn't expose the Java API to the outside world. Is it possible for you to use a different tech stack to achieve what you're trying to do?

krasinski commented 8 months ago

Please comment if there is anything more to discuss here.