nok / sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
BSD 3-Clause "New" or "Revised" License

Java Error: "Too many constants" #8

Closed lichard49 closed 6 years ago

lichard49 commented 7 years ago

I attempted to port a somewhat large random forest classifier (7.2 MB) to Java, and compiling the generated Java class gave a "too many constants" error because of the number of hardcoded values that make up the trees. I circumvented this with a simple script that separates all (static) methods into individual classes and files. Is there a cleaner way internally to get around this problem or achieve this effect?
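For readers who want to see the shape of that workaround, here is a minimal sketch. It is not sklearn-porter's generated output: the names `SplitForest`, `Tree0`, `Tree1` and `Tree2` and all thresholds are hypothetical, and plain majority voting is used as a simplification. The point is only that each class, including each nested class, is compiled to its own .class file with its own constant pool, so the hardcoded values are spread over several pools instead of one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical layout: one class per tree, so javac emits one .class file
// (and therefore one constant pool) per tree instead of a single huge class.
final class SplitForest {

    static final class Tree0 {
        static int predict(double[] x) {
            if (x[0] <= 2.45) return 0;
            return x[1] <= 1.75 ? 1 : 2;
        }
    }

    static final class Tree1 {
        static int predict(double[] x) {
            if (x[1] <= 0.8) return 0;
            return x[0] <= 4.95 ? 1 : 2;
        }
    }

    static final class Tree2 {
        static int predict(double[] x) {
            if (x[0] <= 2.6) return 0;
            return x[1] <= 1.65 ? 1 : 2;
        }
    }

    // The ensemble class only holds references to the per-tree methods.
    private static final List<ToIntFunction<double[]>> TREES =
            Arrays.asList(Tree0::predict, Tree1::predict, Tree2::predict);

    // Majority vote over all trees (a simplification chosen for this sketch).
    static int predict(double[] x, int nClasses) {
        int[] votes = new int[nClasses];
        for (ToIntFunction<double[]> tree : TREES) {
            votes[tree.applyAsInt(x)]++;
        }
        int best = 0;
        for (int c = 1; c < nClasses; c++) {
            if (votes[c] > votes[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(predict(new double[] {5.1, 1.9}, 3));
    }
}
```

In a real port the tree classes would more likely live in separate files, as described above; nesting them here just keeps the sketch in one listing.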

nok commented 7 years ago

Hello @lichard49,

I noticed this issue in the past while porting and using a large SVM classifier. In my case I fixed it manually by using a properties file which stores the model data (support vectors).

But in Java ...

A single method in a Java class may be at most 64KB of bytecode.

Currently I'm working on the next release, where you can run predictions against the ported models in Python.

After that I will fix this issue by adding an alternative export for larger models (in Java), because most models are larger than 64KB of bytecode.

Happy coding, Darius 🌵

8bit-pixies commented 6 years ago

@nok hopefully this isn't a stupid request as I don't normally use Java; could you provide a template of how you got around this for Java export?

nok commented 6 years ago

Hello @chappers, I tested different ways of storing large model data in separate files.

First I tested .properties files:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class Tmp {

    // Load the model data from a .properties file.
    public static Properties load(String path) throws IOException {
        Properties props = new Properties();
        // try-with-resources closes the stream even if load() throws.
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            props.load(in);
        }
        return props;
    }

    // Fill a pre-sized 3D array from a flat list of values (row-major order).
    public static double[][][] convert(double[][][] output, String[] data) {
        int i = 0;
        for (int x = 0; x < output.length; x++) {
            for (int y = 0; y < output[x].length; y++) {
                for (int z = 0; z < output[x][y].length; z++) {
                    output[x][y][z] = Double.parseDouble(data[i++].trim());
                }
            }
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        Properties model = Tmp.load(System.getProperty("user.dir") + "/src/model.properties");
        // model.properties: "inters=0.0, 0.0, 10.0, 12. ... "
        double[][][] inters = Tmp.convert(new double[2][3][4], model.getProperty("inters").split(","));
        System.out.println(inters[0][1][1]);
    }
}

But I don't like that solution because it's not generic (<?> ...), which means that multiple versions of the convert method (method overloading) are required. Furthermore, the other programming languages don't really work well with properties files. So I decided to use the JSON format for storing all dynamic model data, but again Java is the black sheep: unfortunately, it doesn't have a JSON parser in the standard packages. For now, I will give org.json a go.
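As an aside on the overloading concern: one way to avoid a separate convert method per array rank, while staying inside the standard library, is to walk the array with java.lang.reflect.Array. The sketch below is only an illustration of that idea and not what sklearn-porter ships; the class name `ModelData` and its methods are hypothetical.

```java
import java.lang.reflect.Array;

// Hypothetical helper: fills a pre-sized double array of any rank
// (double[], double[][], double[][][], ...) from a flat list of values.
final class ModelData {

    // Recursively fills `array` in row-major order, starting at data[offset].
    // Returns the index of the next unused value.
    static int fill(Object array, String[] data, int offset) {
        if (array instanceof double[]) {
            double[] row = (double[]) array;
            for (int i = 0; i < row.length; i++) {
                row[i] = Double.parseDouble(data[offset++].trim());
            }
            return offset;
        }
        // Array.getLength throws IllegalArgumentException if `array` is not an array.
        int length = Array.getLength(array);
        for (int i = 0; i < length; i++) {
            offset = fill(Array.get(array, i), data, offset);
        }
        return offset;
    }

    // Parses a comma-separated string into the given array and returns it.
    static <T> T convert(T output, String csv) {
        String[] data = csv.split(",");
        int used = fill(output, data, 0);
        if (used != data.length) {
            throw new IllegalArgumentException(
                    "expected " + used + " values but got " + data.length);
        }
        return output;
    }

    public static void main(String[] args) {
        double[][] m = convert(new double[2][2], "0.0, 1.0, 2.0, 3.0");
        System.out.println(m[1][0]);
    }
}
```

The same call works for `new double[2][3][4]` or a flat `new double[n]`, at the cost of reflection overhead during loading, which is paid once per model.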

8bit-pixies commented 6 years ago

Thank you so much - I'm keen on seeing a more fleshed out version in the future, but at least I have an adhoc/manual way working in the interim.

nok commented 6 years ago

Okay, that's good 👍 !

In the future the transpiled estimators will be cleaner, faster and more dynamic. Today, small changes can affect over 40 different transformations and the related test cases.

nok commented 6 years ago

Hello @lichard49 @chappers, I have good news: with the very latest commit on the master branch you can transpile a RandomForestClassifier with imported data. Have a look at the prepared notebook for a demonstration which uses the export_data=True argument in the predict method.

You can use the following commands to install the latest version:

pip uninstall -y sklearn-porter
pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master

pernorc85 commented 6 years ago

I tried C with export_data=True, but it does not seem to work. Do you plan to support exported models in C in the future?

Vasilissk-prog commented 4 years ago

Hi and thank you very much for your contribution.

I am trying to export a RandomForestClassifier(n_estimators=100, max_features='sqrt', max_depth=100, n_jobs=-1, verbose=1), but I think that my laptop runs out of memory. Do you think I could try on a server with better specifications, or is the only option to reduce n_estimators and max_depth?