microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

How to create a float tensor with missing values using the Java runtime #10036

Open reneix opened 2 years ago

reneix commented 2 years ago

Is your feature request related to a problem? Please describe.

I'm using a LightGBM classifier exported to ONNX for prediction in the Java runtime. My ONNX model includes a missing-value imputer, but I have no idea how to create a tensor with missing values.

When I pass a Float[][] to createTensor it throws the exception below, whereas float[][] works. My question is: when using float[][], I can't set missing values to null or anything else.

Please help.

Exception:

    ai.onnxruntime.OrtException: Cannot create an OnnxTensor from a base type of class java.lang.Float
        at ai.onnxruntime.TensorInfo.constructFromJavaArray(TensorInfo.java:232)
        at ai.onnxruntime.OnnxTensor.createTensor(OnnxTensor.java:337)
        at ai.onnxruntime.OnnxTensor.createTensor(OnnxTensor.java:321)

Code snippet:

    for (Map.Entry<String, Float[]> kv : floatFeature.entrySet()) {
        String feaName = kv.getKey();
        Float[] feaValues = kv.getValue();
        Float[][] feaValues2d = new Float[feaValues.length][1];
        for (int i = 0; i < feaValues.length; i++) {
            // feaValues2d[i] = ArrayUtils.toPrimitive(new Float[] {feaValues[i]});
            feaValues2d[i] = new Float[] {feaValues[i]};
        }
        OnnxTensor test = OnnxTensor.createTensor(env, feaValues2d);
        newTensor.put(feaName, test);
    }


Craigacp commented 2 years ago

LightGBM says it treats NaN as a missing value by default. Assuming the ONNX converter for LightGBM does the same, then you can set the missing elements of your input to NaN in Java and it should work.
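A minimal sketch of that suggestion, assuming the features arrive as boxed Float values where null marks a missing entry. The class and method names (NanColumns, toColumn) are hypothetical helpers, not part of the ONNX Runtime API; the code only builds the primitive float[][] that createTensor accepts.

```java
// Sketch only: NanColumns and toColumn are hypothetical names, not ONNX Runtime API.
final class NanColumns {
    private NanColumns() {}

    /** Converts nullable boxed values into an [n][1] primitive column, mapping null to NaN. */
    static float[][] toColumn(Float[] values) {
        float[][] column = new float[values.length][1];
        for (int i = 0; i < values.length; i++) {
            // A primitive float cannot hold null, so NaN stands in for "missing".
            column[i][0] = (values[i] == null) ? Float.NaN : values[i];
        }
        return column;
    }
}
```

The returned array can then be passed to OnnxTensor.createTensor(env, column) in place of the Float[][] from the original snippet. Whether the model treats NaN as missing still depends on how the converter exported it.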

reneix commented 2 years ago

@Craigacp thanks, NaN works for float tensors! By the way, for String tensors, null does not work as a missing value, but "" does.
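The String case reported here could be handled the same way. This is a sketch under the assumption that the model treats the empty string as missing, which is what was observed for this particular model and may not hold for others; StringColumns and toColumn are hypothetical names.

```java
// Sketch only: StringColumns and toColumn are hypothetical names, not ONNX Runtime API.
final class StringColumns {
    private StringColumns() {}

    /** Builds an [n][1] String column, replacing null entries with the empty string. */
    static String[][] toColumn(String[] values) {
        String[][] column = new String[values.length][1];
        for (int i = 0; i < values.length; i++) {
            // Assumption: the model interprets "" as a missing value.
            column[i][0] = (values[i] == null) ? "" : values[i];
        }
        return column;
    }
}
```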

Craigacp commented 2 years ago

Yeah, I'd expect String tensor handling to be a special case as it's going to be very model dependent. In general ORT doesn't do nulls, but we can add Sparse Tensor support into the Java API which will allow you to naturally exclude some elements. I'm not sure if you can have a String sparse tensor though.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

paranjapeved15 commented 10 months ago

@Craigacp Float.NaN would work for float variables, but what if I need to pass nulls for Int and Long type variables?

Craigacp commented 10 months ago

Those aren't representable in the ONNX format, so it depends how your model was generated. It may use sparse tensors, which are supported, or it may use some sentinel value.
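If the model does use a sentinel, the Java side might look like the sketch below. The sentinel itself is an assumption: it has to be whatever value the model was trained and exported with, and nothing in this thread says what that is. SentinelColumns and toColumn are hypothetical names.

```java
// Sketch only: SentinelColumns and toColumn are hypothetical names, not ONNX Runtime API.
final class SentinelColumns {
    private SentinelColumns() {}

    /**
     * Converts nullable boxed longs into an [n][1] primitive column.
     * The sentinel must match the value the model itself treats as "missing".
     */
    static long[][] toColumn(Long[] values, long sentinel) {
        long[][] column = new long[values.length][1];
        for (int i = 0; i < values.length; i++) {
            column[i][0] = (values[i] == null) ? sentinel : values[i];
        }
        return column;
    }
}
```

For example, toColumn(new Long[] {7L, null}, -1L) yields {{7}, {-1}}; the -1 here is purely illustrative.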

paranjapeved15 commented 10 months ago

@Craigacp can you please give an example of sparse tensors in java?

Craigacp commented 10 months ago

https://github.com/microsoft/onnxruntime/blob/main/java/src/test/java/ai/onnxruntime/SparseTensorTest.java

paranjapeved15 commented 10 months ago

@Craigacp I don't understand what exactly is happening in the code above. Can you please elaborate? Also, following the ONNX documentation is pretty hard. For example, I am not sure what CSRCTensor means, and when I go to the Javadoc at https://onnxruntime.ai/docs/api/java/ai/onnxruntime/OnnxSparseTensor.CSRCTensor.html, it does not state what a CSRC tensor is 😢. I think there is much room to improve the ONNX documentation and give code examples (with appropriate comments) for such operations.

Craigacp commented 10 months ago

Firstly, I think it's unlikely that your model uses sparse tensors, as they have limited support in the ONNX standard and are mostly supported by extension operators. If you can explain what your model is and how it's exported into ONNX, we might be able to provide more useful input.

Second, that's a compressed sparse row (CSR) matrix. The definition is a little complicated: there are two sets of indices. The outer indices record where each row's values start, so the difference between consecutive entries is the number of values in that row, and the inner indices give each value's position within its row. The Java sparse tensor work isn't very well documented because it was built out for a specific use case which didn't happen, so I'm not sure how much support there is in ONNX Runtime for the rest of the parts it needs. I should probably go through and uplift the C API documentation onto the various Java tensor classes, which would at least give a bit of a hint, but I've not had time.
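To make the CSR layout concrete, here is a plain-Java sketch that derives the three arrays from a small dense matrix. It assumes the usual CSR convention (outer indices are cumulative row offsets with rows + 1 entries, inner indices are column positions) and treats zeros as the excluded elements. CsrSketch is a hypothetical class that does not call ONNX Runtime; check the OnnxSparseTensor Javadoc for how to wrap such arrays.

```java
// Sketch only: CsrSketch is a hypothetical class and does not use the ONNX Runtime API.
final class CsrSketch {
    final float[] values;       // the non-zero entries, row by row
    final long[] innerIndices;  // column position of each stored value
    final long[] outerIndices;  // offset into values where each row starts, plus a final end offset

    CsrSketch(float[][] dense) {
        int rows = dense.length;
        // First pass: count the non-zero entries so the arrays can be sized.
        int nonZero = 0;
        for (float[] row : dense) {
            for (float v : row) {
                if (v != 0.0f) {
                    nonZero++;
                }
            }
        }
        values = new float[nonZero];
        innerIndices = new long[nonZero];
        outerIndices = new long[rows + 1];
        // Second pass: record each value, its column, and where each row begins.
        int k = 0;
        for (int r = 0; r < rows; r++) {
            outerIndices[r] = k;
            for (int c = 0; c < dense[r].length; c++) {
                if (dense[r][c] != 0.0f) {
                    values[k] = dense[r][c];
                    innerIndices[k] = c;
                    k++;
                }
            }
        }
        outerIndices[rows] = k;
    }
}
```

For the matrix {{0, 1, 0}, {2, 0, 3}, {0, 0, 0}} this gives values [1, 2, 3], inner indices [1, 0, 2] and outer indices [0, 1, 3, 3]: row 0 holds one value, row 1 holds two, and row 2 holds none.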

paranjapeved15 commented 10 months ago

@Craigacp here are some details of my model:

model: XGBoost classifier
input features: combination of some floats and ints (all numeric features)
converted to onnx as: convert_sklearn(pipe, 'pipeline_xgboost', initial_inputs, target_opset=18)
registered converter and shape calculator for xgboost.

Craigacp commented 10 months ago

I think @xadupre can probably help here as it depends on how the model converter deals with XGBoost's missing values support.