oracle / tribuo

Tribuo - A Java machine learning library
https://tribuo.org
Apache License 2.0

Question about input feature mapping #347

Open paranjapeved15 opened 1 year ago

paranjapeved15 commented 1 year ago

Ask the question I am trying to run inference in Tribuo with an ONNX model trained in Python. I was providing the input to my ONNX model in the following format:

    float[][] sourceArray = new float[2][7];
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 7; j++) {
            sourceArray[i][j] = 1.0f;
        }
    }
    OnnxTensor tensor = OnnxTensor.createTensor(env, sourceArray);
    Map<String, OnnxTensor> inputs = new HashMap<>();
    inputs.put("input", tensor);

What is the best way to do the same in Tribuo? I checked this tutorial (https://tribuo.org/learn/4.3/tutorials/external-models-tribuo-v4.html#Loading-in-an-ONNX-model) but I am not sure how to structure my feature mapping.

Is your question about a specific ML algorithm or approach? No

Is your question about a specific Tribuo class? No

Craigacp commented 1 year ago

Tribuo needs to know the mapping from the feature names it produces to the feature indices that the ONNX model expects. If your model expects a feature vector (rather than an image or some other kind of structured input), then you need to pass a DenseTransformer to the ONNXExternalModel, along with the String -> Integer feature mapping. The feature names are largely under user control: they either come from the headers/field names of the csv/db/json data, or are constructed as the string representation of padded indices (e.g. 000, 001, etc.).

For your example with a feature vector of [2][7], is that a batch size of 2 and a 7-element feature vector?

paranjapeved15 commented 1 year ago

Yes @Craigacp, the 2 is a batch. So it really needs a float array with 7 elements as input.

Craigacp commented 1 year ago

Ok, so you'd want to make a Tribuo example containing those 7 features, with appropriate names. Those can be semantic ones if the features actually have names, or just 00,...,06, then supply the mapping as appropriate.
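
As a rough illustration of the name-to-index mapping described above, the following sketch builds zero-padded feature names for a 7-element vector using only the JDK. The class and method names (`FeatureMappingSketch`, `paddedMapping`) are hypothetical and are not part of Tribuo; the resulting `Map<String, Integer>` is simply the shape of mapping that the thread says should be supplied alongside the model.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; this class is not part of Tribuo.
final class FeatureMappingSketch {

    // Maps zero-padded feature names ("00", "01", ...) to the column index
    // that the ONNX model is assumed to expect for that feature.
    static Map<String, Integer> paddedMapping(int numFeatures) {
        // Pad to the width of the largest index, with a minimum width of 2,
        // so that 7 features produce the names 00..06 used in the thread.
        int width = Math.max(2, String.valueOf(numFeatures - 1).length());
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (int i = 0; i < numFeatures; i++) {
            mapping.put(String.format(Locale.ROOT, "%0" + width + "d", i), i);
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Prints {00=0, 01=1, 02=2, 03=3, 04=4, 05=5, 06=6}
        System.out.println(paddedMapping(7));
    }
}
```

If the features have real names (for example CSV column headers), the same kind of map would use those names as keys instead of padded indices.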

paranjapeved15 commented 1 year ago

Okay, but what about the key "input"? Do I not need that in Tribuo?

Craigacp commented 1 year ago

That input name is the last argument when constructing an ONNXExternalModel - https://tribuo.org/learn/4.3/javadoc/org/tribuo/interop/onnx/ONNXExternalModel.html#createOnnxModel(org.tribuo.OutputFactory,java.util.Map,java.util.Map,org.tribuo.interop.onnx.ExampleTransformer,org.tribuo.interop.onnx.OutputTransformer,ai.onnxruntime.OrtSession.SessionOptions,java.nio.file.Path,java.lang.String).

paranjapeved15 commented 1 year ago

Oh I see. And at inference time, what should the format of my input be? What type of object and what function should I use to run inference on the model?

Craigacp commented 1 year ago

The model exposes a predict function which accepts either an Example<T> or an Iterable<Example<T>>, returning a Prediction<T> (or a List<Prediction<T>> for the Iterable overload) which contains the predicted values and any confidence scores produced by the model.

paranjapeved15 commented 1 year ago

Is there any code example which shows how to build an Example object from raw values?

Craigacp commented 1 year ago

Not specifically, but you can see how the examples are built in all the DataSource implementations, e.g. this one for loading in LibSVM format data - https://github.com/oracle/tribuo/blob/main/Core/src/main/java/org/tribuo/datasource/LibSVMDataSource.java#L348.
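
For the raw `float[2][7]` batch from the original question, a minimal JDK-only sketch of the preparation step might look like the following: each row is turned into a parallel pair of feature names and `double` values, which is the form an Example can be built from. The helper class `RawRowsToFeatures` is hypothetical. The commented-out `ArrayExample` call is an assumption about how the hand-off to Tribuo could look and is not taken from the thread; the output placeholder it needs depends on whether the model is a classifier or a regressor, which the thread does not say.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration only; this class is not part of Tribuo.
final class RawRowsToFeatures {

    // Zero-padded names "00", "01", ... matching the naming suggested in the thread.
    static String[] featureNames(int numFeatures) {
        String[] names = new String[numFeatures];
        for (int i = 0; i < numFeatures; i++) {
            names[i] = String.format(Locale.ROOT, "%02d", i);
        }
        return names;
    }

    // Tribuo feature values are doubles, so each float is widened.
    static double[] toDoubles(float[] row) {
        double[] values = new double[row.length];
        for (int i = 0; i < row.length; i++) {
            values[i] = row[i];
        }
        return values;
    }

    public static void main(String[] args) {
        // Same raw input as in the question: a batch of 2 rows with 7 features each.
        float[][] sourceArray = new float[2][7];
        for (float[] row : sourceArray) {
            Arrays.fill(row, 1.0f);
        }

        String[] names = featureNames(7);
        for (float[] row : sourceArray) {
            double[] values = toDoubles(row);
            // Assumption: with Tribuo on the classpath, each row becomes one Example, e.g.
            //   Example<Label> example = new ArrayExample<>(placeholderOutput, names, values);
            // and the examples are then passed to the model's predict function.
            System.out.println(Arrays.toString(names) + " -> " + Arrays.toString(values));
        }
    }
}
```

Note that the batch dimension disappears here: each row of the array corresponds to one Example, and a list of Examples plays the role of the batch when calling predict.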