yalue / onnxruntime_go

A Go (golang) library wrapping microsoft/onnxruntime.
MIT License
168 stars 31 forks source link

How to receive outputs into types other than tensors (maps, sequences)? #57

Closed Asquator closed 1 month ago

Asquator commented 2 months ago

I'm new to ONNX, but something is unclear to me with the API. I have an ONNX that goes like:

output {
    name: "output_label"
    type {
      tensor_type {
        elem_type: 7
        shape {
          dim {
          }
        }
      }
    }
  }
  output {
    name: "output_probability"
    type {
      sequence_type {
        elem_type {
          map_type {
            key_type: 7
            value_type {
              tensor_type {
                elem_type: 1
              }
            }
          }
        }
      }
    }
  }
}

So the output type should be a sequence of maps from Int64 to Float32? How to receive this output in GO?

session, err := ort.NewAdvancedSession("onnx/models/model.onnx",
        []string{"float_input"}, []string{"output_label", "output_probability"},
        [] ?______________________________________?, nil)
    defer session.Destroy()

Thank you

yalue commented 2 months ago

Oops, I just responded to the issue in the other repo. At the moment, map and sequence OrtValues are not supported by onnxruntime_go, since they will require adding new types. I will likely make new Map and Sequence types in Go, which satisfy the ArbitraryTensor interface (even though they aren't tensors), so they can be used with the Run() function. Though there will be significant planning and implementation work to do, so I can't give any ETA.

My current thoughts for the future API:

type ArbitraryTensor interface {
    // ...
    // Returns some wrapper around the C.ONNXType enum (ONNX_TYPE_TENSOR, ONNX_TYPE_SEQUENCE, etc)
    // Maybe this belongs only in the Value interface rather than ArbitraryTensor?
    ValueType() ONNXType
}

// I'll create some interface or alias to be used by non-Tensor OrtValue types such as Maps or sequences. It may
// or may not be identical to the ArbitraryTensor interface depending on what the needs seem to be.
type Value interface {
    ArbitraryTensor
    // Maybe other stuff?
}

// Wraps ONNX sequence-type OrtValues.
type Sequence struct {
}

// Wraps the GetValue function from onnxruntime_c_api.h
func (s *Sequence) GetValue(index) (Value, error) {
    // ...
}

// Wraps the GetValueCount function from onnxruntime_c_api.h
func (s *Sequence) GetValueCount() (int, error) {
    // ...
}

func (s *Sequence) Destroy() error {
    // ...
}

// Wraps the CreateValue function from onnxruntime_c_api.h, for creating a sequence.
func CreateSequence(contents []Value) (*Sequence, error) {
    // ...
}

// Need similar functions for Maps
Asquator commented 2 months ago

Is it the best decision to call each output an ArbitraryTensor, if other types are defined by the standard, and new ones may come in the future?

Any ONNX file produced by sklearn unfortunately uses maps and sequences. If I omit the second output, the predictions work fine, I just can't get the probabilities. Anyway, so far it's the only library that at least does that, and it's very satisfying. Hopefully the types will be implemented in the future. For basic sklearn classifiers (LogisticRegression, AdaBoostClassifier, RandomForestClassifier), it's the only missing feature for them to work in entirety.

Thanks for your work and quick response

yalue commented 2 months ago

When I wrote the API I called it ArbitraryTensor because I didn't know any better and had never encountered a need for the other types in any of the applications I was using it for. In retrospect, calling every input or output a Tensor was indeed not the best decision, and I'm actually surprised that this issue is the first to request inclusion of the Map and Sequence OrtValues.

The main thing I want to avoid is deprecating the AdvancedSession or DynamicAdvancedSession types by changing the signatures of Run(). Upon further thought, however, a breaking change should be avoidable. I will likely change the inputs and outputs to be of type onnxruntime_go.Value as outlined in my earlier comment, and then define ArbitraryTensor to be an alias to the new Value interface. I think this should allow old code to continue working while making the type name more accurate.

I wish I could give a better estimate on a timeline, but I do plan to implement this myself, since it will touch several core parts of the library. Maybe I'll be done in a week, but maybe it will be closer to a month.

Asquator commented 2 months ago

Really impressive how rapidly this library evolves. Currently it's the only working solution for my problem. And I'm thinking... If official go support for ORT comes one day, what are the chances the API will be similar to this one?...

Asquator commented 2 months ago

Bumped into another related problem: The call i, o, err := ort.GetInputOutputInfo("model.onnx"), as well as other metadata calls fail completely with a model having non-tensor types. It even returns default values instead of an error. I saw the code comment regarding this one, but I'll put it here as a reminder that metadata calls should also be reimplemented.

Asquator commented 1 month ago

Is there any progress made?

yalue commented 1 month ago

Not very much yet, sorry. I wrote a small change to replace usage of ArbitraryTensor types with a more generic Value, replacing ArbitraryTensor with an alias to not break existing code. However, I'm hoping to make this change after merging PR #56 so that I can replace ArbitraryTensor usage in the code introduced there rather than trying to rebase that (much larger) PR.

After introducing ArbitraryTensor, I plan to do a series of small changes introducing each type individually, since, unfortunately, this looks like it will be more work than I initially expected. At least it should be doable in individual stages.

riccardopinosio commented 1 month ago

this probably matches well with the introduction of the Scalar type in that MR, i.e. we will have the ability to create Scalar, Tensor, Sequence etc types backed by the right ortValue

yalue commented 1 month ago

Quick update: I have written but not pushed support for Sequence. I only have some tests left to write. After that, I'll add Map support and those sklearn files should be supported. I'm not sure yet if I'll bother adding sparse tensor support now, or if I'll just wait until there seems to be some desire for it.

yalue commented 1 month ago

Sequences are added as of commit 80e480e2a820fb. Maps will be next, hopefully within this week.

yalue commented 1 month ago

Maps are added as of commit 4904ca2553aac67. There's an example of how to read Map outputs in TestSklearnNetwork, but I'm planning to add an easier-to-read example using the same network to the onnxruntime_go_examples repository, as well. This example should be done soon, hopefully tomorrow, along with some other upkeep in that repo.

yalue commented 1 month ago

Alright, there's now an example of getting these outputs in the non_tensor_outputs example in the onnxruntime_go_examples repo.