Tape machine does not reset properly for some models

martinkjlarsson commented 1 year ago

For some models, I get the correct result the first time I run the model, but subsequent runs return wrong results. Maybe some state in the tape machine is not reset?

For example, using the following Julia code, I created a simple one-layer test network and saved it as ONNX (testmodel.zip).

using Flux, ONNXNaiveNASflux, Random
Random.seed!(0)
# Dense layer with a random 2×8 weight matrix (8 inputs, 2 outputs) and the default zero bias
testmodel = Dense(rand(Float32, 2, 8))
save("testmodel.onnx", testmodel)

When I run the following Go code

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "math/rand"

    "github.com/owulveryck/onnx-go"
    "github.com/owulveryck/onnx-go/backend/x/gorgonnx"
    "gorgonia.org/tensor"
)

func randSlice(n int) []float32 {
    x := make([]float32, n)
    for i := 0; i < n; i++ {
        x[i] = rand.Float32()
    }
    return x
}

func main() {
    rand.Seed(0)

    backend := gorgonnx.NewGraph()
    model := onnx.NewModel(backend)

    b, err := ioutil.ReadFile("testmodel.onnx")
    if err != nil {
        log.Fatal(err)
    }
    input := tensor.New(tensor.WithShape(1, 8), tensor.Of(tensor.Float32), tensor.WithBacking(randSlice(8)))

    // Other models tried: mnist-12.onnx does not show the issue, resnet50-v1-12.onnx does.
    // b, err := ioutil.ReadFile("mnist-12.onnx")
    // input := tensor.New(tensor.WithShape(1, 1, 28, 28), tensor.Of(tensor.Float32))

    // b, err := ioutil.ReadFile("resnet50-v1-12.onnx")
    // input := tensor.New(tensor.WithShape(1, 3, 300, 300), tensor.Of(tensor.Float32))

    err = model.UnmarshalBinary(b)
    if err != nil {
        log.Fatal(err)
    }
    model.SetInput(0, input)
    fmt.Println(input)

    // Run the same graph repeatedly with an unchanged input; every run should print the same output.
    for i := 0; i < 5; i++ {
        err = backend.Run()
        if err != nil {
            log.Fatal(err)
        }
        output, err := model.GetOutputTensors()
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(output[0])
    }
}

I get the following printout:

R[0.94519615  0.24496509  0.65595627  0.05434384   0.3675872  0.28948045   0.1924386  0.65533215]
R[1.382385  2.133724]
R[ 2.76477  4.267448]
R[4.1471553   6.401172]
R[5.5295405   8.534896]
R[ 6.911926  10.668619]

The first result, 1.382385 2.133724, matches what I get in Julia for the same input vector, but each subsequent run returns larger values. It looks as if the output is never reset to zero and the results simply accumulate: the n-th run returns n times the first result (for example, 2.76477 = 2 × 1.382385).

This seems to happen for every model I create in Julia, and also for some others, e.g., resnet50-v1-12.onnx. Other models, e.g., mnist-12.onnx, do not seem to have this issue. I am running Go 1.19.3 and onnx-go v0.5.0.

I do not know whether this is an issue with the ONNX files, the models, or the tape machine, or whether I have simply done something wrong in the Go code. Any help is appreciated. Thanks.

vv1zard commented 1 year ago

I have the same issue. Have you solved it?

martinkjlarsson commented 11 months ago

I abandoned onnx-go and wrote my own wrapper for the ONNX Runtime C API. It works for our purposes.

Issue #202 in oramasearch/onnx-go