For op fusion (such as fusing Conv and BatchNormalization), we have implemented a "small onnxruntime" in tensor.h. It increases the workload (the more fusions we want to do, the more ops we need to implement) and has caused many problems (https://github.com/onnx/optimizer/issues/6, https://github.com/onnx/optimizer/issues/8, https://github.com/onnx/onnx/issues/2677). However, as we know, onnx itself is not designed to run onnx ops. It is unwise to spend effort maintaining an "embedded runtime" when onnxruntime already exists.
In my opinion, we should drop the "embedded runtime". Instead, the optimizer should only rewrite the graph, and then call the onnxruntime library to fold the constants. That way, we will no longer need tensor.h or any other tensor library in the optimizer.
For example, to fuse Add(Add(x, 1), 2), instead of computing the result of Add(1, 2) in onnx-optimizer itself, we can simply rewrite the graph to Add(x, Add(1, 2)) and then call onnxruntime to fold Add(1, 2) into 3.
This is also how TensorFlow's built-in graph optimizer (Grappler) handles constant folding.