sebapersson / Test_onnx


Do not use ONNX for PEtab SciML extension #2

Open sebapersson opened 1 month ago

sebapersson commented 1 month ago

I have created an example in this repository to experiment with, where I built a simple feed-forward network and exported it to ONNX. After experimenting with ONNX, I believe we should not use the ONNX standard format for the PEtab SciML extension, based on the PROS and CONS below.

PROS

CONS

Support

Julia does not have an ONNX -> Flux (the main ML library) importer. See the issues on ONNX.jl. Instead, the output is an NNlib trace, which is not easy to convert to a Flux model.

Jax does not have an importer that imports into a standard neural network structure, such as Haiku. The closest I found is this example, which creates an executable function based on the ONNX graph.

PyTorch does not have an official importer. There is a third-party importer, onnx2torch. However, using the output from onnx2torch to build a network in Julia, Jax or SBML is not straightforward.

Keras has an importer, but I was unable to make it work even on a simple feed-forward network.

Overall, there does not seem to be a straightforward tool to extract the ML model architecture from an ONNX file. Thus, if we use the ONNX format, we would likely need to write a custom (likely Python) parser that converts an ONNX graph into an intermediate architecture file. Implementations can then use this intermediate architecture file to construct the data-driven model, avoiding the need to write an ONNX importer for each tool we want to support in SciML models. Here I want to highlight that I think it is important that we can get the architecture from the ONNX file, so that we can make the most efficient use of the built-in functions in ML packages.
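
For illustration, the node-level content of an ONNX file can at least be listed with the official onnx Python package (a minimal sketch; the file name is an assumption):

import onnx

# Load the ONNX model and list the operations in its graph
model = onnx.load("model.onnx")
for node in model.graph.node:
    print(node.op_type, list(node.input), "->", list(node.output))

# Initializers hold the parameter tensors (weights and biases)
for init in model.graph.initializer:
    print(init.name, list(init.dims))

As the output is at the level of MatMul/Add primitives (compare the graph dump under Additional info), recovering layer-level architecture still requires custom logic on top of this.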

Export

Building a model in the ONNX library is not straightforward. See here. Therefore, the expected user workflow would involve building the model in an established tool like Keras or PyTorch. However, most standard machine learning packages are large dependencies that might be tricky to install, and expecting users to use such dependencies is not ideal.

Conclusion

Using ONNX would likely require coding an ONNX importer. This can be done; see, e.g., the graph below for the ONNX model of a feed-forward network. To fully leverage the functionality of machine learning packages, such as Equinox or Lux.jl, this importer would need to import an ONNX model into a specific format. This would be a lot of work. The importer could potentially leverage tools like onnx2torch to extract the architecture, but having PyTorch as a dependency is not ideal (e.g., in Julia, we try to avoid Python dependencies).

Moreover, the current standard would force users to use tools like PyTorch to build the data-driven model. As the SciML supporting packages would likely use Jax or Julia, this means users would have to switch between many different tools.

Alternative Approach

Allow the user to write the network layer architecture in a format similar to PyTorch. For example, in the network.yaml file, specify layers like:

layers:
  - Linear(in_features=2, out_features=5, bias=true)
  - tanh
  - Linear(in_features=5, out_features=5, bias=true)
  - tanh
  - Linear(in_features=5, out_features=5, bias=true)
  - tanh
  - Linear(in_features=5, out_features=2, bias=true)

This would create a feed-forward network with two hidden layers using tanh activation functions. Based on this way of writing the ML model, we can specify in the spec which functions and layers are allowed, e.g., tanh, relu, etc. This format would also support more complex architectures, such as convolutional neural networks. An additional benefit is that it would be straightforward for users to code the ML model, and easy for tools to parse it.

For the parameters table: since with this approach we code ML models layer by layer, parameters in, for example, layer 1 could be referred to in the parameters table as networkName_layer1_weight.... Specifically, for each kind of layer (e.g., Linear), we can specify how its parameters are set up in the parameters table.
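
As a rough sketch of how a tool could consume such a layer spec, assuming the YAML has already been loaded into layer names plus keyword arguments (the tuple representation and the mapping of tanh to nn.Tanh are illustrative, not part of the proposal):

import torch.nn as nn

# Hypothetical result of loading the layers entry from network.yaml
layer_spec = [
    ("Linear", {"in_features": 2, "out_features": 5, "bias": True}),
    ("Tanh", {}),
    ("Linear", {"in_features": 5, "out_features": 2, "bias": True}),
]

# Each entry maps directly onto the corresponding torch.nn class
net = nn.Sequential(*[getattr(nn, name)(**kwargs) for name, kwargs in layer_spec])

# Layer-wise parameter names that could back IDs such as networkName_layer1_weight
for name, param in net.named_parameters():
    print(name, tuple(param.shape))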

Additional info

The ONNX model for a feed-forward network has a graph that looks like the one below. It could be parsed; however, a custom parser for schemes like this requires a bit of work.

node {
  input: "star"
  input: "onnx::MatMul_24"
  output: "/input/MatMul_output_0"
  name: "/input/MatMul"
  op_type: "MatMul"
}
node {
  input: "input.bias"
  input: "/input/MatMul_output_0"
  output: "/input/Add_output_0"
  name: "/input/Add"
  op_type: "Add"
}
node {
  input: "/input/Add_output_0"
  output: "/Tanh_output_0"
  name: "/Tanh"
  op_type: "Tanh"
}
node {
  input: "/Tanh_output_0"
  input: "onnx::MatMul_25"
  output: "/hidden_1/MatMul_output_0"
  name: "/hidden_1/MatMul"
  op_type: "MatMul"
}
node {
  input: "hidden_1.bias"
  input: "/hidden_1/MatMul_output_0"
  output: "/hidden_1/Add_output_0"
  name: "/hidden_1/Add"
  op_type: "Add"
}
node {
  input: "/hidden_1/Add_output_0"
  output: "/Tanh_1_output_0"
  name: "/Tanh_1"
  op_type: "Tanh"
}
node {
  input: "/Tanh_1_output_0"
  input: "onnx::MatMul_26"
  output: "/hidden_2/MatMul_output_0"
  name: "/hidden_2/MatMul"
  op_type: "MatMul"
}
node {
  input: "hidden_2.bias"
  input: "/hidden_2/MatMul_output_0"
  output: "/hidden_2/Add_output_0"
  name: "/hidden_2/Add"
  op_type: "Add"
}
node {
  input: "/hidden_2/Add_output_0"
  output: "/Tanh_2_output_0"
  name: "/Tanh_2"
  op_type: "Tanh"
}
node {
  input: "/Tanh_2_output_0"
  input: "onnx::MatMul_27"
  output: "/output/MatMul_output_0"
  name: "/output/MatMul"
  op_type: "MatMul"
}
node {
  input: "output.bias"
  input: "/output/MatMul_output_0"
  output: "end"
  name: "/output/Add"
  op_type: "Add"
}
name: "main_graph"
initializer {
  dims: 5
  data_type: 1
  name: "input.bias"
  raw_data: "\341V\\\275\355)\315>K\277\234>#\266\204=\244\371a\275"
}
initializer {
  dims: 5
  data_type: 1
  name: "hidden_1.bias"
  raw_data: "\257z\203>[\207\324\276\341\351\254\276\024\'\262>\233d\372="
}
initializer {
  dims: 5
  data_type: 1
  name: "hidden_2.bias"
  raw_data: "Bn3>\277\356\321\276 \036\016\276\037\230\316=E\305\211>"
}
initializer {
  dims: 2
  data_type: 1
  name: "output.bias"
  raw_data: "XSR>\316@\261>"
}
initializer {
  dims: 2
  dims: 5
  data_type: 1
  name: "onnx::MatMul_24"
  raw_data: "\250\241\262>kd)?\371\357\026\277\372v\037?\256T\202\276\365\030\031\277j\006\247>\325\277\270>@\245\306\276\372L\031\276"
}
initializer {
  dims: 5
  dims: 5
  data_type: 1
  name: "onnx::MatMul_25"
  raw_data: "\023\314\370\275\345\2762=?\213L\276\232v\243>+\\7\276\351\352\337\276\027=g>\266\375\265>\243q\"\276\211~\336\276\006\275,>\222\021\004<\245\256\025>\330\255f\276o,\321>Z/v\274\021\033\270>\242\363\205>\264\226#\275\232\223-\276\214\301\013\276\217\302;\276pa\t\276#\342\231>\225\361\314>"
}
initializer {
  dims: 5
  dims: 5
  data_type: 1
  name: "onnx::MatMul_26"
  raw_data: "\230M\320>\244?{\276\234\364\335\275\254\200\331>,\240t\276\343Z\223=f\255\262=\253:\340>\337\330W=M^z>\216T\217>6f\017\276\215\252y\276Lu!>\240\350~\276\035\245\005\276<a\230\276\352\305\006=e)\273\276\314\210\326>zI\361\274_Fo>b\351\250\276#F\265>\036V\254>"
}
initializer {
  dims: 5
  dims: 2
  data_type: 1
  name: "onnx::MatMul_27"
  raw_data: "\306^B:\016dv\276m\004\026\276z\364\312\2761C\311\276]\321U>i\314A>u\035\242>\000z->\035\251.>"
}
input {
  name: "star"
  type {
    tensor_type {
      elem_type: 1
      shape {
        dim {
          dim_value: 2
        }
      }
    }
  }
}
output {
  name: "end"
  type {
    tensor_type {
      elem_type: 1
      shape {
        dim {
          dim_value: 2
        }
      }
    }
  }
}
sebapersson commented 1 month ago

@dilpath and @m-philipps

dilpath commented 1 month ago

Thanks a lot for the detailed write-up and the comprehensive assessment of the different frameworks. Since PEtab ideally builds upon modelling formats that are well-established with a nice tool ecosystem across frameworks and languages, I agree about not investing too much time into supporting ONNX now.

Regarding the format then, I would opt for whatever is easiest for tool developers now. YAML makes sense, but there are also some tabular formats that could make sense.

For example, for some CNN [1], your YAML makes sense, but maybe we should try to capture all the information present in a pytorch __repr__; e.g., the explicit layer IDs will make it easier in the parameter table too:

>>> print(net.__repr__())
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

We could also use tables that match the tabular summaries provided by keras [2] or a pytorch extension [3]. Just a suggestion -- my current opinion is that your YAML suggestion is best for now.

[1] https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#define-a-convolutional-neural-network [2] https://stackoverflow.com/questions/36946671/keras-model-summary-result-understanding-the-of-parameters [3] https://github.com/TylerYep/torchinfo

sebapersson commented 1 month ago

Thanks for the input!

I also think YAML works best, and it is also consistent with the format we generally use in PEtab. I think it should also be among the easier formats for a user to enter (the summary tables can be a bit complex).

I think an ID makes sense and helps with mapping to the parameters table; we can likely take it as the first argument. And yes, we should probably allow most arguments that are available in PyTorch (e.g., for Conv2d the allowed inputs are likely consistent between tools); for each layer in the spec, we can specify the possible arguments. As PyTorch has quite good naming, we can likely use their naming scheme. So the network you provided above could be written something like:

layers:
  - Conv2d(conv1, 3, 6, kernel_size=(5, 5), stride=(1, 1))
  - MaxPool2d(pool, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  - Conv2d(conv2, 6, 16, kernel_size=(5, 5), stride=(1, 1))
  - Linear(fc1, in_features=400, out_features=120, bias=True)
  - Linear(fc2, in_features=120, out_features=84, bias=True)
  - Linear(fc3, in_features=84, out_features=10, bias=True)

What do you think?

dilpath commented 1 month ago

NNEF [1] is an alternative to ONNX, and it looks a lot more human- and machine-readable/-writeable. For example, see the attached file [2], taken from the examples here [3]. The sequence of assignments makes a lot of sense to me -- i.e., the first part of the file defines the matrices (variables) used in the layers, and the second part defines the sequence of manipulations from the input to the output, according to the matrices and some functions like add (could be tanh). The input external1 and output reshape1 layers are clearly defined, as well as the model ID:

graph G(external1) -> (reshape1)

Since custom importers will be necessary to some degree, this file format might be easiest, since then a tool could use their language's symbolic processing package (e.g. SymPy) to parse each clearly specified assignment in the NNEF file, from reshape1 back to external1, to create the model.

There appear to be some bidirectional converters maintained [4], and a paper that describes NNEF -> PyTorch model conversion (but I don't think they shared the actual converter code...) [5].

[1] https://www.khronos.org/nnef [2] mobilenet_v2_1.0_quant.tflite.nnef [3] https://github.com/KhronosGroup/NNEF-Tools/tree/main/models [4] https://github.com/KhronosGroup/NNEF-Tools/tree/main/nnef_tools [5] https://ieeexplore.ieee.org/document/9621003

dilpath commented 1 month ago

What do you think?

Looks good enough for me for now -- I will have a stronger opinion when I start using it.

One problem might be storage of the neural network parameters after optimization, e.g. for predictions; otherwise the PEtab parameters table gets very large... if we don't want to think about that too much, then we will need to rely on NNEF or ONNX.

Currently NNEF saves each variable matrix of values to disk as separate files, independent of the model spec in graph.nnef https://registry.khronos.org/NNEF/specs/1.0/nnef-1.0.5.html#container-structure

dilpath commented 1 month ago

Still can't find the code for that IEEE paper, but it looks like there is an undocumented NNEF -> pytorch model importer in the NNEF tools package [1].

pytorch -> NNEF is also possible, via pytorch -> ONNX -> NNEF [2].

[1] https://github.com/KhronosGroup/NNEF-Tools/blob/main/nnef_tools/interpreter/pytorch/nnef_module.py [2] https://github.com/KhronosGroup/NNEF-Tools/issues/105

sebapersson commented 1 month ago

NNEF also looks viable and can allow for more complex models (though ML models in UDEs are often quite simple). But I do not think the parsing is inherently easier than with the yaml-syntax above (as the dimensions etc. can quite rapidly be read from the layer specification). Can you try to convert the onnx-example in the repository to NNEF?

Overall, I favor the yaml-format as I think it is easier for the user, and I do not see why it would be problematic to parse. As for parameters, maybe a parameters file per network would then be appropriate to avoid having the parameters table explode (and it makes it easier to employ a tuned ML module elsewhere).

sebapersson commented 1 month ago

Here is the output if I convert the onnx-file in the repo to nnef:

graph main_graph(external1) -> (add4)
{
    external1 = external<scalar>(shape = [2]);
    variable1 = variable<scalar>(shape = [5], label = 'variable1');
    variable2 = variable<scalar>(shape = [5], label = 'variable2');
    variable3 = variable<scalar>(shape = [5], label = 'variable3');
    variable4 = variable<scalar>(shape = [2], label = 'variable4');
    variable5 = variable<scalar>(shape = [2, 5], label = 'variable5');
    variable6 = variable<scalar>(shape = [5, 5], label = 'variable6');
    variable7 = variable<scalar>(shape = [5, 5], label = 'variable7');
    variable8 = variable<scalar>(shape = [5, 2], label = 'variable8');
    matmul1 = matmul(external1, variable5);
    add1 = add(variable1, matmul1);
    tanh1 = tanh(add1);
    matmul2 = matmul(tanh1, variable6);
    add2 = add(variable2, matmul2);
    tanh2 = tanh(add2);
    matmul3 = matmul(tanh2, variable7);
    add3 = add(variable3, matmul3);
    tanh3 = tanh(add3);
    matmul4 = matmul(tanh3, variable8);
    add4 = add(variable4, matmul4);
}

Quite straightforward to parse, but I do not know if it is easier to parse than the yaml-format above.
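
For illustration, a minimal sketch of what parsing these assignments could look like with plain string processing (no NNEF library assumed; the regex and names are illustrative):

import re

# Matches assignments such as "add1 = add(variable1, matmul1);",
# including parameterized ops like "external<scalar>(shape = [2])"
ASSIGN_RE = re.compile(r"(\w+)\s*=\s*(\w+)(?:<[^>]*>)?\((.*)\);")

def parse_nnef_body(text):
    ops = []
    for match in ASSIGN_RE.finditer(text):
        result, op, args = match.groups()
        ops.append((result, op, [arg.strip() for arg in args.split(",")]))
    return ops

A tool would still have to recognize that a matmul/add/tanh triple corresponds to one feed-forward layer, which is the extra step compared to the YAML layer spec.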

dilpath commented 1 month ago

Both work for me! No preference in terms of usability -- I think that will come from experience.

I agree. For the tool developer experience: YAML is probably easier to parse than NNEF. However, I see the gap between YAML and NNEF as much smaller than to ONNX -- I think both YAML and NNEF are fine choices and nothing a tool developer will grimace at. Tool developers might also benefit by writing/maintaining NNEF libraries that become popular in other fields too.

The user experience is mixed for both. Hand-writing NNEF is a little worse than YAML, but NNEF already has converters for pytorch and tensorflow, which is probably how Python users will want to define their model anyway. For Julia users (and maybe Jax), they would probably prefer YAML, since they might need to hand-write their model.

For the PEtab editor experience... the final thing in the PEtab extension needs to be a well-documented spec (with some tool support). Here, NNEF would be strongly preferred, since then we don't need to create a new spec format for neural networks in addition to the hybrid spec format.

[1] https://github.com/google-deepmind/tf2jax

sebapersson commented 1 month ago

Good points. As for user experience, I agree it is hard to say (it might also be that users want the network spec to be as similar as possible to standard frameworks like PyTorch). Before committing anything to the spec I will ask for some additional input.

m-philipps commented 1 month ago

Thanks for setting up the example and summarising the concerns here, Sebastian. I think you two have already made the most important points and I generally agree. Of the three options, ONNX, NNEF and a custom yaml format, I would suggest going with NNEF to avoid introducing a new standard.

One thing that might become relevant down the line is to use pre-trained NNs, which would be facilitated by NNEF, at least for pytorch/tensorflow.

The modularity in NNEF would probably make it easier to extend to non-feed-forward architectures, and the external1 looks like a great anchor for the mapping table and additional data inputs through the condition table.

For a good user experience we could supply some basic julia/jax helper functions for constructing NNEFs, which could be as simple as the PEtab SciML examples that we provide in the end. If this makes it easy to build feed-forward ANNs of arbitrary size, we would probably already cover a lot of the use cases in the beginning. Users can build on or customise these helper functions for more extravagant architectures at any time. I see some analogy to users automatically constructing their parameter and condition tables for ANNs of increasing size.

sebapersson commented 1 month ago

Thanks for the input!

Playing around a bit more, I am more hesitant towards NNEF. I built two new ML models in the repo in PyTorch, one with a convolutional layer and one with dropout, and then exported them to ONNX (see code below).

I could not export the convolutional model from ONNX -> NNEF due to torch.flatten. The NNEF file for the dropout case does not really contain any dropout layer.

Now, there might be ways to express both things in NNEF, but since at least the torch export does not support them, I find that even though NNEF is a nice format, it might not be sufficiently flexible.

Code

Convolutional model

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.input = nn.Conv1d(4, 4, 3, stride=1)
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.linear1 = nn.Linear(16, 2)

    def forward(self, x):
        x = F.relu(self.input(x))
        x = self.pool(x)
        x = torch.flatten(x)
        x = self.linear1(x)
        return x
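
For reference, a minimal sketch of how such a model can be exported to ONNX (the dummy-input shape and file name are assumptions; the same pattern applies to the dropout model below):

import torch

net = Net()
net.eval()
# Batch of one input with 4 channels and length 10, so that after Conv1d(4, 4, 3)
# and MaxPool1d(2) the flattened size matches Linear(16, 2)
dummy_input = torch.randn(1, 4, 10)
torch.onnx.export(net, dummy_input, "conv_model.onnx")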

Model with dropout

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.input = nn.Linear(in_features=2, out_features=5)
        self.hidden_1 = nn.Linear(in_features=5, out_features=5)
        self.dropout = nn.Dropout(0.2)
        self.hidden_2 = nn.Linear(in_features=5, out_features=5)
        self.output = nn.Linear(in_features=5, out_features=2)

    def forward(self, x):
        x = F.tanh(self.input(x))
        x = F.tanh(self.hidden_1(x))
        x = self.dropout(x)
        x = F.tanh(self.hidden_2(x))
        return self.output(x)

and its corresponding graph:

graph main_graph(external1) -> (add4)
{
    external1 = external<scalar>(shape = [2]);
    variable1 = variable<scalar>(shape = [5], label = 'variable1');
    variable2 = variable<scalar>(shape = [5], label = 'variable2');
    variable3 = variable<scalar>(shape = [5], label = 'variable3');
    variable4 = variable<scalar>(shape = [2], label = 'variable4');
    variable5 = variable<scalar>(shape = [2, 5], label = 'variable5');
    variable6 = variable<scalar>(shape = [5, 5], label = 'variable6');
    variable7 = variable<scalar>(shape = [5, 5], label = 'variable7');
    variable8 = variable<scalar>(shape = [5, 2], label = 'variable8');
    matmul1 = matmul(external1, variable5);
    add1 = add(variable1, matmul1);
    tanh1 = tanh(add1);
    matmul2 = matmul(tanh1, variable6);
    add2 = add(variable2, matmul2);
    tanh2 = tanh(add2);
    matmul3 = matmul(tanh2, variable7);
    add3 = add(variable3, matmul3);
    tanh3 = tanh(add3);
    matmul4 = matmul(tanh3, variable8);
    add4 = add(variable4, matmul4);
}
dilpath commented 1 month ago

Right... not great that dropout isn't supported. It looks like NNEF is mainly intended for taking a model that was defined and trained in one framework into another framework, e.g. for inference.

Since dropout is a training stage feature, I think it can also make sense to define dropout in PEtab, rather than the model directly, since dropout is skipped during the inference stage. Ideally we would be able to define the full model in NNEF, then ask it to produce the model for the training stage or the inference stage -- but I guess this is not really supported...

Then it would be easy for PEtab to provide the "PEtab problem" for the training or inference stage. I'm not sure where this would best be specified, but e.g. via a dropout column in the parameter table.

Dropout sets layer inputs to 0 randomly, right? Then we could alternatively randomly set layer weight columns to zero to mimic the same thing. E.g., your example could be represented by

parameterId   parameterScale   estimate   dropout
variable7     lin              1          0.2

where 0.2 dropout indicates "set 20% of columns in variable7 to 0" and possibly "rescale all other columns by 1/0.8".
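
A rough numpy sketch of that column-dropout idea (function and variable names are made up; the rescaling mirrors inverted dropout):

import numpy as np

def apply_column_dropout(weights, rate, rng):
    # Zero a fraction `rate` of the columns and rescale the remaining
    # columns by 1/(1 - rate), mimicking dropout on the layer inputs
    keep = rng.random(weights.shape[1]) >= rate
    return weights * keep / (1.0 - rate)

rng = np.random.default_rng(0)
variable7 = rng.standard_normal((5, 5))
dropped = apply_column_dropout(variable7, 0.2, rng)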

There is a compatibility table between officially-supported frameworks and NNEF [1]. For example, since NNEF appears to be more for inference, ONNX's Dropout gets mapped "correctly" to copy. Could you check there whether everything you would want to do is supported by NNEF? With the obvious exception of dropout above -- let me know if there are other training-stage aspects we should consider.

re: flatten, this seems to be a bug in the pytorch exporter, since NNEF supports a reshape function that can do the same thing -- it's already supported for other converters [1]. I guess some bugs like this are to be expected. Regardless of the format we choose, we will need to provide test cases that evaluate to the same result in all frameworks. I guess NNEF will get us there faster than a custom format, since we only need to write framework-specific translators for the special cases like dropout, rather than framework-specific translators for every aspect of the full model. This could be as simple as telling the user "after converting your model from NNEF to pytorch with converter X, please add a nn.Dropout(0.2) layer and apply it to tanh2 in the forward pass". Or we could use the converter to get the __init__ and forward code without dropout, manually insert the dropout code lines, then write to disk.

[1] https://github.com/KhronosGroup/NNEF-Tools/blob/main/nnef_tools/operation_mapping.md#onnx

sebapersson commented 1 month ago

Dropout via the column in the parameter table could work, but the syntax would be easier in the yaml approach. Moreover, if we take the yaml-syntax to be similar to PyTorch, conversion to Jax (https://docs.kidger.site/equinox/api/nn/linear/) and Lux.jl (https://lux.csail.mit.edu/stable/api/Lux/layers) would still be straightforward (as such layers are the typical building blocks).

As for support, from the table I think the features I would need are supported. But that something like flatten has problems is not extremely encouraging (though yes, we will need to add tests later).

@FFroehlich what is your input on this, and what functionality would be nice to have?

FFroehlich commented 1 month ago

Not a big fan of inventing a new standard, see https://xkcd.com/927/. Overall, to me it still seems like ONNX has more community buy-in; for example, huggingface has exporters https://huggingface.co/docs/transformers/en/serialization (might be worthwhile checking out the other formats they support) and MS, for example, provides LLama 2 as ONNX: https://github.com/microsoft/Llama-2-Onnx. However, at the end of the day the spec should ideally be format agnostic and work with both standards, and it's then up to tool developers to implement what is practical.

I think it's sensible to only specify elements of the network that are necessary for inference rather than training; that also appears to be what ONNX/NNEF are designed for. Adding dropout to the parameter table seems consistent, but I don't think we can/want to provide support for the whole kitchen sink of approaches that are out there (we also don't do this for training).

FFroehlich commented 1 month ago

There is a jax runtime for onnx, https://github.com/google/jaxonnxruntime (tested on GPT-2, BERT, LLAMA), and multiple julia runtimes: https://github.com/jw3126/ONNXRunTime.jl, https://github.com/FluxML/ONNX.jl, https://github.com/DrChainsaw/ONNXNaiveNASflux.jl. To me, the absence of an onnx lux importer speaks to the fragmentation of the julia ecosystem rather than an inherent shortcoming of ONNX itself.

sebapersson commented 1 month ago

Thanks, Fabian, for the input!

It is a fair point to omit training details.

Given the input above, I could accept that we should choose between ONNX and NNEF. The catch, though, is that NNEF and ONNX seem to be geared more towards transferring trained models to deployment (e.g. likely why PyTorch has an official exporter but not an importer), see here and here, while we are interested in setting up problems for training, but also in storing the results.

If we would like to support one of these formats, ONNX likely has broader support, with many tools exporting to it. Meanwhile, NNEF may be easier to write a parser for if needed (e.g., if integrating onnxruntime with either Julia or Diffrax proves challenging, which I think is likely, as onnx runtime seems to be geared more towards deployment than towards setting up problems for training). However, since both NNEF and ONNX provide an output graph, writing a parser for either format is possible if necessary.

With implementation ease in mind, a good next step would be to check if onnxruntime can be used with Diffrax and Julia. Otherwise, for ease of parsing, NNEF might be the better option. Alternatively, we could use a separate spec for setting up problems for training (as onnx is more for deployment) and use onnx to export trained models. What does everyone think?

FFroehlich commented 1 month ago

Yes, fair point that the typical use case for both formats is deployment, which doesn't fully match our use-case.

It sounds like https://github.com/KhronosGroup/NNEF-Tools provide bidirectional conversion between the formats, so ease of parsing shouldn't be a major factor.

dilpath commented 1 month ago

For anyone reading and wanting to catch up, some of the sections here, starting at the section "Training and inference", describe some of the same issues: https://tech-blog.sonos.com/posts/optimising-a-neural-network-for-inference/

NNEF [2] and ONNX [3] both seem to have some libraries available to manipulate/parse them.

Both support named model structures like layers it seems, so probably no problem for PEtab to reference "all weights in layer myFirstLayer" in the parameters table.

ONNX also has a more readable "textual representation" [4], which looks more like NNEF, and which can be converted into standard ONNX [5].

BioModels even has ONNX models! [6]

What does everyone think?

I agree, it makes sense to test out the ecosystem now and see what works (runtimes, libraries/APIs). As for training vs. inference, so far I haven't seen a big issue there -- if we only support "inference" in the model format, we can extend PEtab with a few important things for training, like dropout. No strong opinion there.

[1] https://github.com/PEtab-dev/libpetab-python/blob/main/petab/models/model.py [2] https://github.com/KhronosGroup/NNEF-Tools/tree/main/nnef-pyproject#nnef-parser---repository [3] https://onnx.ai/onnx/repo-docs/PythonAPIOverview.html#creating-an-onnx-model-using-helper-functions [4] https://onnx.ai/onnx/repo-docs/Syntax.html [5] https://onnx.ai/onnx/repo-docs/PythonAPIOverview.html#onnx-parser [6] https://www.ebi.ac.uk/biomodels/search?query=onnx&domain=biomodels

sebapersson commented 1 month ago

Thanks Dilan for the input.

I might have missed that NNEF has bi-directional tools (I only saw trained model -> NNEF), and as noted above, even torch.flatten caused problems.

To move forward and complete this discussion,

I propose using the YAML syntax. While https://xkcd.com/927/ is a classic, defining layers in PyTorch syntax (e.g., Linear, Conv1d) means we are not inventing a new standard. Instead, we conform to a syntax for setting up trainable networks used by the most popular ML package. This approach also simplifies specifying supported layers and importing the model into Jax or Julia, and possibly writing a parser for SBML. Specifically, it will be straightforward to import the model in a trainable format (e.g., none of the ONNX-runtime packages in Julia currently allows this, strongly indicating that parsing ONNX is non-trivial). Regarding parameters, we can likely have one table (or similar) for each layer or network. Overall, as PEtab is for setting up parameter estimation problems that we can then train, I think we should support a format that allows easy training (e.g., bookkeeping of layers, parameters, etc.).

NNEF can work similarly to ONNX. However, it is primarily a format for deployment. Exporting to it from other formats (e.g., PyTorch) can be tricky with functions like torch.flatten. Additionally, tool-specific PEtab parsing will not be trivial. For example, a parser must detect that:

matmul1 = matmul(external1, variable5);
add1 = add(variable1, matmul1);
tanh1 = tanh(add1);

corresponds to a feed-forward layer with tanh activation. This will likely become even more complex with more sophisticated architectures. Optionally, we might support onnx for model shipping/deployment (I think the reason BioModels has onnx support is that it is very easy to run an onnx model (you essentially just need onnx-runtime), but training them, which we are interested in, is a different issue).

We could potentially also set up a toy example with both approaches to see what is most convenient.

@m-philipps @dilpath

dilpath commented 1 month ago

So it would be YAML, plus e.g. HDF5 to store the trained matrices, I guess? Would be fine for me, as long as we ensure that the format can be converted to NNEF/ONNX, so that we can simply switch to NNEF/ONNX in the future if we change our minds. I.e., if we explicitly define all layers/operations in terms of ONNX/NNEF (as much as possible), e.g. linear(input, weight=matrix1, bias=vector1, activation=tanh, id=linear1) is defined explicitly in terms of NNEF

matmul1 = matmul(input, matrix1)
add1 = add(matmul1, vector1)
tanh1 = tanh(add1)
linear1 = tanh1

and we can convert from our custom HDF5 to ONNX/NNEF, then I see no issue continuing like this for now. Then the YAML/HDF5 format would just be a user-friendly layer on top of ONNX/NNEF for Python users, but the actual format for Julia users.
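
As a rough illustration of the HDF5 side (using h5py; the file name and dataset layout are hypothetical):

import h5py
import numpy as np

# One dataset per layer parameter, keyed by the layer ID used in the YAML spec
with h5py.File("network_parameters.h5", "w") as f:
    f.create_dataset("linear1/weight", data=np.random.randn(5, 2))
    f.create_dataset("linear1/bias", data=np.random.randn(5))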

dilpath commented 1 month ago

I'm happy to try it out with the small library that was developed for the non-negative UDE work we did in AMICI/pyPESTO. I could set up a private repo in the PEtab-dev GitHub org, for us to hack on it together if you like, which could end up becoming public. Or do you want to do this independently of PEtab-dev?

Just FYI, I probably can't work on it too much until later in June.

FFroehlich commented 1 month ago

If we go down that route, we should consider how much easier it is to set up a new format plus write a parser than improving the parsers that we already have for ONNX/NNEF? Or are there really things that we need that cannot be expressed in NNEF/ONNX?

During Harmony, we discussed not making the spec too reliant on the underlying format for the neural network, which I still think is a good way of moving forward with this.

dilpath commented 1 month ago

how much easier it is to set up a new format plus write a parser than improving the parsers that we already have for ONNX/NNEF? Or are there really things that we need that cannot be expressed in NNEF/ONNX?

Agreed, if it's easy to infer layers from the sequences of operations or computational graphs defined in ONNX/NNEF, then fine to extend pre-existing parsers.

Or maybe I misunderstood, and you were rather suggesting to work on the level of matmul/add/etc. in Julia, rather than working on the level of linear? Either works I guess

During Harmony, we discussed not making the spec too reliant on the underlying format for the neural network, which I still think is a good way of moving forward with this.

Also agreed, if we assume that ANN formats support IDs for matrices/vectors, then the PEtab parameter table can simply refer to these IDs or an ID for the whole ANN. Since this is supported by both ONNX and NNEF, we could proceed in the proposal with this assumption.

sebapersson commented 1 month ago

how much easier it is to set up a new format plus write a parser than improving the parsers that we already have for ONNX/NNEF? Or are there really things that we need that cannot be expressed in NNEF/ONNX?

It seems much easier to work with YAML syntax in Julia. From one of the authors of one of the ONNX Julia importers, importing to Flux/Lux.Chain is quite tricky, as discussed here. This import would be necessary for a trainable format. Additionally, when I tried importing an ONNX model to PyTorch, the model was not in a trainable format but rather in inference mode.

Working at the linear layer level would be preferred, as it makes it easier to manage parameters and allow for features like adding regularization. Defining layers in PyTorch syntax within YAML would streamline the process and ensure compatibility across various tools and frameworks.

and we can convert from our custom HDF5 to ONNX/NNEF, then I see no issue continuing like this for now.

As long as the format we develop is easy to integrate with ML packages (Equinox, Lux.jl), parsing the network as a PyTorch network should be straightforward, especially since Equinox borrows much of its syntax from PyTorch. Once we can parse a model into PyTorch, we can export it to ONNX. Moreover, if we have code for inserting trainable networks into models, inserting executable functions (like an ONNX-runtime model) should not pose any problems for model deployment or for uploading the model to repositories like BioModels, or if we want to insert a trained ML model into an existing model.

I'm happy to try it out with the small library that was developed for the non-negative UDE work we did in AMICI/pyPESTO. I could set up a private repo in the PEtab-dev GitHub org, for us to hack on it together if you like, which could end up becoming public. Or do you want to do this independently of PEtab-dev?

We can set it up in PEtab-dev. And no worries, I need to update PEtab.jl before it can support UDEs :) And referring to ids for matrices and vectors for parameters sounds good.

Moving forward, I suggest we try using the YAML syntax. Additionally, we can experiment with ONNX (my feeling is ONNX has larger support than NNEF) for at least a feed-forward network to keep our options open.

dilpath commented 1 month ago

Makes sense to me!

we can export it to ONNX

Just to clarify, personally I don't really care if I ever touch an ONNX/NNEF model. My point was rather that, if we define our YAML things in terms of pre-existing ONNX/NNEF operations, then we can put less effort into the spec, and naturally guarantee compatibility/convertibility (at least as much as possible) for a possible future switch.

I've set up a private GitHub repo and invited everyone in this thread. Anyone reading and wanting to join, feel free to reply here. I think we can make it public any time, if everyone agrees.

sebapersson commented 1 month ago

Just to clarify, personally I don't really care if I ever touch an ONNX/NNEF model. My point was rather that, if we define our YAML things in terms of pre-existing ONNX/NNEF operations, then we can put less effort into the spec, and naturally guarantee compatibility/convertibility (at least as much as possible) for a possible future switch.

Good point. I can take this into consideration in the spec (I will rework the spec a bit next week).

FFroehlich commented 1 month ago

how much easier it is to set up a new format plus write a parser than improving the parsers that we already have for ONNX/NNEF? Or are there really things that we need that cannot be expressed in NNEF/ONNX?

It seems much easier to work with YAML syntax in Julia. From one of the authors of one of the ONNX Julia importers, importing to Flux/Lux.Chain is quite tricky, as discussed here. This import would be necessary for a trainable format. Additionally, when I tried importing an ONNX model to PyTorch, the model was not in a trainable format but rather in inference mode.

Working at the linear layer level would be preferred, as it makes it easier to manage parameters and allow for features like adding regularization. Defining layers in PyTorch syntax within YAML would streamline the process and ensure compatibility across various tools and frameworks.

I don't get it. Looking at https://github.com/FluxML/ONNX.jl/blob/9cd42b9d0bb6b311978368aeea6b64f72c908d49/src/load.jl#L274, it looks like it already supports deserialisation of ONNX to a graph, which to me suggests that the parsing itself is not the problem. If there are graph structures that can't be mapped to Lux.Chain or certain primitives that are not supported in Lux, I don't think that's a dealbreaker. The majority of simulators don't support the full SBML spec either, after all.

sebapersson commented 1 month ago

I would say the problem is that ONNX.jl (like onnx-runtime) parses the network into an executable rather than a model structure. For a feed-forward network, the imported network struct in Julia is:

Tape{ONNX.ONNXCtx}
  inp %1::Matrix{Float64}
  const %2 = Float32[-0.05379379, 0.4007105, 0.30614695, 0.06480052, -0.055169716]::Vector{Float32}
  const %3 = Float32[0.25679538, -0.41509518, -0.33772185, 0.3479544, 0.1222622]::Vector{Float32}
  const %4 = Float32[0.17522529, -0.4100246, -0.1387868, 0.100876085, 0.26908317]::Vector{Float32}
  const %5 = Float32[0.20539606, 0.34619755]::Vector{Float32}
  const %6 = Float32[0.3488896 -0.59803706; 0.6616885 0.3262208; -0.5895992 0.36083856; 0.6229092 -0.3879795; -0.2545523 -0.1497077]::Matrix{Float32}
  const %7 = Float32[-0.12148299 -0.4373391 0.16868982 -0.015025938 -0.13648051; 0.043639082 0.22581898 0.00806083 0.3595815 -0.18335937; -0.19974993 0.3554513 0.14617403 0.2616244 -0.13416076; 0.31926423 -0.15863661 -0.22527254 -0.039938644 0.3005534; -0.17906253 -0.43455914 0.40854213 -0.16950837 0.40028062]::Matrix{Float32}
  const %8 = Float32[0.406842 0.0719507 0.27994198 -0.13051267 -0.029453982; -0.24535996 0.087244794 -0.14003834 -0.29761684 0.23366688; -0.10837671 0.4379476 -0.24381466 0.03290359 -0.32990557; 0.4248098 0.05269706 0.15767401 -0.3655502 0.35405073; -0.23889226 0.24450035 -0.24893427 0.41901243 0.33659452]::Matrix{Float32}
  const %9 = Float32[0.0007414635 -0.14650126 -0.39309075 0.18925633 0.1694107; -0.24061605 -0.39639646 0.20880647 0.316631 0.17056699]::Matrix{Float32}
  %10 = *(%6, %1)::Matrix{Float64} 
  %11 = add(%2, %10)::Matrix{Float64} 
  %12 = tanh(%11)::Matrix{Float64} 
  %13 = *(%7, %12)::Matrix{Float64} 
  %14 = add(%3, %13)::Matrix{Float64} 
  %15 = tanh(%14)::Matrix{Float64} 
  %16 = *(%8, %15)::Matrix{Float64} 
  %17 = add(%4, %16)::Matrix{Float64} 
  %18 = tanh(%17)::Matrix{Float64} 
  %19 = *(%9, %18)::Matrix{Float64} 
  %20 = add(%5, %19)::Matrix{Float64} 

Yes, for a feed-forward network, a structure like this can be parsed into a proper Flux/Lux network. However, for more complex architectures (e.g., convolutional), ONNX.jl already fails. As seen here, writing a general Flux/Lux importer is expected to be tricky. We could try to improve the existing importers, but I feel that trying to fix an importer for a flexible format aimed primarily at deployment (not training) would be like jumping into a black hole.

Meanwhile, the YAML syntax, since it already uses the building blocks found in most ML packages, makes it easier to support different types of layers beyond feed-forward networks without extensive parsing work. And it is fair that most SBML importers do not support many features, but I think having a spec that makes it easy to support the most relevant features for setting up trainable models would be beneficial.

FFroehlich commented 1 month ago

I would say the problem is that ONNX.jl (like onnx-runtime) parses the network into an executable rather than a model structure. For a feed-forward network, the imported network struct in Julia is:

Tape{ONNX.ONNXCtx}
  inp %1::Matrix{Float64}
  const %2 = Float32[-0.05379379, 0.4007105, 0.30614695, 0.06480052, -0.055169716]::Vector{Float32}
  const %3 = Float32[0.25679538, -0.41509518, -0.33772185, 0.3479544, 0.1222622]::Vector{Float32}
  const %4 = Float32[0.17522529, -0.4100246, -0.1387868, 0.100876085, 0.26908317]::Vector{Float32}
  const %5 = Float32[0.20539606, 0.34619755]::Vector{Float32}
  const %6 = Float32[0.3488896 -0.59803706; 0.6616885 0.3262208; -0.5895992 0.36083856; 0.6229092 -0.3879795; -0.2545523 -0.1497077]::Matrix{Float32}
  const %7 = Float32[-0.12148299 -0.4373391 0.16868982 -0.015025938 -0.13648051; 0.043639082 0.22581898 0.00806083 0.3595815 -0.18335937; -0.19974993 0.3554513 0.14617403 0.2616244 -0.13416076; 0.31926423 -0.15863661 -0.22527254 -0.039938644 0.3005534; -0.17906253 -0.43455914 0.40854213 -0.16950837 0.40028062]::Matrix{Float32}
  const %8 = Float32[0.406842 0.0719507 0.27994198 -0.13051267 -0.029453982; -0.24535996 0.087244794 -0.14003834 -0.29761684 0.23366688; -0.10837671 0.4379476 -0.24381466 0.03290359 -0.32990557; 0.4248098 0.05269706 0.15767401 -0.3655502 0.35405073; -0.23889226 0.24450035 -0.24893427 0.41901243 0.33659452]::Matrix{Float32}
  const %9 = Float32[0.0007414635 -0.14650126 -0.39309075 0.18925633 0.1694107; -0.24061605 -0.39639646 0.20880647 0.316631 0.17056699]::Matrix{Float32}
  %10 = *(%6, %1)::Matrix{Float64} 
  %11 = add(%2, %10)::Matrix{Float64} 
  %12 = tanh(%11)::Matrix{Float64} 
  %13 = *(%7, %12)::Matrix{Float64} 
  %14 = add(%3, %13)::Matrix{Float64} 
  %15 = tanh(%14)::Matrix{Float64} 
  %16 = *(%8, %15)::Matrix{Float64} 
  %17 = add(%4, %16)::Matrix{Float64} 
  %18 = tanh(%17)::Matrix{Float64} 
  %19 = *(%9, %18)::Matrix{Float64} 
  %20 = add(%5, %19)::Matrix{Float64} 

Yes, for a feed-forward network, a structure like this can be parsed into a proper Flux/Lux network. However, for more complex architectures (e.g., convolutional), ONNX.jl already fails. As seen here, writing a general Flux/Lux importer is expected to be tricky. We could try to improve the existing importers, but I feel that trying to fix an importer for a flexible format aimed primarily at deployment (not training) would be like jumping into a black hole.

Meanwhile, the YAML syntax, since it already uses the building blocks found in most ML packages, makes it easier to support different types of layers beyond feed-forward networks without extensive parsing work. And it is fair that most SBML importers do not support many features, but I think having a spec that makes it easy to support the most relevant features for setting up trainable models would be beneficial.

Okay, looking at this, it's evident that ONNX/NNEF operate at the level of primitives rather than layers, which means that conversion to Flux/Lux is not going to work, since you would have to infer the layer abstraction on top of the encoded graph.

However, why do we actually need to convert to Lux/Flux? To me it looks like both of them use NNLib under the hood anyways. NNlib exposes gradients of primitives for autodiff (https://github.com/FluxML/NNlib.jl), so I don't see why the models wouldn't be trainable (except for the fact that numerical values are hardcoded, but that seems like something that could be easily fixed). Is the conversion to Umlaut.Tape an issue?

My impression from harmony was that there was really a strong opposition against inventing our own format for the neural network part. I still think it's not a good idea in terms of expected workload, community buy-in and access to other tools in the ecosystems (see e.g., https://github.com/lutzroeder/netron).

sebapersson commented 1 month ago

However, why do we actually need to convert to Lux/Flux? To me it looks like both of them use NNLib under the hood anyways. NNlib exposes gradients of primitives for autodiff (https://github.com/FluxML/NNlib.jl), so I don't see why the models wouldn't be trainable (except for the fact that numerical values are hardcoded, but that seems like something that could be easily fixed). Is the conversion to Umlaut.Tape an issue?

I have a feeling that with the tape representation we cannot really track the network parameters and differentiate with respect to the layer parameters. But I will dig a bit into this and get back by tomorrow at the latest.

FFroehlich commented 1 month ago

However, why do we actually need to convert to Lux/Flux? To me it looks like both of them use NNLib under the hood anyways. NNlib exposes gradients of primitives for autodiff (https://github.com/FluxML/NNlib.jl), so I don't see why the models wouldn't be trainable (except for the fact that numerical values are hardcoded, but that seems like something that could be easily fixed). Is the conversion to Umlaut.Tape an issue?

I have a feeling that with the tape representation we cannot really track the network parameters and differentiate with respect to the layer parameters. But I will dig a bit into this and get back by tomorrow at the latest.

Would be quite a non sequitur if a user "dfdx" writes a package that is not compatible with autodiff.

sebapersson commented 1 month ago

Would be quite a non sequitur if a user "dfdx" writes a package that is not compatible with autodiff.

Luckily the package is compatible with autodiff :) however only with respect to the inputs. For example, the following runs:

using ONNX, ForwardDiff, Zygote, NNlib, Umlaut

path = "NN1.onnx"
foo = ONNX.load(path, randn(2, 1))

function f(x)
    _x = reshape(x, (2, 1))
    out = play!(foo, _x)
    return sum(out)
end

x = randn(2)
g1 = ForwardDiff.gradient(f, x)
2-element Vector{Float64}:
 -0.030401147080500554
 -0.016654266496356394

However, since ONNX is designed for inference, the input in our case would be the network input (not the parameters). Even though we could supply the parameters as input, we would need to track them (e.g., which layer they belong to) for mapping. Additionally, Umlaut.jl is for the Yota AD library, which is not well-suited for SciML in Julia (it is not one of the main AD packages). I do not expect a much better situation in Jax with handling imported ONNX graphs. Overall, since ONNX operates on the level of primitives and is designed for inference, parsing the graph into a usable network will not be trivial and will likely be limiting if we want to go beyond feedforward networks.

Overall, as we want to set up training problems, keep track of parameters (e.g., for regularization), and take advantage of optimized ML packages, I think we need a format that can be parsed into suitable ML training packages. These inference formats will likely make this hard.

My impression from harmony was that there was really a strong opposition against inventing our own format for the neural network part. I still think it's not a good idea in terms of expected workload, community buy-in and access to other tools in the ecosystems (see e.g., https://github.com/lutzroeder/netron).

I agree. To avoid inventing anything new, I propose we stay very close to PyTorch syntax for setting up layers. As PyTorch is probably the most used package for ML right now, it in a sense is a standard. We could even provide a PyTorch importer for the networks, allowing users to easily export the network to their desired format later, and for depositing models, allow ONNX input.

FFroehlich commented 1 month ago

Would be quite a non sequitur if a user "dfdx" writes a package that is not compatible with autodiff.

Luckily the package is compatible with autodiff :) however only with respect to the inputs. For example, the following runs:

using ONNX, ForwardDiff, Zygote, NNlib, Umlaut

path = "NN1.onnx"
foo = ONNX.load(path, randn(2, 1))

function f(x)
    _x = reshape(x, (2, 1))
    out = play!(foo, _x)
    return sum(out)
end

x = randn(2)
g1 = ForwardDiff.gradient(f, x)
2-element Vector{Float64}:
 -0.030401147080500554
 -0.016654266496356394

However, since ONNX is designed for inference, the input in our case would be the network input (not the parameters). Even though we could supply the parameters as input, we would need to track them (e.g., which layer they belong to) for mapping. Additionally, Umlaut.jl is for the Yota AD library, which is not well-suited for SciML in Julia (it is not one of the main AD packages). I do not expect a much better situation in Jax with handling imported ONNX graphs. Overall, since ONNX operates on the level of primitives and is designed for inference, parsing the graph into a usable network will not be trivial and will likely be limiting if we want to go beyond feedforward networks.

Overall, as we want to set up training problems, keep track of parameters (e.g., for regularization), and take advantage of optimized ML packages, I think we need a format that can be parsed into suitable ML training packages. These inference formats will likely make this hard.

At the same time, sticking to a predefined layer structure would also be quite limiting, as it substantially restricts the expressiveness of the models. We already have at least one project in the lab that would not fit into such a framework.

Even though ONNX has a focus on inference, I don't think this is a limitation of the format itself. https://github.com/FluxML/ONNX.jl/blob/9cd42b9d0bb6b311978368aeea6b64f72c908d49/src/load.jl#L274 would be a good starting point to implement any taping or other nn structure.

JAX does not have the same issue since it exposes every array to autodiff.

My impression from harmony was that there was really a strong opposition against inventing our own format for the neural network part. I still think it's not a good idea in terms of expected workload, community buy-in and access to other tools in the ecosystems (see e.g., https://github.com/lutzroeder/netron).

I agree. To avoid inventing anything new, I propose we stay very close to PyTorch syntax for setting up layers. As PyTorch is probably the most used package for ML right now, it in a sense is a standard. We could even provide a PyTorch importer for the networks, allowing users to easily export the network to their desired format later, and for depositing models, allow ONNX input.

In that case, what about using TorchScript?

sebapersson commented 1 month ago

In that case, what about using TorchScript?

From a quick glance, TorchScript seems flexible and feasible to parse. Would it be flexible enough for the projects in your group?

JAX does not have the same issue since it exposes every array to autodiff.

Fair point, but still probably worthwhile to do a bit of prototyping to ensure there are no catches here (e.g., that things work with diffrax).

FFroehlich commented 1 month ago

In that case, what about using TorchScript?

From a quick glance, TorchScript seems flexible and feasible to parse. Would it be flexible enough for the projects in your group?

Would require some testing, but TorchScript seems pretty flexible, as it has representations at both the layer and the primitive level. But again, I think it would be good for the spec to be format agnostic, such that we say NNEF/ONNX/TorchScript are all possible.

JAX does not have the same issue since it exposes every array to autodiff.

Fair point, but still probably worthwhile to do a bit of prototyping to ensure there are no catches here (e.g., that things work with diffrax).

Agreed.

sebapersson commented 1 month ago

Would require some testing, but TorchScript seems pretty flexible, as it has representations at both the layer and the primitive level. But again, I think it would be good for the spec to be format agnostic, such that we say NNEF/ONNX/TorchScript are all possible.

I agree. Then I think in the spec we can allow NNEF, ONNX and TorchScript, and it comes down to the implementation which format(s) are supported. If this sounds good @dilpath and @m-philipps, I can move on to updating the spec.

dilpath commented 1 month ago

Sounds good to me! TorchScript looks similar to NNEF IMO, no issue from my side to support both and ONNX.

FFroehlich commented 2 weeks ago

Looks like Biomodels has also picked ONNX as the go-to format: https://x.com/biomodels/status/1792927084236443945

sebapersson commented 2 weeks ago

Looks like Biomodels has also picked ONNX as the go-to format: https://x.com/biomodels/status/1792927084236443945

Which makes sense, as onnx-runtime runs on most systems, but it is a good indication that we should at least support deployment via onnx.