Consider using declarative language for model configs

m-novikov commented 4 years ago

My assumption regarding use case for ilastik is following: We have non-programmer users that want to use (run predictions) and finetune (training) predictions on models available in model zoo. Because of this I assume priorities are the following:

Stability (model works well on all ilastik versions)
Reliable (once a model is available for ilastik users, it is maintained and supported) This also puts some limitations on the flexibility of the framework as extensive flexibility may lead to unstable and unreliable plugins(models in our case).

While using python is nice and convenient for describing configs files it has certain drawbacks.

It is hard to reason about "final" config state
Complexity of evaluation and maintenance
Adding features is problematic, because every time we can possibly change something in our framework it will break models already present in the zoo. A good example of scriptable configs pitfalls is setup.py where every project has it's own hacks and workarounds without a uniform way of doing things.

On the other had a declarative format of config files such as json or yaml. Allows us to limit excessive complexity of certain solutions and ship to ilastik users only thoroughly tested and simple enough models. Also, it allows us to start small with a limited number of parameters (config keys) and then generalize from that adding features as the need arises.

FynnBe commented 4 years ago

It is hard to reason about "final" config state

Could you elaborate this point? Is it with respect to adding/deprecating 'config keys'?

m-novikov commented 4 years ago

Could you elaborate this point? Is it with respect to adding/deprecating 'config keys'?

We can not tell how the model looks and which callbacks are registered until we execute the model file. If we have declarative config we always can be reasonably sure that training withing our model engine only does training. But with callbacks, we should expect injection and state modification on any stage.

My concern that it would look like high-level framework on top of high-level frameworks (generalizing Keras, Tensorflow, Pytorch). Which is a hard thing to do.

FynnBe commented 4 years ago

We can not tell how the model looks and which callbacks are registered until we execute the model file. If we have declarative config we always can be reasonably sure that training withing our model engine only does training. But with callbacks, we should expect injection and state modification on any stage

the execution of the model file should be 'light' and not invoke any computation. It should be equivalent to loading a yaml file, etc. with the advantage of exactly specifying what is needed---no interpretation of strings and searching for their corresponding python objects. We do not know everything about how the model 'looks' anyway, because we are not able to do so even with a simple, plain torch.nn.Module model in simplest of training loops. Callbacks can change the model state, so does the optimizer.

My concern that it would look like high-level framework on top of high-level frameworks (generalizing Keras, Tensorflow, Pytorch). Which is a hard thing to do.

The highest level (the ModelZooEntry class at the moment), might serve as an API across frameworks in the future. However, the rest is PyTorch specific (and based on pytorch.ignite) If this model proves useful we might transfer it to Tensorflow. Both, PyTorch and TensorFlow are low-level from the perspective of a model zoo.

If the final models, as well as the training and logging procedures, are too high level and not customizable, every model in the model zoo will be a limited version its original development.

Limiting the number of changeable parameters by the end-users is a different topic than enabling developing users to exactly describe the desired behavior, and possibly even train their new networks within this framework efficiently in the first place.

constantinpape commented 4 years ago

I think @m-novikov raises some very valid points:

We need to decide on a good trade-off between maintainability and customization. The most maintainable solution would be to use a declarative configuration for everything but the model (i.e. loss, optimizer etc.) and only allow for pytorch classes (which would make string parsing trivial). This is very restrictive, but might be a good starting point.

We should also keep in mind that the configuration specs can evolve over time, as long as we keep supporting the initial spec. So starting with a declarative specification that translates to pytorch (and maybe an additional tiktorch-core module) and then introducing a code based spec would be an option (maybe in some experimental mode first).

We do not know everything about how the model 'looks' anyway, because we are not able to do so even with a simple, plain torch.nn.Module model in simplest of training loops. Callbacks can change the model state, so does the optimizer.

Yes, but not supporting custom losses, optimizers, callbacks, etc. already reduces the complexity a lot.

m-novikov commented 4 years ago

I think the main problem here is a difference between user and developer experience. I think that models that are available through ilastik interface are targeted for users and do not require heavy customization. They should be built/adapted for ilastik usage. For heavily customized models following questions arise:

What happens if they don't work?
How do we reason about model behavior?
What if model pipeline broken in some way?
How do we guarantee reproducibility?

I see two versions here:

User provides a model that may work in ilastik data with custom code, we run it once, check passes, and now users are able to load and execute model within current ilastik version. If it breaks it breaks.
We have set of tested loss functions and data augmentations that are tested with ilastik data format and we use it to assemble model pipeline. We can version and be sure all these pipelines work in multiple ilastik/tiktorch versions. It allows us to have environment and seamless upgrades of libraries. Smaller the API surface the less chances of some obscure breakage.

FynnBe commented 4 years ago

A suggestion for a compromise: What if we use the proposed code based config and add an 'approved' flag. When using a model in ilastik, etc. we require it to be 'approved'. What would that mean? if 'approved' all settings must be in a restricted set, e.g. the optimizer must be one of the available ones in torch.optim, etc... This way we can have the same framework for the minimum, safe, easy-to-maintain part, but seamless integration with more complex models and active development. This would also allow us to shift what is 'approved' gradually and facilitate the transition of 'experimental' setups to 'approved' ones by providing accurate feedback of what exactly is not approved yet.

emilmelnikov commented 4 years ago

The main argument against Python-based configs is the rule of least power.

Maybe we need a concrete example of a use case where static configuration would be insufficient? Otherwise, it's very hard to reason about this issue.

constantinpape commented 4 years ago

@FynnBe The 'approved' mode would basically be a declarative config, but would be expressed in python, instead of json or yaml. We would restrict customization by limiting the available dependencies to torch and a tiktorch core library (or libraries). This would be easy to implement in a json or yaml format as well.

Note that such a static / declarative configuration is not quite sufficient for the specification of the model itself (also regarding the question raised by @emilmelnikov). A pytorch model definition needs to be in python, capturing this in some json/yaml config is hopeless and would be a nightmare for any developer. We can still restrict the available dependencies to torch + core libraries. Also, the model definition could be 'offloaded' to the torch-hub hubconf.py format.

FynnBe commented 4 years ago

That is a very good point (that @m-novikov and @constantinpape made before, but you even have a rule for it). Makes a lot of sense, and as I do not like to break the rules it got me questioning my dear freely configurable python code. Maybe dataclasses can come to the rescue? ... something along the lines of yaml.dump(asdict(dataclass_)) or simply yaml.dump(dataclass_)? I'm looking into it atm... ...

There are a few keys we maybe do not want to save, like the path to some log dir.
So then what about python objects in the yaml file?

Here an example of the current ModelZooConfig as yaml:

!!python/object:__main__.ModelZooEntry
evaluator_callbacks: []
log_config: !!python/object:__main__.LogConfig
  checkpointer: !!python/object:ignite.handlers.checkpoint.ModelCheckpoint
    _atomic: true
    _dirname: C:/Users/fbeut/protozoo/checkpoints
    _fname_prefix: protozoo
    _iteration: 0
    _n_saved: 2
    _save_as_state_dict: true
    _save_interval: 2
    _saved: []
    _score_function: null
    _score_name: null
  dir: !!python/object/apply:pathlib.WindowsPath
  - C:\
  - Users
  - fbeut
  - protozoo
  model_n_saved: 2
  model_save_interval: 2
loss_config: !!python/object:__main__.LossConfig
  loss_class: !!python/name:torch.nn.modules.loss.MSELoss ''
  loss_kwargs: {}
model_config: !!python/object:__main__.ModelConfig
  create_loss_input: !!python/name:__main__.default_create_loss_input ''
  create_model_input: !!python/name:__main__.default_create_model_input ''
  model_class: !!python/name:torch.nn.modules.module.Module ''
  model_kwargs: {}
  pretrained_source: null
optimizer_config: !!python/object:__main__.OptimizerConfig
  optimizer_class: !!python/name:torch.optim.adam.Adam ''
  optimizer_kwargs: {}
predictor_callbacks: []
trainer_callbacks: []

FynnBe commented 4 years ago

Also, the model definition could be 'offloaded' to the torch-hub hubconf.py format.

I like where this is going

emilmelnikov commented 4 years ago

To be fair, one could make a case for programmable "configs": sometimes configuration could be very complicated because of the inherent complexity (cf. Ansible Playbooks or old Java EE XML configs) and the system could be simpler by allowing executable code. However, there should be some evidence of that complexity before one starts to introduce such powerful features.

constantinpape commented 4 years ago

@emilmelnikov I think for any developer, a pytorch model needs to be definable in python code as

import torch.nn as nn
class MyModel(nn.Module):
...

Anything else will be very unnatural for developers. We need to figure out how to deal with this in the config, and I think the hubconf.py would be a good starting point.

@FynnBe Maybe you have seen this already, but for safe dumping / loading of python objects in yaml, check out this: https://yaml.readthedocs.io/en/latest/dumpcls.html.

emilmelnikov commented 4 years ago

Here an example of the current ModelZooConfig as yaml:

This is probably an off-topic point, but I would highly discourage YAML because it is a very complicated format with a large spec and subtle fail points, especially when using non-trivial features.

m-novikov commented 4 years ago

Here an example of the current ModelZooConfig as yaml:

Problem with this example is it is an evaluated thing, that tries to mask itself as declarative. IMO we should be strict in which keys we allow in config.

I expect something along the following lines:

model.yaml

version: 1

data:
  input_axes: "cxyz" 
  input_channels: 2
  output_axes: "zxyc"
  output_channels: 3
  labels:
    - Background
    - Object 1
    - Object 2
  input_augmentation: ["NormalizeAug1", "Aug2"]  # All augmentations are only available at first from standard library, no custom classes here
  output_augmentation: ["OutAug1"]

model:
   # model class put inside model.py
   class: DUNet2D
   loss: "MSELoss"  # All losses at first only available from "standard" package, no custom loss classes
   optimizer: "Adam"  # All optimizers at first only available from "standard" package, no custom loss classes

model.py

import torch.nn as nn

class MyModel(nn.Module):
    def train(self, data, labels):
        ...

    def predict(self, data):
        ...

When we gather enough data about requirements for custom augmentations and losses then we introduce special syntax (e.g. loss: "!DiceLossWithQuirks") or even

loss:
    name: "!DiceLossWithQuirks"
    kwargs: {"w1": 0.0001}

and go from there.

But the first implementation should not involve any evaluation of custom classes IMO, with a model class exception.

constantinpape commented 4 years ago

@m-novikov Yes, I really like that suggestion. @FynnBe and me also discussed offline that it should be possible to serialize config classes into this format by overriding the staticmethods to_yaml and from_yaml.

constantinpape commented 4 years ago

We had a longer discussion about this now with @FynnBe, @m-novikov and @emilmelnikov after lunch. Just to summarize the main points:

We need to implement a first version of the config that supports basic options for training (say model, loss, optimizer and some basic transformations) without introducing too much complexity. It should be possible to extend this to support more advanced options like callbacks later. Both @m-novikov and @emilmelnikov voiced their strong opinion that it's easier to add functionality to the config then already design it with too many future options in mind.
@FynnBe and my main concern was that there should be a way to load and generate this config from some training library (not using the term framework here, because it has become a trigger word :P) so that we (and new members of the lab) can easily pre-train models for tiktorch. This library should also allow functionality that is not supported by the default config options, like specifying custom loss functions or callbacks.
Sidenote, we need to keep the larger model-zoo discussions in mind (see @oeway's project proposal I have posted in the group chat). We will know more about this once we talk to Wei next week.

My take-away from this discussion: We should separate the library discussion and config discussion more:

I think @FynnBe and me have tackled this from the wrong direction: we should define an initial config first (similar to the example @m-novikov posted above); ideally we would introduce some rfc-like specification for this + examples.
We can then write a small training library that parses and produces this config and allows more customization based on this spec (basically what we have started to do in this repo). This library would probably be based on ignite, which will also be used in tiktorch.

kreshuklab / protozoo

Consider using declarative language for model configs #4