meyer-lab / mechanismEncoder

Developing patient-specific phosphoproteomic models using mechanistic autoencoders

September 24, 2020 call #4

Closed by aarmey 3 years ago

aarmey commented 3 years ago

Agenda

aarmey commented 3 years ago

@FFroehlich if you have thoughts about the third bullet here (preferred framework), we don't necessarily need to discuss it during the call. I'm open to whichever you think would make it easiest for you to pass gradients / sensitivities.

FFroehlich commented 3 years ago

> @FFroehlich if you have thoughts about the third bullet here (preferred framework), we don't necessarily need to discuss it during the call. I'm open to whichever you think would make it easiest for you to pass gradients / sensitivities.

No big preference, but for Theano there already is a working implementation that uses gradients/sensitivities from my pipeline (https://github.com/ICB-DCM/pyPESTO/blob/cbec7ee1343b4d66f86eb26e1d77cf77571184a4/pypesto/sample/pymc3.py#L65), so if there aren't any other advantages/disadvantages I would probably go with that.

aarmey commented 3 years ago

I'm quite familiar with Theano too, so let's go with that. I think the only potential downside is that it's no longer developed, but I don't anticipate that causing problems.

FFroehlich commented 3 years ago

> I'm quite familiar with Theano too, so let's go with that. I think the only potential downside is that it's no longer developed, but I don't anticipate that causing problems.

Sounds good. TensorFlow is the PyMC4 backend, and pyPESTO/AMICI will need an interface for it sooner or later, so that might also be a good option if we run into issues with Theano.

What's currently unclear to me is whether backprop through the autoencoder can be done outside of the ODE solve (I think it should be possible) and how the class that is currently implemented in pyPESTO would have to be updated.

aarmey commented 3 years ago

If you can provide the Jacobian of the outputs with respect to the inputs, that's enough to do the backprop. I used this with pymc3 before. Here's the Op we wrote (apologies it's a bit of a mess):

https://github.com/meyer-lab/gc-cytokines/blob/master/ckine/differencing_op.py
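For reference, the basic pattern is a custom Op whose `grad` pulls the Jacobian from the solver. A minimal sketch, not the actual differencing_op.py code; `ModelOp`, `ModelJacOp`, and `solve_model` are hypothetical names standing in for whatever returns the outputs and d(outputs)/d(params):

```python
import numpy as np
import theano
import theano.tensor as T


class ModelOp(theano.Op):
    """Wraps the model solve; solve_model(params) is a hypothetical call
    returning (outputs, Jacobian of outputs w.r.t. params)."""

    itypes = [T.dvector]  # parameter vector
    otypes = [T.dvector]  # model outputs

    def perform(self, node, inputs, output_storage):
        (params,) = inputs
        y, _ = solve_model(params)
        output_storage[0][0] = np.asarray(y, dtype=np.float64)

    def grad(self, inputs, output_grads):
        (params,) = inputs
        (g,) = output_grads
        # Backprop forms the vector-Jacobian product from the full Jacobian.
        return [T.dot(g, ModelJacOp()(params))]


class ModelJacOp(theano.Op):
    """Companion Op returning the full Jacobian d(outputs)/d(params)."""

    itypes = [T.dvector]
    otypes = [T.dmatrix]

    def perform(self, node, inputs, output_storage):
        (params,) = inputs
        _, jac = solve_model(params)
        output_storage[0][0] = np.asarray(jac, dtype=np.float64)
```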

However, I think it's more efficient to define the Jacobian-vector product, which is what the backwards pass actually uses, because this would only require one adjoint solve:

http://deeplearning.net/software/theano/tutorial/gradients.html#r-operator
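Concretely, that would mean filling in `R_op` on the Op, backed by a solver call that propagates a single direction through the sensitivity equations. Continuing the sketch above, and again with hypothetical names (`ModelJvpOp`, `solve_model_jvp`):

```python
class ModelJvpOp(theano.Op):
    """Directional derivative J @ v from one sensitivity solve, no full Jacobian."""

    itypes = [T.dvector, T.dvector]  # parameters, direction v
    otypes = [T.dvector]

    def perform(self, node, inputs, output_storage):
        params, v = inputs
        # solve_model_jvp is a placeholder for a solver call that returns J @ v.
        output_storage[0][0] = np.asarray(solve_model_jvp(params, v), dtype=np.float64)


class ModelOpWithRop(ModelOp):
    def R_op(self, inputs, eval_points):
        # Called by theano.tensor.Rop(y, params, v); return J @ v for each output.
        (params,) = inputs
        (v,) = eval_points
        if v is None:
            return [None]
        return [ModelJvpOp()(params, v)]
```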

I haven't tried to implement this before, though.

aarmey commented 3 years ago

These are the best notes I'm aware of about this:

https://diffeq.sciml.ai/latest/extras/sensitivity_math/#sensitivity_math https://mitmath.github.io/18337/lecture10/estimation_identification

FFroehlich commented 3 years ago

> If you can provide the Jacobian of the outputs with respect to the inputs, that's enough to do the backprop. I used this with pymc3 before. Here's the Op we wrote (apologies it's a bit of a mess):
>
> https://github.com/meyer-lab/gc-cytokines/blob/master/ckine/differencing_op.py
>
> However, I think it's more efficient to define the Jacobian-vector product, which is what the backwards pass actually uses, because this would only require one adjoint solve:
>
> http://deeplearning.net/software/theano/tutorial/gradients.html#r-operator
>
> I haven't tried to implement this before, though.

Thanks, that's effectively just the sensitivities, so we should be good with what is currently there!

Jacobian-vector product vs. full Jacobian should only become relevant if we are looking at Newton-type optimizers, right? For a scalar output we only require a single adjoint solve anyway, so computing the Jacobian-vector product shouldn't make any difference, since the cost of solving the scalar integrals for every input is negligible. Vector-valued outputs become an issue with time-resolved data, since each timepoint requires an adjoint solve (for the loss function you can superimpose solutions since they are linear, so you only ever need one adjoint solve).
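To spell out the superposition argument in rough notation (standard adjoint sensitivity result, see also the notes linked above):

```latex
With a scalar objective over time-resolved observables $y(t_k) = h(x(t_k), p)$,
\[
  J(p) = \sum_k \tfrac{1}{2}\,\lVert h(x(t_k), p) - \bar{y}_k \rVert^2 ,
\]
the adjoint method yields the whole gradient from a single backward solve,
\[
  \frac{dJ}{dp} = \int_{t_0}^{t_N} \lambda(t)^{\top} \frac{\partial f}{\partial p}\, dt
                  + \frac{\partial J}{\partial p} ,
\]
where $\lambda$ solves the linear adjoint ODE backwards in time, picking up a jump
proportional to $\partial J_k / \partial x$ at each $t_k$. That linearity is what lets the
per-timepoint contributions superimpose into one backward solve, whereas the full
Jacobian $\partial y(t_k)/\partial p$ needs a separate adjoint solve per output and timepoint.
```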

The code (adjoint Hessian-vector product) is in the C++ part of AMICI, but hasn't been ported to Python yet, so if we find that's necessary it shouldn't be too much work to get it running. I wasn't aware that Theano etc. even has functions for Newton-type optimization.

aarmey commented 3 years ago

Ah—you're right that the number of adjoint solves depends on the number of time points. I had that confused. In that case R_op should be the same as using grad for backprop.

I don't think we need to worry about higher-order autodiff; L-BFGS is usually as good as Newton unless you want really high accuracy.

sgosline commented 3 years ago

Can we also discuss a data platform? I just started an OSF project, but can obviously use Synapse - it would be nice not to be schlepping data/results back and forth, given the GitHub file limit.

aarmey commented 3 years ago

I like the Synapse client, so we can go with that unless you prefer OSF?