multiply-org / multiply-orchestration

This repository contains the code that is responsible for orchestrating the various components of the MULTIPLY platform. In particular, it takes care of providing the components with the input they require to run an inference on bio-physical parameters.

Can the state after an iteration be saved? #2

Closed TonioF closed 6 years ago

TonioF commented 6 years ago

This question is posed with regard to the inference engine: Is it possible to completely save the state after one iteration step to one or more files? Doing so would give the orchestrator the freedom to not only decompose sub-tasks for the inference engine spatially, but also temporally.

jgomezdans commented 6 years ago

There is some untested support for this, as all you'd be doing is setting up the LinearKalman class, and then calling assimilate_multiple_bands with the relevant parameters. But note that the system as it stands requires sequential operation (so to process time t+1, you need to have access to the output of time t, for which you would have required access to the output of time t-1 and so on).

TonioF commented 6 years ago

That sequential operation is exactly the point: When you say that the output from a time step is used, what is that exactly? The variable state? And the uncertainties? Would they need to be transformed somehow? Is there anything else that is propagated through time?

jgomezdans commented 6 years ago

In any probabilistic system, the information is stored in the pdf of the state. In KaFKA, we assume that that pdf is normal, and hence it is uniquely defined by the state vector and the associated covariance matrix (or inverse covariance matrix). So both vector and matrix are needed. Usually, the inference works in transformed units, but we don't use transformations on everything.
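As a toy illustration (hypothetical code, not KaFKA's actual API), the two arrays below are the complete description of a Gaussian state pdf; anything you want to know about the state can be derived from them:

```python
import numpy as np

# Toy sketch (hypothetical, not KaFKA's actual API): a Gaussian state
# pdf is uniquely defined by its mean vector and inverse covariance
# matrix, so these two arrays are the complete state after an iteration.
state_mean = np.array([0.5, 1.2, 0.8])               # transformed units
inv_cov = np.linalg.inv(np.diag([0.1, 0.2, 0.05]))   # inverse covariance

# The (unnormalised) log-pdf at any point x needs nothing else:
x = np.array([0.6, 1.0, 0.9])
log_pdf = -0.5 * (x - state_mean) @ inv_cov @ (x - state_mean)
```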

Propagation from time t to time t+1 is the combination of the state pdf (remember, mean + inverse covariance mtx) together with (i) any information coming from the prior and (ii) whatever changes in the state pdf a dynamic model produces. Note that having these two mechanisms is optional (you can have just a prior, as we had for the demo, or you can propagate using a dynamic model. Or you can have both. You can see a first pass implementation of this here).

So, broadly, when you move from one time step to another, you will update your original pdf at t, and this means that both the mean vector and the covariance matrix may change.
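The propagation step can be sketched like this (illustrative only; the dynamic model, noise term, and variable names are assumptions, not the KaFKA implementation):

```python
import numpy as np

# Illustrative propagation from t to t+1 (hypothetical, not the KaFKA
# implementation): a linear dynamic model x_{t+1} = M x_t with model
# noise Q, done in covariance form and converted back to the inverse
# covariance that is carried between time steps.
state_mean = np.array([0.5, 1.2, 0.8])
inv_cov = np.linalg.inv(np.diag([0.1, 0.2, 0.05]))

M = np.eye(3)                 # identity dynamics ("tomorrow like today")
Q = 0.01 * np.eye(3)          # model noise: uncertainty grows over time

cov = np.linalg.inv(inv_cov)  # back to covariance form
mean_next = M @ state_mean    # the mean may change under M
cov_next = M @ cov @ M.T + Q  # the covariance inflates by Q
inv_cov_next = np.linalg.inv(cov_next)
```

With identity dynamics the mean is unchanged, but the diagonal of the covariance still grows by Q, which is exactly the "both may change" point above.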

NPounder commented 6 years ago

So, I think you are agreeing (and it matches my understanding too) that all that needs to be saved from the previous time step is the state vector (i.e. the variable state) and the inverse covariance matrix (not just the diagonal uncertainties, but the full matrix). We need to agree on whether this is in transformed space or not, but it is probably best in transformed space.

Of course, there may be a prior, and a dynamic model can be specified, but the only things that need to be saved from KaFKA for use in the next time step are the state vector and the inverse covariance matrix from the previous time step.

jgomezdans commented 6 years ago

Yes, the only information you need is the state vector and the inverse covariance matrix. However, given the strongly sequential nature of the processing, one might not want to dump these to disk and read them back in (although a dump to disk is needed anyway to produce an output for the user; a format convenient for the user may or may not be the same as a format convenient for the engine).

There might be a use case for storing the output as a starting point for a subsequent run, but this would also require recording the timestep the state pdf refers to.
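A minimal sketch of such a dump, including the timestep, could look like this (the file layout and names are hypothetical; nothing here is an agreed KaFKA format):

```python
import os
import tempfile

import numpy as np

# Hypothetical dump format (not agreed KaFKA behaviour): persist the
# complete state pdf plus the timestep it refers to, so a later run
# can restart from exactly this point.
state_mean = np.array([0.5, 1.2, 0.8])
inv_cov = np.linalg.inv(np.diag([0.1, 0.2, 0.05]))

path = os.path.join(tempfile.mkdtemp(), "state_t042.npz")
np.savez(path,
         state_mean=state_mean,
         inv_cov=inv_cov,
         timestep=42)   # which time step this pdf refers to

restored = np.load(path)
```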

Also, I note that there's a data structure for this kind of behaviour in linear_kf. It's not mapped to disk, as the overhead of mapping a sparse matrix to something on disk is substantial, but the bones are broadly those. For full traceability, I guess I'd like some more metadata, but...
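For what it's worth, scipy's sparse `.npz` round-trip stores only the non-zero entries, which is one way the disk overhead could be kept down (a sketch under that assumption, not what linear_kf actually does):

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Sketch only (this is not what linear_kf does): a mostly-diagonal
# inverse covariance stored as CSR; scipy's .npz format writes just
# the non-zero entries, keeping the on-disk footprint small.
inv_cov = sp.diags([10.0, 5.0, 20.0], format="csr")

path = os.path.join(tempfile.mkdtemp(), "inv_cov_t042.npz")
sp.save_npz(path, inv_cov)
loaded = sp.load_npz(path)
```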

TonioF commented 6 years ago

I agree we definitely need to be aware that reading and writing cause a lot of overhead ... though it's mostly the reading, as we have to do the writing anyway. I think we have answered this question for now.