open-simulation-platform / libcosim

OSP C++ co-simulation library
https://open-simulation-platform.github.io/libcosim
Mozilla Public License 2.0

Enable saving and restoring subsimulator state #765

Closed: kyllingstad closed this 1 month ago

kyllingstad commented 2 months ago

This is the first step towards closing #756. I've added functions corresponding to FMI 2.0's fmi2{Get,Set,Free}FMUstate() throughout the various layers of subsimulator interfaces and implementations:

The API is very similar to the one defined by FMI, except that it represents saved states by numeric indices rather than opaque pointers.
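Here is a rough illustration of index-based handles: a minimal sketch built around a hypothetical toy subsimulator whose whole internal state is a single double. The method names save_state(), restore_state() and release_state() are assumptions modelled on fmi2GetFMUstate(), fmi2SetFMUstate() and fmi2FreeFMUstate(). They are not necessarily the exact signatures added in this PR.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>

// Hypothetical toy subsimulator (not libcosim code). Its entire internal
// state is one double, and saved states are referred to by integer indices
// instead of opaque pointers.
class toy_subsimulator
{
public:
    using state_index = int;

    double value = 0.0;

    void do_step(double dt) { value += dt; }

    // ~ fmi2GetFMUstate: store a copy internally and hand out an index.
    state_index save_state()
    {
        const state_index idx = nextIndex_++;
        savedStates_[idx] = std::make_unique<double>(value);
        return idx;
    }

    // ~ fmi2SetFMUstate
    void restore_state(state_index idx) { value = *lookup(idx); }

    // ~ fmi2FreeFMUstate
    void release_state(state_index idx)
    {
        lookup(idx); // throws if idx is unknown or already released
        savedStates_.erase(idx);
    }

    std::size_t saved_state_count() const { return savedStates_.size(); }

private:
    const std::unique_ptr<double>& lookup(state_index idx) const
    {
        const auto it = savedStates_.find(idx);
        if (it == savedStates_.end()) throw std::out_of_range("unknown state index");
        return it->second;
    }

    state_index nextIndex_ = 0;
    std::map<state_index, std::unique_ptr<double>> savedStates_;
};
```

One practical benefit of integer indices is that an unknown or already-released index can be detected and rejected, as lookup() does here. A dangling opaque pointer cannot be checked that way.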

This led me to also remove the slave_state and state_guard stuff that was in slave_simulator.{hpp,cpp}. The overloading of the "state" terminology became confusing, and it seemed like it was a lot of code for very little gain. (It was supposed to be a check of correct API usage, but I can't remember it ever actually catching a bug.)

Note: This is all about saving states in memory, not about converting them to a process-independent format (serialisation, step 2 of #756) or saving to persistent storage (#757).

This PR also fixes #762. [Edit: Issue #762 has now been independently fixed (in the exact same way) by PR #766.]

kyllingstad commented 1 month ago

In case someone is in the middle of reviewing this, please note that I just pushed a commit which simplifies the changes to slave_simulator significantly.

In brief, rather than trying to save the entire internal state, which includes cached variable values and modifiers, we leave all state saving to the slave implementations and FMU code. Simulators which have active variable modifiers will simply refuse to save their state. I was never comfortable with my original attempt, because I wasn't sure that modifiers were properly saved. They're just std::function objects, which can point to any callable object, and there is no guarantee that copying one actually makes a deep copy of its entire state.
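The std::function concern can be shown with a small standalone example (plain C++, not libcosim code). The hypothetical modifier below keeps a call counter behind a shared_ptr captured by its lambda. Copying the std::function copies the shared_ptr but not the counter, so a "saved" copy keeps changing together with the live one.

```cpp
#include <functional>
#include <memory>

// Hypothetical modifier whose output depends on hidden mutable state:
// a call counter held through a shared_ptr captured by the lambda.
std::function<double(double)> make_counting_modifier()
{
    auto calls = std::make_shared<int>(0);
    return [calls](double x) {
        ++*calls;
        return x + *calls; // result depends on how often it has been called
    };
}
```

Copying the returned function (`auto copy = original;`) produces two objects that share one counter. Calling either of them therefore changes what the other returns next.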

If it turns out that saving state which includes modifiers is a necessary feature, we can revisit it and make a proper implementation later.

kyllingstad commented 1 month ago

I changed the target branch for this from master to the new dev/state-persistence branch now. I am splitting the work on #756 and #757 over several PRs so it can be reviewed in manageable chunks, but I worry that I won't have the full picture of the changes needed before everything is done. Therefore, I'd like to keep it out of master until it's more mature. The dev/state-persistence branch can be merged into master when everything is done and we are happy with it.

restenb commented 1 month ago

> In brief, rather than trying to save the entire internal state, which includes cached variable values and modifiers, we leave all state saving to the slave implementations and FMU code. Simulators which have active variable modifiers will simply refuse to save their state.

The effect of the variable modifier on the simulation will be seen in the saved states, so in principle this omits information necessary for e.g. fully transparent rewind / playback functionality for a whole simulation where these "user actions" must also be tracked. I guess there's also the question of what exactly can happen if the FMU is restored to a previous state, but our cached values aren't.

The data intended to be saved in a void* fmi2_FMU_state_t is by definition completely unknown to the caller; it's whatever the implementing FMU needs to restore its state later? In other words, there's no guarantee that the data there is suitable for certain uses, like serialization? Does that mean fmi2_capi_serialize_fmu_state should be used for serializable data instead?

I can't really find any information about how these functions are intended to be implemented by the FMU. Should we cooperate on some example FMUs? Is the Dahlquist FMU intended for testing the FMU state API?

Another question I have: say we want to save all state by default. Does this affect the current implementation? For example, do we need to start thinking about keeping a circular buffer of states for a configurable duration? Is this to be done in execution::step, with saving & restoring state acting as a form of manipulator, or directly on each model instance within algorithm::do_step?

kyllingstad commented 1 month ago

Many good questions! I'll try to answer, but first, let me clarify something: This PR is not about serialisation at all. I split #756 into two tasks, and this PR addresses only the first one, namely saving the state in the FMU instance's internal memory. (I am almost done with the second task too, namely to enable serialisation of saved states for individual subsimulators. A PR on this is forthcoming. After that, I'll turn to #757, which is about saving, serialising, and persisting the entire simulation state to disk.)

That said, there is a use case for just being able to save states in memory too: It can be used by "re-stepping" algorithms, e.g. algorithms that roll back the last step(s) to a previous state if the error is too large, in order to repeat them with a smaller step size.
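Here is a minimal sketch of such a re-stepping loop. It assumes a hypothetical toy model (explicit Euler for dy/dt = -k*y) that exposes index-based save/restore/release operations. The error estimate used here, step doubling, is just one possible choice for illustration. Nothing in this PR prescribes it.

```cpp
#include <cmath>
#include <map>

// Hypothetical toy model: explicit Euler for dy/dt = -k*y, with
// index-based save/restore/release of its state (just y).
struct decay_model
{
    double k = 10.0;
    double y = 1.0;
    std::map<int, double> saved;
    int next = 0;

    void do_step(double dt) { y += dt * (-k * y); }
    int save_state() { saved[next] = y; return next++; }
    void restore_state(int idx) { y = saved.at(idx); }
    void release_state(int idx) { saved.erase(idx); }
};

// Re-stepping sketch: compare one step of size dt with two half steps taken
// from the same saved starting point. If they differ by more than tol, roll
// back and retry with dt/2 (giving up on shrinking after maxRetries).
// Returns the accepted step size; the model is left at the two-half-step result.
double adaptive_step(decay_model& m, double dt, double tol, int maxRetries = 20)
{
    const int start = m.save_state();
    for (int attempt = 0;; ++attempt) {
        m.do_step(dt);
        const double full = m.y;

        m.restore_state(start);
        m.do_step(dt / 2);
        m.do_step(dt / 2);
        const double halves = m.y;

        if (std::abs(full - halves) <= tol || attempt == maxRetries) {
            m.release_state(start); // accepted: the saved state is no longer needed
            return dt;
        }
        m.restore_state(start); // error too large: roll back and shrink the step
        dt /= 2;
    }
}
```

With k = 10 and an initial step of 0.1, the step is halved three times, to 0.0125, before the difference between one full step and two half steps drops below 0.01.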

> The effect of the variable modifier on the simulation will be seen in the saved states, so in principle this omits information necessary for e.g. fully transparent rewind / playback functionality for a whole simulation where these "user actions" must also be tracked.

I don't think the FMI state saving/serialisation functions were designed for playback. Their goal is to save the precise simulation state at a certain point in time, so you can

  1. go back to that point within the current simulation run, e.g. for error control, or
  2. start another simulation from the exact same point later[^1]

We don't need information about what has happened in the past for either of these use cases, only the complete state of the system at present.

In other words, it doesn't matter if modifiers have been applied and then disabled before we save the state, nor whether we intend to apply some modifiers after we have restored the state again.

> I guess there's also the question of what exactly can happen if the FMU is restored to a previous state, but our cached values aren't.

Yeah, that was a challenging point of this work. I have addressed it by calling set_variables() to transfer all cached values to the FMU instance before saving the state, and by calling get_variables() to repopulate the cache after restoring the state. That way, I basically hand over the responsibility for saving the variable values to the FMU (which a properly FMI-conforming FMU is supposed to handle correctly).

But that would not work as easily if modifiers were involved, so for now, I just want to forbid modifiers at the save point. We can revisit it later with a more sophisticated solution if the need arises, but for now, I think I'd like to gain some experience with the current, limited solution.
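The save/restore sequence described above can be sketched as follows. The sketch assumes a hypothetical toy FMU with a single real variable and a wrapper that caches that variable on the master side. The class and method names are illustrative only. Cached values are pushed into the FMU before its state is saved, the cache is refreshed after a restore, and saving is refused while a modifier is active.

```cpp
#include <map>
#include <stdexcept>

// Hypothetical toy "FMU" with one real variable and its own saved states.
struct toy_fmu
{
    double x = 0.0;
    std::map<int, double> states;
    int next = 0;

    int get_fmu_state() { states[next] = x; return next++; }
    void set_fmu_state(int i) { x = states.at(i); }
};

// Hypothetical wrapper that caches the variable value on the master side.
class cached_simulator
{
public:
    explicit cached_simulator(toy_fmu& fmu) : fmu_(fmu), cachedX_(fmu.x) {}

    void set_x(double v) { cachedX_ = v; } // only updates the cache
    double get_x() const { return cachedX_; }
    void set_modifier_active(bool on) { modifierActive_ = on; }

    int save_state()
    {
        if (modifierActive_) {
            throw std::logic_error("cannot save state while a modifier is active");
        }
        set_variables();             // transfer cached values to the FMU first...
        return fmu_.get_fmu_state(); // ...so the FMU's saved state includes them
    }

    void restore_state(int idx)
    {
        fmu_.set_fmu_state(idx);
        get_variables(); // repopulate the cache from the restored FMU
    }

private:
    void set_variables() { fmu_.x = cachedX_; }
    void get_variables() { cachedX_ = fmu_.x; }

    toy_fmu& fmu_;
    double cachedX_;
    bool modifierActive_ = false;
};
```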

> The data intended to be saved in a void* fmi2_FMU_state_t is by definition completely unknown to the caller; it's whatever the implementing FMU needs to restore its state later? In other words, there's no guarantee that the data there is suitable for certain uses, like serialization?

From the perspective of the co-simulation master, the fmi2_FMU_state_t pointer is completely opaque. It is just a handle that we use to refer to a state that has been saved internally in the FMU instance.

> Does that mean fmi2_capi_serialize_fmu_state should be used for serializable data instead?

"In addition", not "instead". First you save the state to get an fmi2_FMU_state_t handle, then you pass that handle to fmi2_capi_serialize_fmu_state() to get a version of the state which is suitable for storage and later deserialisation. Working on it! :)

> I can't really find any information about how these functions are intended to be implemented by the FMU. Should we cooperate on some example FMUs?

The FMI Library functions we use here are just wrappers over FMI functions. For example, fmi2_import_get_fmu_state() corresponds to the FMI 2.0 function fmi2GetFMUstate(), whose semantics are described in the FMI 2.0 spec.
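Taken together, the two answers above describe a save-then-serialise sequence, which can be mocked up as below. The toy FMU is hypothetical, and each of its methods is only loosely modelled on the FMI 2.0 function named in its comment. To keep the example self-contained, it does not call the real FMI Library wrappers (such as fmi2_import_get_fmu_state()). What it shows is that the caller never looks inside the opaque handle: only the FMU knows how to turn it into bytes and back.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy FMU whose internal state is a single double (a mock for
// illustration, not FMI Library code).
class serializable_toy_fmu
{
public:
    double x = 0.0;

    // ~ fmi2GetFMUstate: returns an opaque handle to an in-memory copy.
    void* get_state() { return new double(x); }

    // ~ fmi2SetFMUstate
    void set_state(void* s) { x = *static_cast<double*>(s); }

    // ~ fmi2FreeFMUstate
    void free_state(void* s) { delete static_cast<double*>(s); }

    // ~ fmi2SerializedFMUstateSize
    std::size_t serialized_size(void*) const { return sizeof(double); }

    // ~ fmi2SerializeFMUstate: writes nothing if the buffer is too small.
    void serialize(void* s, unsigned char* buf, std::size_t size) const
    {
        if (size >= sizeof(double)) std::memcpy(buf, s, sizeof(double));
    }

    // ~ fmi2DeSerializeFMUstate: bytes back into a new opaque handle.
    void* deserialize(const unsigned char* buf, std::size_t size) const
    {
        auto* s = new double(0.0);
        if (size >= sizeof(double)) std::memcpy(s, buf, sizeof(double));
        return s;
    }
};

// First save the state to get a handle, then serialise that handle.
std::vector<unsigned char> save_and_serialize(serializable_toy_fmu& fmu)
{
    void* handle = fmu.get_state();                    // step 1: in-memory save
    std::vector<unsigned char> bytes(fmu.serialized_size(handle));
    fmu.serialize(handle, bytes.data(), bytes.size()); // step 2: to bytes
    fmu.free_state(handle);
    return bytes;
}
```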

> Is the Dahlquist FMU intended for testing the FMU state API?

Exactly. And I'll be using it to test the serialisation API in my next PR.

> Another question I have: say we want to save all state by default. Does this affect the current implementation? For example, do we need to start thinking about keeping a circular buffer of states for a configurable duration? Is this to be done in execution::step, with saving & restoring state acting as a form of manipulator, or directly on each model instance within algorithm::do_step?

I'm not sure what the use case would be for saving all state by default, unless you mean for playback, in which case I'll reiterate that playback is not what this feature is for. Saving the entire state in each time step would be enormously costly.

I haven't gotten to the point where I'm dealing with the full system and simulation yet, but here are my current ideas:

[^1]: This is the use case we have in OptiStress. There, we want to run a large number of simulations from the same starting point, e.g. in an optimisation loop. But for the sake of performance, we'd like to avoid repeating the "warm-up period" before the system reaches the steady state that we'll then perturb.
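The OptiStress workflow in the footnote boils down to "warm up once, then restore many times". The sketch below assumes a hypothetical first-order lag model with index-based save/restore. Only the model output y is treated as saved state here, while the setpoint is set explicitly for each run as the perturbation.

```cpp
#include <map>
#include <vector>

// Hypothetical toy model: first-order lag towards a setpoint,
//   y[n+1] = y[n] + dt * (setpoint - y[n]) / tau
struct lag_model
{
    double tau = 1.0, setpoint = 1.0, y = 0.0;
    std::map<int, double> saved;
    int next = 0;

    void do_step(double dt) { y += dt * (setpoint - y) / tau; }
    int save_state() { saved[next] = y; return next++; }
    void restore_state(int i) { y = saved.at(i); }
};

// Warm up once, save, then start every perturbed run from the saved point
// instead of repeating the warm-up period each time.
std::vector<double> run_perturbations(lag_model& m, const std::vector<double>& setpoints,
                                      int warmupSteps, int runSteps, double dt)
{
    for (int i = 0; i < warmupSteps; ++i) m.do_step(dt);
    const int warm = m.save_state();

    std::vector<double> results;
    for (double sp : setpoints) {
        m.restore_state(warm); // same starting point for every run
        m.setpoint = sp;       // the perturbation
        for (int i = 0; i < runSteps; ++i) m.do_step(dt);
        results.push_back(m.y);
    }
    return results;
}
```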

kyllingstad commented 1 month ago

> I don't think the FMI state saving/serialisation functions were designed for playback.

In fact, I don't think they can even conceivably be used for playback, because the internal state of each FMU instance is just exported as a binary blob, and in general you don't know anything about the format of its contents.