Serialize trajectories of States in their exact and complete form

aseth1 commented 10 years ago

Includes discrete state variables but not cached variables. A States trajectory must be tied to a model. It shouldn't be easily human-readable (e.g. it could be zipped), and it should be bound to the model not by name but by some other unique key. It would also be handy if it recorded the version of OpenSim used to generate the States. Partly addresses the issue raised in #104 for robust post-hoc analysis of a simulation result.

chrisdembia commented 10 years ago

It could use a hash of the model file, but that would prevent someone from doing things like adding comments to their model file, or adding components that might not affect the state. I currently use HDF5 to store all my data: https://en.wikipedia.org/wiki/Hierarchical_Data_Format. The benefit of a format like this is that it has APIs in multiple languages, so OpenSim could write an HDF5 file and a user could then read it in MATLAB.
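One way to soften the objection about comments is to hash a canonical form of the model XML rather than the raw file bytes. The sketch below is only an illustration using Python's standard library; `model_fingerprint` and the toy XML strings are hypothetical and not part of OpenSim. It does not address the second objection (adding components that don't affect the state).

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def model_fingerprint(model_xml: str) -> str:
    """Hash a model's XML content, ignoring comments and layout whitespace.

    Hypothetical helper for illustration; not an OpenSim API.
    """
    # C14N 2.0 output drops comments by default; strip_text=True also removes
    # leading/trailing whitespace in text nodes, so re-indenting is harmless.
    canonical = canonicalize(model_xml, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Toy documents (assumptions): same content, one with a comment and indentation.
plain = "<Model><mass>1.0</mass></Model>"
commented = "<Model>\n  <!-- tuned by hand -->\n  <mass>1.0</mass>\n</Model>"
heavier = "<Model><mass>2.0</mass></Model>"

same_after_comment = model_fingerprint(plain) == model_fingerprint(commented)
same_after_edit = model_fingerprint(plain) == model_fingerprint(heavier)
```

With this approach a comment or re-indent leaves the fingerprint unchanged, while a changed property value produces a different one.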

chrisdembia commented 10 years ago

A quote from that Wikipedia article:

Because it uses B-trees to index table objects, HDF5 works well for time series data such as stock price series, network monitoring data, and 3D meteorological data. The bulk of the data goes into straightforward arrays (the table objects) that can be accessed much more quickly than the rows of a SQL database, but B-Tree access is available for non-array data. The HDF5 data storage mechanism can be simpler and faster than an SQL star schema.

sherm1 commented 10 years ago

We might want to be somewhat lax about the association between Models and serialized States. It's nice to be able to use States with compatible but not-quite-exactly-the-same Models. Changes to any trivia like comments, names of things, extra visualization geometry, author lists, etc. shouldn't mean that precious trajectories become useless. We can easily make some checks on sizes of things and prevent nonsense usage.

If we serialize to XML or some other ascii format, it will be possible to do some minor editing to extend the life of some trajectories, and it will be possible for people to generate states using other programs like Matlab. We should make sure that's possible I think.
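The "checks on sizes of things" could be as simple as comparing a few counts stored with the trajectory against the model that wants to use it. This is a rough sketch under assumptions: `StateLayout` and `layout_mismatches` are hypothetical names, and the counts in the usage lines are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateLayout:
    """Sizes a stored trajectory was written with (hypothetical record)."""
    nq: int  # generalized coordinates
    nu: int  # generalized speeds
    nz: int  # auxiliary states, e.g. muscle activations and fiber lengths


def layout_mismatches(stored: StateLayout, model: StateLayout) -> list[str]:
    """Return readable mismatches; an empty list means 'plausibly compatible'."""
    problems = []
    for name in ("nq", "nu", "nz"):
        a, b = getattr(stored, name), getattr(model, name)
        if a != b:
            problems.append(f"{name}: trajectory has {a}, model expects {b}")
    return problems


# Arbitrary example sizes, not taken from any real model.
stored = StateLayout(nq=23, nu=23, nz=36)
ok = layout_mismatches(stored, StateLayout(nq=23, nu=23, nz=36))
bad = layout_mismatches(stored, StateLayout(nq=23, nu=23, nz=38))
```

A check like this only rules out nonsense usage; it says nothing about whether the states satisfy the other model's constraints.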

chrisdembia commented 10 years ago

Is the motivation to have a binary format the size of the file?

sherm1 commented 10 years ago

> Is the motivation to have a binary format the size of the file?

I think so. Ajay and I had discussed using a zip of an ascii format. That has the advantage of being editable but small, and also allows us to zip up a collection of files into what would look like a single OpenSim model file. That is how Microsoft Word works -- .docx is actually a zip of an elaborate directory structure with numerous subfiles.

Of course a zipped ascii file is not much good for fast lookup.
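As a small illustration of the zip-of-ASCII idea, Python's standard `zipfile` module can bundle several text files into one archive and read a single member back. The member names `model.osim` and `states.sto` and their contents are placeholders chosen for this sketch.

```python
import io
import zipfile


def pack_bundle(files: dict[str, str]) -> bytes:
    """Zip named text files into one compressed archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)  # str data is stored as UTF-8
    return buffer.getvalue()


def read_member(bundle: bytes, name: str) -> str:
    """Read one text file back out of the archive."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        return archive.read(name).decode("utf-8")


# Placeholder contents: a stub model and a repetitive tab-delimited table.
states_text = "0.0\t0.1\n" * 1000
bundle = pack_bundle({"model.osim": "<Model/>", "states.sto": states_text})
restored = read_member(bundle, "states.sto")
```

The text stays editable once unzipped and compresses well, but finding one time step still means decompressing the member up to that point, which is the lookup cost mentioned above.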

tkuchida commented 10 years ago

MapleSim works similarly. They zip the model along with Maple worksheets, simulation results, images, etc. into an .msim file.

aseth1 commented 10 years ago

> We might want to be somewhat lax about the association between Models and serialized States. It's nice to be able to use States with compatible but not-quite-exactly-the-same Models. Changes to any trivia like comments, names of things, extra visualization geometry, author lists, etc. shouldn't mean that precious trajectories become useless. We can easily make some checks on sizes of things and prevent nonsense usage.

I disagree here @sherm1. Calling things states when there is no guarantee that they satisfy the constraints, dynamics, etc. is very misleading. We would then always have to waste time, and introduce ambiguity, verifying whether these "states" are really States and working out how to make them proper States. The point of having binary-like states would be to guarantee that a trajectory was generated from a given model under a specific generation scheme (forward integration, etc.). Users should never modify States! They should be sacred and remain untouched. Being able to access them in MATLAB or other software is OK, but if they are modified in any way the checksum will fail.

To handle your scenario we should make it straightforward to go from reported outputs (the easy-to-edit storage files that we already generate) to States via a generator. The generator's input would make it explicit that you are giving some "guess" of what the states should be. One can still have reports (tab-delimited files) of q's, u's, muscle activations, and fiber lengths for processing and editing, but the underlying States trajectory should be the exact archive of a specific simulation and never be editable, in my opinion.
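A checksum that fails on any modification could be sketched as follows. This is an assumption-laden illustration, not a proposed file format: `seal` and `is_untouched` are hypothetical names and the sample values are made up. A plain SHA-256 digest only detects edits; someone who deliberately recomputes the digest would need to be stopped by a keyed scheme such as an HMAC.

```python
import hashlib
import struct


def seal(times: list[float], rows: list[list[float]]) -> dict:
    """Pack a trajectory as raw little-endian doubles and attach a digest."""
    payload = b"".join(
        struct.pack(f"<{len(row) + 1}d", t, *row) for t, row in zip(times, rows)
    )
    return {"payload": payload, "sha256": hashlib.sha256(payload).hexdigest()}


def is_untouched(sealed: dict) -> bool:
    """True only if the payload still matches the digest recorded at write time."""
    return hashlib.sha256(sealed["payload"]).hexdigest() == sealed["sha256"]


# Made-up two-step trajectory with two state values per step.
sealed = seal([0.0, 0.01], [[0.1, 0.2], [0.11, 0.19]])

# Simulate an edit by inverting every bit of the final byte.
last = sealed["payload"][-1] ^ 0xFF
tampered = dict(sealed, payload=sealed["payload"][:-1] + bytes([last]))
```

Here `is_untouched(sealed)` holds and `is_untouched(tampered)` does not, which is the behavior described above for modified States.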

sherm1 commented 10 years ago

There may be roles for both kinds of serialized states. I don't think they are mutually exclusive. At the bottom level we need a way to serialize a single State in such a manner that we can recreate the binary State object from the serialization. That will be useful no matter what. Then we can build various policies on top of that, including an inviolate checksum when that's appropriate.

Serializing a single SimTK::State object doesn't actually require a System (or Model) object at all. Today you can already make a copy of a State without a System present (and that is very useful!).
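To make the "bottom level" concrete, here is a minimal sketch of an exact, model-free round trip for a single state. It is a stand-in only: a real SimTK::State holds much more than a time and a flat vector of values (discrete variables, for instance), and `dump_state` and `load_state` are hypothetical names.

```python
import struct


def dump_state(time: float, y: list[float]) -> bytes:
    """Encode one state as: uint32 count, then time and values as float64."""
    return struct.pack(f"<I{len(y) + 1}d", len(y), time, *y)


def load_state(blob: bytes) -> tuple[float, list[float]]:
    """Inverse of dump_state; needs no model or system object."""
    (count,) = struct.unpack_from("<I", blob, 0)
    # "<" disables padding, so the doubles start right after the 4-byte count.
    time, *y = struct.unpack_from(f"<{count + 1}d", blob, 4)
    return time, y


# Values chosen to be awkward in decimal: 1/3 and the smallest subnormal double.
original = (0.1, [1 / 3, 5e-324])
restored = load_state(dump_state(*original))
```

Because the doubles are stored bit for bit, the restored values compare equal to the originals; policies such as an inviolate checksum can then be layered on top of the raw bytes.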

jenhicks commented 10 years ago

For 4.0: Include a mechanism to output the full state trajectory. Need to assess the degree to which this gets exposed to GUI users (based on other progress on model components and reporting of outputs).

chrisdembia commented 10 years ago

Say I wanted to get the total work done by my actuator over a gait cycle. It'd be really neat if I could do something like

stateTraj = opensim.StateTrajectory("my_serialized_state_trajectory.xml")
integrator = opensim.Integrator()
integrator.getInput("derivative").connect(model.getComponent("my_actuator").getOutput("power"))
work = integrator.integrate(stateTraj, 0.0, 1.5)  # initial and final times
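The API above is a wish, not something that exists. The underlying computation, total work as the time integral of actuator power, can be sketched with a trapezoidal rule over sampled values. The function name and the sample data below are assumptions for illustration.

```python
def integrate_trapezoid(times: list[float], values: list[float]) -> float:
    """Integrate sampled values over time with the trapezoidal rule."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (values[i] + values[i - 1]) * dt
    return total


# Made-up samples: power (W) ramping linearly from 0 to 30 W over 1.5 s.
times = [0.0, 0.5, 1.0, 1.5]
power = [0.0, 10.0, 20.0, 30.0]
work = integrate_trapezoid(times, power)  # 22.5 J; exact because the ramp is linear
```

For real simulation output the accuracy would depend on how densely the trajectory is sampled relative to how fast the power changes.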

chrisdembia commented 10 years ago

This comment really doesn't belong here, but it'd be nice if Inputs and Outputs were totally encapsulated in Components. Should a user ever need to access Input or Output objects themselves? The third line above makes more sense as:

integrator.setInput("derivative", model.getComponent("my_actuator").getOutput("power"))  # or
integrator.setInput("derivative", model.getComponent("my_actuator").get_power())  # if we create a macro for outputs that creates getters

chrisdembia commented 10 years ago

It'd be nice if, for any operation that can be done on a State, that operation could also be done on a StateTrajectory. That way I could get the fiber length throughout a simulation via muscle.getFiberLength(stateTraj), and if this normally returns a double, it'd instead return a SimTK::Array<double> or something like that.

chrisdembia commented 10 years ago

Alternatively, there could be a component that did this operation for you.

mycomponent = ComputeOverTraj()
mycomponent.setInput("qty", muscle.getOutput("fiber_length"))
arr = mycomponent.compute(stateTraj)  # e.g. returns a SimTK::Array<double>

Perhaps this is just what a Study is supposed to be.
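Either variant amounts to mapping a per-state operation over the trajectory. A minimal sketch of that idea, with plain dicts standing in for real State objects and `over_trajectory` as a hypothetical helper:

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")  # whatever type represents a single state
T = TypeVar("T")  # whatever the per-state operation returns


def over_trajectory(op: Callable[[S], T], trajectory: Iterable[S]) -> list[T]:
    """Apply a per-state operation to every state, preserving order."""
    return [op(state) for state in trajectory]


# Stand-in states with made-up values; a real trajectory would hold States.
trajectory = [
    {"time": 0.0, "fiber_length": 0.050},
    {"time": 0.1, "fiber_length": 0.048},
]
fiber_lengths = over_trajectory(lambda s: s["fiber_length"], trajectory)
```

Any function that accepts one state works unchanged, which is the property asked for above: an operation returning a double per state yields an array of doubles per trajectory.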

chrisdembia commented 9 years ago

Just found this binary format MessagePack: https://github.com/msgpack/msgpack/blob/master/spec.md#formats-array

Seems to be similar to protobuf.

opensim-org / opensim-core

Serialize trajectories of States in their exact and complete form #149