Closed aseth1 closed 1 year ago
It could use a hash of the model file, but that would prevent someone from doing things like adding comments to their model file, or adding components that might not affect that state. I currently use HDF5 to store all my data: https://en.wikipedia.org/wiki/Hierarchical_Data_Format. The benefit of using a format like this is that it has APIs in multiple languages. So OpenSim could write an HDF5 file and then a user could read it in MATLAB.
A quote from that Wikipedia article:
Because it uses B-trees to index table objects, HDF5 works well for time series data such as stock price series, network monitoring data, and 3D meteorological data. The bulk of the data goes into straightforward arrays (the table objects) that can be accessed much more quickly than the rows of a SQL database, but B-Tree access is available for non-array data. The HDF5 data storage mechanism can be simpler and faster than an SQL star schema.
We might want to be somewhat lax about the association between Models and serialized States. It's nice to be able to use States with compatible but not-quite-exactly-the-same Models. Changes to any trivia like comments, names of things, extra visualization geometry, author lists, etc. shouldn't mean that precious trajectories become useless. We can easily make some checks on sizes of things and prevent nonsense usage.
If we serialize to XML or some other ASCII format, it will be possible to do some minor editing to extend the life of some trajectories, and it will be possible for people to generate states using other programs like MATLAB. I think we should make sure that's possible.
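The size checks mentioned above could look something like the following sketch. The `StateSizes` record and its field names (`nq`, `nu`, `nz`) are hypothetical and not part of OpenSim or Simbody; the only point is that a stored row is rejected when its dimensions cannot match the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateSizes:
    """Hypothetical summary of a model's state dimensions (not an OpenSim type)."""
    nq: int  # number of generalized coordinates
    nu: int  # number of generalized speeds
    nz: int  # number of auxiliary states (e.g. activations, fiber lengths)


def check_compatible(model_sizes: StateSizes, row: dict) -> None:
    """Raise ValueError if a stored row cannot belong to this model."""
    for name in ("q", "u", "z"):
        expected = getattr(model_sizes, "n" + name)
        actual = len(row[name])
        if actual != expected:
            raise ValueError(
                f"stored state has {actual} {name} values, model expects {expected}"
            )


sizes = StateSizes(nq=2, nu=2, nz=1)
good_row = {"time": 0.0, "q": [0.1, 0.2], "u": [0.0, 0.0], "z": [0.5]}
check_compatible(sizes, good_row)  # passes silently: all sizes match
```

A check like this only prevents nonsense usage; it says nothing about whether the values satisfy the model's constraints.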
Is the motivation to have a binary format the size of the file?
I think so. Ajay and I had discussed using a zip of an ASCII format. That has the advantage of being editable but small, and also allows us to zip up a collection of files into what would look like a single OpenSim model file. That is how Microsoft Word works -- .docx is actually a zip of an elaborate directory structure with numerous subfiles.
Of course a zipped ASCII file is not much good for fast lookup.
MapleSim works similarly. They zip the model along with Maple worksheets, simulation results, images, etc. into an .msim file.
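As a rough illustration of the zipped-ASCII idea, here is a minimal sketch using only Python's standard `zipfile` module. The archive layout (`model.osim`, `results/states.sto`) and the file contents are made up for the example; nothing here reflects an actual OpenSim container format.

```python
import io
import zipfile


def bundle(files: dict) -> bytes:
    """Zip a mapping of archive path -> text content into one in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, text in files.items():
            zf.writestr(path, text)  # str content is stored as UTF-8
    return buf.getvalue()


def read_member(archive: bytes, path: str) -> str:
    """Read one text file back out of the archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(path).decode("utf-8")


# Hypothetical contents: a stub model plus a tab-delimited states table.
states_text = "time\tq0\tu0\n" + "".join(
    f"{0.01 * i:.2f}\t{0.1 * i:.3f}\t0.000\n" for i in range(200)
)
archive = bundle({
    "model.osim": "<OpenSimDocument/>",
    "results/states.sto": states_text,
})
restored = read_member(archive, "results/states.sto")
```

The members stay plain text, so they remain editable after unzipping, but reading one row still means decompressing the member that holds it, which is the fast-lookup limitation noted above.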
We might want to be somewhat lax about the association between Models and serialized States. It's nice to be able to use States with compatible but not-quite-exactly-the-same Models. Changes to any trivia like comments, names of things, extra visualization geometry, author lists, etc. shouldn't mean that precious trajectories become useless. We can easily make some checks on sizes of things and prevent nonsense usage.
I disagree here @sherm1. Calling things states when there is no guarantee that they satisfy the constraints, dynamics, etc. is very misleading. We would then always have to waste time, and introduce ambiguity, verifying whether these "states" are really States and working out how to make them proper States. The point of having binary-like states would be to guarantee that a trajectory was generated from a given model under a specific generation scheme (forward integration, etc.). Users should never modify States! They should be sacred and remain untouched. Being able to access them in MATLAB or other software is OK, but if they are modified in any way the checksum will fail.
To handle your scenario we should make it straightforward to go from reported outputs (the easy-to-edit storage files that we already generate) to States via a generator. The input to the generator would make it explicit that you are giving some "guess" of what the states should be. One can still have reports (tab-delimited files) of q's, u's, muscle activations, and fiber lengths for processing and editing, but the underlying States trajectory should be the exact archive of a specific simulation and never be editable, in my opinion.
There may be roles for both kinds of serialized states. I don't think they are mutually exclusive. At the bottom level we need a way to serialize a single State in such a manner that we can recreate the binary State object from the serialization. That will be useful no matter what. Then we can build various policies on top of that, including an inviolate checksum when that's appropriate.
Serializing a single SimTK::State object doesn't actually require a System (or Model) object at all. Today you can already make a copy of a State without a System present (and that is very useful!).
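The layering described here (a bottom-level serializer, with policies such as a checksum built on top) might be sketched as follows. This is an assumption-laden illustration: a state is reduced to `time`, `q`, `u`, `z` lists, JSON stands in for whatever format is eventually chosen, and none of the function names come from Simbody or OpenSim.

```python
import hashlib
import json


def serialize_state(time, q, u, z) -> str:
    """Bottom level: one state as canonical text (sorted keys, fixed separators)."""
    return json.dumps(
        {"time": time, "q": q, "u": u, "z": z},
        sort_keys=True,
        separators=(",", ":"),
    )


def deserialize_state(text: str) -> dict:
    """Bottom level: recreate the state values; no model is needed for this."""
    return json.loads(text)


def seal(text: str) -> dict:
    """Policy layer: attach a SHA-256 digest of the serialized state."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"payload": text, "sha256": digest}


def open_sealed(record: dict) -> dict:
    """Policy layer: refuse to load a payload whose digest no longer matches."""
    digest = hashlib.sha256(record["payload"].encode("utf-8")).hexdigest()
    if digest != record["sha256"]:
        raise ValueError("state was modified after it was written")
    return deserialize_state(record["payload"])


text = serialize_state(0.25, [0.1, 0.2], [0.0, -1.5], [0.5])
sealed = seal(text)
state = open_sealed(sealed)
```

The lax policy would call `deserialize_state` directly, while the strict one goes through `open_sealed`. A bare hash like this only catches accidental or casual edits, since anyone can recompute it; a stronger guarantee would need a keyed signature.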
For 4.0: Include a mechanism to output full state trajectory. Need to assess the degree to which this gets exposed to GUI users (based on other progress on model components and reporting of outputs).
Say I wanted to get the total work done by my actuator over a gait cycle. It'd be really neat if I could do something like
stateTraj = opensim.StateTrajectory("my_serialized_state_trajectory.xml")
integrator = opensim.Integrator()
integrator.getInput("derivative").connect(model.getComponent("my_actuator").getOutput("power"))
work = integrator.integrate(stateTraj, 0.0, 1.5); # initial and final times.
This comment really doesn't belong here, but it'd be nice if Inputs and Outputs were totally encapsulated in Components. Should a user ever need to access Input or Output objects themselves? The third line above would make more sense as:
integrator.setInput("derivative", model.getComponent("my_actuator").getOutput("power")) // or
integrator.setInput("derivative", model.getComponent("my_actuator").get_power()) // if we create a macro for outputs that creates getters.
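Independently of how the Input/Output API ends up looking, the work computation itself is just a time integral of power. A minimal sketch, assuming the actuator power has already been sampled at the trajectory's time points (the sample values below are invented, and `integrate_trapezoid` is a hypothetical helper, not an OpenSim function):

```python
def integrate_trapezoid(times, values, t0, t1):
    """Trapezoid-rule integral of sampled values over [t0, t1].

    Only sample intervals lying fully inside [t0, t1] are used; there is no
    interpolation at the bounds, so t0 and t1 should coincide with samples.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        a, b = times[i - 1], times[i]
        if a < t0 or b > t1:
            continue  # interval is not fully inside the requested window
        total += 0.5 * (values[i - 1] + values[i]) * (b - a)
    return total


times = [0.0, 0.5, 1.0, 1.5]   # s
power = [0.0, 2.0, 2.0, 0.0]   # W, made-up samples
work = integrate_trapezoid(times, power, 0.0, 1.5)  # J
```

For these samples the three trapezoids contribute 0.5, 1.0 and 0.5, so `work` is 2.0.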
It'd be nice if any operation that can be done on a State could also be done on a StateTrajectory. That way I could get the fiber length throughout a simulation via muscle.getFiberLength(stateTraj), and if this normally returns a double, it'd instead return a SimTK::Array<double> or something like that.
Alternatively, there could be a component that did this operation for you.
mycomponent = ComputeOverTraj()
mycomponent.setInput("qty", muscle.getOutput("fiber_length"))
SimTK::Array<double> arr = mycomponent.compute(stateTraj)
Perhaps this is just what a Study is supposed to be.
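Stripped of the component machinery, the idea is a map of a per-state function over the trajectory. A minimal sketch, where states are plain dicts and `fiber_length` is a stand-in for something like `muscle.getFiberLength(state)` (both are assumptions for illustration only):

```python
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")  # state type
T = TypeVar("T")  # type of the computed quantity


def compute_over_traj(quantity: Callable[[S], T], state_traj: Iterable[S]) -> List[T]:
    """Apply a single-state computation to every state in a trajectory."""
    return [quantity(state) for state in state_traj]


def fiber_length(state: dict) -> float:
    """Hypothetical single-state quantity; a real one would query the model."""
    return state["z"][0]


state_traj = [
    {"time": 0.0, "z": [0.050]},
    {"time": 0.1, "z": [0.052]},
    {"time": 0.2, "z": [0.055]},
]
lengths = compute_over_traj(fiber_length, state_traj)
```

Any function that accepts one state works unchanged, which is the property asked for above: whatever can be done on a State can be done on a trajectory.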
Just found this binary format MessagePack: https://github.com/msgpack/msgpack/blob/master/spec.md#formats-array
Seems to be similar to protobuf.
Includes discrete state variables but not cached variables. A States trajectory must be tied to a model. It shouldn't be easily human readable (e.g. zipped) and should be bound not by model name but by some other unique key. It would also be handy if it recorded the version of OpenSim used to generate the States. Partly addresses the issue raised in #104 for robust post-hoc analysis of a simulation result.
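One possible form for such a key, sketched under assumptions: hash a canonicalized version of the model XML so that comments and indentation do not change the key, and store it next to the OpenSim version in a small header. This uses `xml.etree.ElementTree.canonicalize` (Python 3.8+); the header fields, the version string, and the toy model XML are all hypothetical.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def model_key(model_xml: str) -> str:
    """SHA-256 of the canonical XML; comments and surrounding whitespace are ignored."""
    canonical = canonicalize(model_xml, strip_text=True)  # comments dropped by default
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def trajectory_header(model_xml: str, opensim_version: str) -> dict:
    """Hypothetical header stored alongside a serialized States trajectory."""
    return {"model_key": model_key(model_xml), "opensim_version": opensim_version}


plain = "<Model name='arm'><Body name='hand'/></Model>"
commented = "<Model name='arm'>\n  <!-- retuned -->\n  <Body name='hand'/>\n</Model>"
changed = "<Model name='arm'><Body name='hand'/><Body name='forearm'/></Model>"

header = trajectory_header(plain, "4.0")
same_model = model_key(commented) == header["model_key"]
different_model = model_key(changed) != header["model_key"]
```

This particular key still changes when any element or attribute changes, including names, so a real scheme would have to decide which parts of the model actually define the state.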