shalinshah1993 / SBSCL

The Systems Biology Simulation Core Library (SBSCL) provides an efficient and exhaustive Java implementation of methods to interpret the content of models encoded in the Systems Biology Markup Language (SBML) and its numerical solution.
https://draeger-lab.github.io/SBSCL/
GNU Lesser General Public License v3.0

Post-processing simulation data for output #17

Closed shalinshah1993 closed 6 years ago

shalinshah1993 commented 6 years ago

The output data from executing simulations is a Map<AbstractTask, Object> where Object can be IRawSimulationResults or Map<AbstractTask, Object> or Map<AbstractTask, IRawSimulationResults>. Process this raw-results data-structure to generate a MultiTable output.
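As a rough illustration of the nested structure described above, the following sketch walks such a map recursively and collects every leaf result into a flat list. It is only a sketch under stated assumptions: `RawResult` is a hypothetical stand-in for IRawSimulationResults, plain strings stand in for AbstractTask keys, and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: RawResult replaces IRawSimulationResults and
// String keys replace AbstractTask, so the sketch runs without jlibsedml.
class FlattenResultsSketch {

    /** Stand-in for one raw simulation result (rows of numeric data). */
    static final class RawResult {
        final double[][] data;

        RawResult(double[][] data) {
            this.data = data;
        }
    }

    /** Collects every leaf result of the nested structure, depth-first, in encounter order. */
    static List<RawResult> flatten(Object node) {
        List<RawResult> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Object node, List<RawResult> out) {
        if (node instanceof RawResult) {
            // Leaf: the result of a simple task or of one iteration.
            out.add((RawResult) node);
        } else if (node instanceof Map) {
            // Nested map: descend into every value.
            for (Object child : ((Map<?, ?>) node).values()) {
                collect(child, out);
            }
        } else if (node != null) {
            throw new IllegalArgumentException("Unexpected result type: " + node.getClass());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> iterations = new LinkedHashMap<>();
        iterations.put("iteration0", new RawResult(new double[][] {{0.0, 1.0}}));
        iterations.put("iteration1", new RawResult(new double[][] {{0.0, 2.0}}));

        Map<String, Object> results = new LinkedHashMap<>();
        results.put("repeatedTask", iterations);
        results.put("simpleTask", new RawResult(new double[][] {{0.0, 3.0}}));

        System.out.println(flatten(results).size()); // prints 3
    }
}
```

Once the results are flat, each leaf can be handed to the post-processing step on its own, regardless of how deeply the tasks were nested.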

https://github.com/shalinshah1993/SBSCL/blob/d464d84ea47e4ab66a0a3903fb4dcf061cd7678c/src/org/simulator/sedml/SedMLSBMLSimulatorExecutor.java#L288

@matthiaskoenig What should a DataGenerator for RepeatedTasks look like?

shalinshah1993 commented 6 years ago

@draeger Do you have any comments here?

draeger commented 6 years ago

We should discuss this tomorrow during our meeting. I am not familiar enough with these details of SED-ML to give a qualified answer. @matthiaskoenig certainly knows what is best.

matthiaskoenig commented 6 years ago

Currently, there is no math or postprocessing in SED-ML that spans multiple repeats of a repeated task. Every repeat is a single dataset (generated via simulation) to which the postprocessing is applied, so you just have to apply the postprocessing to every repeat individually.
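A minimal sketch of that idea, assuming each repeat has already been reduced to a plain array of values for one variable: the same math is applied to every repeat on its own, and the repeats are never combined. The class and method names here are hypothetical and not part of SBSCL or jlibsedml.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: one double[] per repeat, one math expression for all.
class PerRepeatPostProcessingSketch {

    /** Applies the same math to every repeat separately; inputs are left untouched. */
    static List<double[]> processEachRepeat(List<double[]> repeats, DoubleUnaryOperator math) {
        List<double[]> processed = new ArrayList<>();
        for (double[] repeat : repeats) {
            double[] out = new double[repeat.length];
            for (int i = 0; i < repeat.length; i++) {
                out[i] = math.applyAsDouble(repeat[i]);
            }
            // One processed dataset per repeat, in the original order.
            processed.add(out);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<double[]> repeats = Arrays.asList(
                new double[] {1.0, 2.0, 3.0},   // values from repeat 0
                new double[] {10.0, 20.0, 30.0} // values from repeat 1
        );
        // Assumed post-processing for illustration: scale every value by 2.
        for (double[] curve : processEachRepeat(repeats, v -> 2.0 * v)) {
            System.out.println(Arrays.toString(curve));
        }
    }
}
```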

shalinshah1993 commented 6 years ago

I understand what you are saying, but the SED-ML specs say that in a dataGenerator a variable cannot have both symbol and target, although one of them is required. If a variable is inside a dataGenerator, then it must contain a taskReference.

Now let's assume it refers to a repeatedTask. Which iteration of this repeatedTask should be used as output? We don't know! My understanding is that we need the values from all the iterations. If we need results for the variable from all the iterations, then we need to merge the results of a repeatedTask into a single IRawSimulationResults instead of keeping a List<IRawSimulationResults> for one repeatedTask.

Am I missing something here? Is there a way to refer to one particular iteration of a repeatedTask?

matthiaskoenig commented 6 years ago

I don't understand your problem completely. As I understand it, a dataGenerator should never reference a repeatedTask, but always the inner simple task which is repeated. Do you have an example where a dataGenerator is referencing a repeatedTask?

shalinshah1993 commented 6 years ago

I didn't find that listed explicitly in the specification, so I was assuming that it (a dataGenerator referring to a repeatedTask) is possible.

Okay, we have the following rules:

This brings me to the next question: what happens to the results of a repeatedTask if a dataGenerator never refers to it? I am basically trying to understand what to do with the results of repeatedTasks.

matthiaskoenig commented 6 years ago

I looked at the examples. The task the dataGenerator is referencing gives you the list of simulation results you have to process. This could be a simple task or a repeated task. The dataGenerator will contain all the processed data of the repeated task, i.e. all individual repeats postprocessed individually. If you then plot the dataGenerator, you will plot all the data postprocessed from the individual repeats.

See for instance https://sed-ml.github.io/documents/sed-ml-L1V3.pdf
A.3.1 Time course parameter scan

This works the same in L1V2.

shalinshah1993 commented 6 years ago

Thanks for pointing out these examples in the SED-ML specification. That clears up a lot. BTW, I am using exactly these files to test our new code. They are in the src/test/resources/ folder.

I read the examples carefully, and something about A.3.2 is not clear to me. It has a repeatedTask repeated 10 times using a uniformRange, but there is only one output curve instead of 10. Why is that? Does it have something to do with the steadyState simulation? In contrast to this example, A.3.5 does have 10 steadyState curves, since its master range has 10 elements. (https://github.com/shalinshah1993/SBSCL/issues/9)

In our implementation, I think we should follow a webtools-like implementation where, instead of concatenating the iterations of a repeatedTask (like tellurium), we simply draw the dataGenerator for all the iterations.
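The two output layouts contrasted above can be sketched roughly as follows. This only illustrates the difference in data layout as described in this comment. It is not the actual code of tellurium, the webtools, or SBSCL, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of two ways to lay out the data of a repeatedTask.
class RepeatOutputLayoutSketch {

    /** Joins all iterations end to end into a single series. */
    static double[] concatenate(List<double[]> iterations) {
        int total = 0;
        for (double[] iteration : iterations) {
            total += iteration.length;
        }
        double[] joined = new double[total];
        int offset = 0;
        for (double[] iteration : iterations) {
            System.arraycopy(iteration, 0, joined, offset, iteration.length);
            offset += iteration.length;
        }
        return joined;
    }

    /** Keeps every iteration as its own curve (defensive copies). */
    static List<double[]> oneCurvePerIteration(List<double[]> iterations) {
        List<double[]> curves = new ArrayList<>();
        for (double[] iteration : iterations) {
            curves.add(iteration.clone());
        }
        return curves;
    }

    public static void main(String[] args) {
        List<double[]> iterations = Arrays.asList(
                new double[] {1.0, 2.0},
                new double[] {3.0, 4.0});
        System.out.println(concatenate(iterations).length);        // prints 4
        System.out.println(oneCurvePerIteration(iterations).size()); // prints 2
    }
}
```

With separate curves, a plot of the dataGenerator shows one line per iteration, which matches the behaviour proposed in this comment.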

shalinshah1993 commented 6 years ago

I finished working on post-processing. You can find the code added in this commit: https://github.com/shalinshah1993/SBSCL/commit/f5ac28e9ec5234397b1836ccd36e087b7bfc0a2f

We still need to test it, as you pointed out in https://github.com/shalinshah1993/SBSCL/pull/35

However, the L1V2 example files from SED-ML have errors. For example, examples 3 and 5 from the SED-ML L1V3 specification have hardcoded model paths such as E:\Users\fbergmann\Documents\sbml models\. We need to correct these SED-ML files and use them to test the code.
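The plan stated above is to correct the files themselves. Purely as an illustration of the problem, the sketch below shows one possible way a test helper could detect such a machine-specific Windows path and fall back to a file of the same name next to the SED-ML document. This is an assumption, not what SBSCL does. The class, the method names, and the file name `model.xml` are hypothetical, and URN or URL model sources are not handled.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: maps a hardcoded Windows path to a local file name.
class ModelSourceSketch {

    /** True for sources such as E:\dir\file.xml or C:/dir/file.xml. */
    static boolean isWindowsAbsolutePath(String source) {
        return source.matches("^[A-Za-z]:[\\\\/].*");
    }

    /** Returns the part after the last backslash or slash. */
    static String fileName(String source) {
        int cut = Math.max(source.lastIndexOf('\\'), source.lastIndexOf('/'));
        return source.substring(cut + 1);
    }

    /** Resolves the model source against the directory of the SED-ML file. */
    static Path resolveModel(Path sedmlFile, String source) {
        // Drop the machine-specific directories of a hardcoded Windows path.
        String relative = isWindowsAbsolutePath(source) ? fileName(source) : source;
        Path base = sedmlFile.toAbsolutePath().getParent();
        return base.resolve(relative).normalize();
    }

    public static void main(String[] args) {
        // "model.xml" is a made-up file name appended for illustration.
        String hardcoded = "E:\\Users\\fbergmann\\Documents\\sbml models\\model.xml";
        Path sedml = Paths.get("src", "test", "resources", "example.sedml");
        System.out.println(resolveModel(sedml, hardcoded));
    }
}
```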

matthiaskoenig commented 6 years ago

@shalinshah1993 You can start testing by making the following test in /src/test/java/org/simulator/sedml/SEDMLExecutorTest work.

Just remove the @Ignore.

    /**
     * Retrieves model from Miriam DB - needs internet connection
     *
     * @throws XMLException
     * @throws IOException
     */
    @Test
    @Ignore //https://github.com/shalinshah1993/SBSCL/issues/31
    public final void testBasicSEDMLExecutorForMiriamURNDefinedModel() throws XMLException, IOException {

        String miriamPath = TestUtils.getPathForTestResource(miriamtest);
        SEDMLDocument doc = Libsedml.readDocument(new File(miriamPath));
        assertNotNull(doc);

        SedML sedml = doc.getSedMLModel();
        assertNotNull(sedml);

        // In this SED-ML file there's just one output. If there were several,
        // we could either iterate or let the user decide what they want to run.
        Output wanted = sedml.getOutputs().get(0);
        SedMLSBMLSimulatorExecutor exe = new SedMLSBMLSimulatorExecutor(sedml, wanted);

        // This gets the raw simulation results - one for each Task that was run.
        Map<AbstractTask, List<IRawSedmlSimulationResults>> res = exe.run();
        if (res == null || res.isEmpty() || !exe.isExecuted()) {
            fail("Simulation failed: " + exe.getFailureMessages().get(0));
        }
        // Now process. In this case, there's no processing performed - we're displaying the
        // raw results.
        MultiTable mt = exe.processSimulationResults(wanted, res);
        assertNotNull(mt);

        assertEquals(3, mt.getColumnCount());
        assertEquals("Time", mt.getTimeName());
    }

Currently this gives a NullPointerException:

java.lang.NullPointerException
    at org.simulator.sedml.ProcessSedMLResults.process(ProcessSedMLResults.java:219)
    at org.simulator.sedml.SedMLSBMLSimulatorExecutor.processSimulationResults(SedMLSBMLSimulatorExecutor.java:560)
    at org.simulator.sedml.SEDMLExecutorTest.testBasicSEDMLExecutorForMiriamURNDefinedModel(SEDMLExecutorTest.java:139)

I will add individual tests for all the L1V2 examples in SEDMLExecutorTest. I will @Ignore them for now; you should remove the @Ignore and make each test work.

matthiaskoenig commented 6 years ago

Test cases for the L1V2 examples have been added. Same as above: you should remove the @Ignore and make the tests work for the L1V2 examples.

https://github.com/shalinshah1993/SBSCL/pull/40/commits/27a2837570124db485a67cfa664c8600173ff164

matthiaskoenig commented 6 years ago

@shalinshah1993 I would start with getting the sedml/L1V2/repressilator example to work. This is not using any repeated task and only contains simple post-processing.

shalinshah1993 commented 6 years ago

This depends on #42.