sys-bio / roadrunner

libRoadRunner: A high-performance SBML simulator
http://libroadrunner.org/
Other
39 stars 25 forks source link

Segmentation faults in debug mode while running tests #528

Closed andrewbriand closed 4 years ago

andrewbriand commented 5 years ago

Found two segmentation faults when running the test suite in debug mode. One occurs in csr_matrix_new in rrSparse.cpp at lines 118 and 119 if nnz is 0, this seems to be fixed by adding a check to ensure the vectors being accessed have size > 0 like so:

    if (mcolidx.size() > 0) {
        memcpy(mat->colidx, &mcolidx[0], nnz * sizeof(unsigned));
    }
    if (mvalues.size() > 0) {
        memcpy(mat->values, &mvalues[0], nnz * sizeof(double));
    }

Another segfault occurs when the destructor of LLVMModelDataSymbols is called in core test RELOADING_MODEL_MODEL_RECOMPILATION. The stack trace shows that it occurs in std::_Tree<_Traits> in the function _Orphan_ptr. For some reason de-referencing the pointer _Pnext gives 0xcccccccccccccccc, the visual studio debugger magic number for uninitialized stack memory, when it should be a pointer to an iterator in a linked list of iterators.

annedrewhu commented 5 years ago

With this second segfault, it looks like something in our code caused internal corruption in the MSVC debug iterators over the map. @andrewbriand which map was it again? We figured out it was one of the StringUintMap members

andrewbriand commented 5 years ago

@hu-a I believe it was happening in floatingSpeciesMap

When I commented out the line that loads the values into floatingSpeciesMap, the segfault no longer occured.

annedrewhu commented 5 years ago

@andrewbriand Did we ever try running the debug tests, but without RELOADING_MODEL_MODEL_RECOMPILATION? Maybe it's just the core tests that are causing this problem. If so, we could probably ignore it for now and ship the LLVM 6 release.

andrewbriand commented 5 years ago

We haven't tried that yet, I'll try it tomorrow and keep you updated.

andrewbriand commented 5 years ago

@hu-a When I ran the debug tests without RELOADING_MODEL_MODEL_RECOMPILATION, the test RELOADING_MODEL_NO_MODEL_RECOMPILATION fails with a (presumably the same?) segfault in the LLVMModelDataSymbols dtor. Repeating this process I found that it also occurs in: LOADING_MODEL_MULTIPLE_INSTANCES LOAD_MODEL_FROM_STRING SET_COMPUTE_AND_ASSIGN_CONSERVATION_LAWS FREE_RR_INSTANCE in the TEST_MODEL suite FREE_RR_INSTANCE in the NOM_TESTS suite FREE_RR_INSTANCE in the LIBSTRUCT_TESTS suite SBML_TEST 7 I stopped there and tested it in release mode, all the tests pass in release except GENERATING_MODEL_HASH (issue #527).

annedrewhu commented 5 years ago

We need to make sure that there aren't any segfaults in the tests in release mode. I tried compiling in release mode off of llvm-6 and I got a segfault.

@andrewbriand When you get a chance can you try running the test suite in release mode and see if they pass? I want to see if your environment works since I did rebase the llvm-6 branch and maybe that screwed something up?

andrewbriand commented 5 years ago

@hu-a I ran the tests a few days ago in release mode, and all of them passed except GENERATING_MODEL_HASH. Here was the output: https://gist.github.com/andrewbriand/8698b4a55155c71920626f937a271fcb.

I'm not sure what branch I was on when I ran them but I can check and run them again tomorrow.

EDIT: Come to think of it, it's possible that I forgot to remove the temporary fix in LLVMModelDataSymbols when running the release tests, but like I said I'll check tomorrow.

andrewbriand commented 5 years ago

I ran the tests up to date with llvm-6 in release mode today and all of them passed except GENERATING_MODEL_HASH: https://gist.github.com/andrewbriand/e69967ad49b03dfb76db0f9aea24a1c1

luciansmith commented 4 years ago

This was fixed with https://github.com/sys-bio/roadrunner/commit/26deb2fce059d5cfe23ba81c42e42575f7a1c3be The only remaining issue is that there didn't seem to be anything wrong with the function in question: throwing if you don't find the index should be perfectly fine. But in this case, it clearly wasn't, and this might indicate a bug in VS and/or LLVM. But it's not an issue for roadrunner any more, so I'm closing this.