simphony / simphony-paraview

The simphony visualization wrappers using the paraview visualization toolkit engine
BSD 2-Clause "Simplified" License

current master segfaults on jenkins with the following error #34

Open itziakos opened 8 years ago

itziakos commented 8 years ago
test_source_from_a_xy_plane_rectangular_lattice (simphony_paraview.core.tests.test_cuds2vtk.TestCUDS2VTK) ... ok
test_source_from_an_orthorhombic_lattice (simphony_paraview.core.tests.test_cuds2vtk.TestCUDS2VTK) ... ok
test_with_cuds_mesh (simphony_paraview.core.tests.test_cuds2vtk.TestCUDS2VTK) ... ok
test_with_cuds_particles (simphony_paraview.core.tests.test_cuds2vtk.TestCUDS2VTK) ... ok
test_with_empty_cuds_mesh (simphony_paraview.core.tests.test_cuds2vtk.TestCUDS2VTK) ... ok
test_with_empty_cuds_particles (simphony_paraview.core.tests.test_cuds2vtk.TestCUDS2VTK) ... ok
test_with_invalid_cuds (simphony_paraview.core.tests.test_cuds2vtk.TestCUDS2VTK) ... ok
test_accumulate (simphony_paraview.core.tests.test_cuba_data_accumulator.TestCUBADataAccumulator) ... ok
test_accumulate_and_expand (simphony_paraview.core.tests.test_cuba_data_accumulator.TestCUBADataAccumulator) ... ok
test_accumulate_on_keys (simphony_paraview.core.tests.test_cuba_data_accumulator.TestCUBADataAccumulator) ... ok
test_accumulate_with_missing_values (simphony_paraview.core.tests.test_cuba_data_accumulator.TestCUBADataAccumulator) ... ok
test_data_conainter_with_tuple_values (simphony_paraview.core.tests.test_cuba_data_accumulator.TestCUBADataAccumulator) ... ok
test_raise_on_invalid_key (simphony_paraview.core.tests.test_cuba_data_accumulator.TestCUBADataAccumulator) ... ok
test_loadded_with_active_connection (simphony_paraview.core.tests.test_loaded_in_paraview.TestLoadedInParaview) ... ERROR: In /home/opencfd/OpenFOAM/ParaView-4.1.0/ParaViewCore/ServerImplementation/Core/vtkSIProxy.cxx, line 307
vtkSIProxy (0x7e23040): Failed to create vtkPolyDataMapper. Aborting for debugging purposes.
/home/travis/build.sh: line 45:  4392 Aborted                 (core dumped) coverage run -m unittest discover -v
stefanoborini commented 8 years ago

Strictly speaking, that's not a segfault...

stefanoborini commented 8 years ago

After investigation with the travis image, I have the following information at hand (subject to revision).

Investigation of the paraview code seems to indicate that the ClientServerInterpreter is supposed to contain code that fills a "registry" of the available routines starting with vtk in the python module. It seems that these routines are not found, and the request to the interpreter for NewInstance() returns NULL.

This is however just a hypothesis. I think that the only way to get to the bottom of this is to compile paraview and debug it at that level. That in turn might point to the real cause.

Possibly GC?

stefanoborini commented 8 years ago

Attacking the problem: compiling paraview and investigating.

stefanoborini commented 8 years ago

Current findings seem to indicate that occasionally vtkPVServerManager_Initialize is not called, leaving the Interpreter uninitialized.

stefanoborini commented 8 years ago

I am still far from the actual solution, especially because adding debug printouts seems to make the problem occur less frequently. Unfortunately, I don't see anything that would allow a race condition to occur.

There are three hot spots:

VTK uses an observer pattern, but apparently it's synchronous.

Another fact I observed is that registration happens when paraview.simple is imported, but also when vtkSMSession is called. It appears to be this second step that is broken and results in the error. This step is at the border between C++ and python, and I am still tracking down the additional behavior (it involves the observer pattern on some events).

stefanoborini commented 8 years ago

Additional attempts got me nowhere. What is clear is that simply calling paraview.simple.Connect()/Disconnect() again and again does not reproduce the bug.

The connection between these steps and the InterpreterInitializer singleton is unclear.

stefanoborini commented 8 years ago

I found the problem: it's a vtk/paraview bug. A description (and hopefully a possible workaround) will follow.

stefanoborini commented 8 years ago

The problem is as follows. Paraview uses a class called vtkClientServerInterpreter. This class is a factory for classes coming from external "plugin"-like sources. One of these classes is vtkSMTimeKeeper, but it is just one of many; it just happens to appear often in the error.

The error occurs when the Interpreter is asked to create an instance of one of these classes, but the class is not registered.

Why is it not registered? Registration happens through a complex mechanism involving automatically generated C++ files that get compiled. These automatically generated files (e.g. ParaViewCore/ServerManager/vtkSMTimeKeeperProxyClientServer.cxx) have _Init methods that accept the interpreter instance and call csi->AddNewInstanceFunction() back into the Interpreter, which effectively registers the C function with the interpreter itself.

For a while everything seems fine, until one considers that each of these _Init routines has this part:
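To make the registration mechanism described above concrete, here is a minimal Python sketch of a factory with a registry. It is a toy model only: `ToyInterpreter`, `add_new_instance_function`, `new_instance`, `ToyTimeKeeper` and `time_keeper_init` are hypothetical names invented for illustration, not ParaView's actual API. It shows only that asking for a class that was never registered yields nothing, which corresponds to the NULL returned by NewInstance().

```python
# Toy model of a registry-backed factory. All names here are hypothetical
# stand-ins for illustration; this is not ParaView code.


class ToyInterpreter:
    """Creates instances only for class names that were registered."""

    def __init__(self):
        self._factories = {}

    def add_new_instance_function(self, class_name, factory):
        # Plays the role of csi->AddNewInstanceFunction() in the thread.
        self._factories[class_name] = factory

    def new_instance(self, class_name):
        # Returns None (the analogue of NULL) for unregistered names.
        factory = self._factories.get(class_name)
        return factory() if factory is not None else None


class ToyTimeKeeper:
    """Stand-in for a class such as vtkSMTimeKeeper."""


def time_keeper_init(csi):
    # Plays the role of one generated _Init routine.
    csi.add_new_instance_function("vtkSMTimeKeeper", ToyTimeKeeper)


registered = ToyInterpreter()
time_keeper_init(registered)

# An interpreter whose _Init step never ran has an empty registry.
unregistered = ToyInterpreter()
```

With this model, `registered.new_instance("vtkSMTimeKeeper")` produces an object, while the same call on `unregistered` produces `None`.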

static vtkClientServerInterpreter* last = NULL;
if (last != csi) {
  // does the registration
}

That is, the registration step is skipped if the interpreter is seen to be the same as in the last invocation, via the static pointer.

Unfortunately for us, when our tests (indirectly) deallocate and reallocate the interpreter, it can happen that a fresh Interpreter is assigned the exact same memory address as the old one. The routine will compare this pointer with the last pointer, see that it's the same, and assume it doesn't need initialization. This is the reason why it occurred randomly: it depended on the delete/new cycle of the Interpreter returning the exact same memory pointer.

stefanoborini commented 8 years ago

That said, I am unsure we can work around it. These are mostly internal classes, and I don't see how we can force the initialization of the interpreter regardless, considering where the check is done.

stefanoborini commented 8 years ago

Opened an issue at https://gitlab.kitware.com/paraview/paraview/issues/16834 . Waiting for action.