itziakos opened 8 years ago
Strictly speaking, that's not a segfault...
After investigating with the Travis image, I have the following information at hand (subject to revision):
Investigating the ParaView code suggests that the ClientServerInterpreter supposedly contains code that fills a "registry" of the available routines starting with vtk in the Python module. It seems that these routines are not found, and asking the interpreter for NewInstance() returns NULL.
This is just a hypothesis, however. I think the only way to get to the bottom of this is to compile ParaView and debug it at that level, which in turn might point to the real cause.
Possibly GC?
Attacking the problem: compiling ParaView and investigating.
Current findings suggest that occasionally vtkPVServerManager_Initialize is not called, leaving the Interpreter uninitialized.
I am still far from a solution, especially because adding debug printouts seems to make the problem occur less frequently. Unfortunately, I don't see anything that could give rise to a race condition.
There are three hot spots:
VTK uses an observer pattern, but apparently it's synchronous.
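To make "synchronous" concrete: in a synchronous observer pattern every callback runs on the caller's thread and has finished before the event-firing call returns, so the mechanism by itself cannot introduce a race. The sketch below is a hypothetical, simplified Subject class; its AddObserver()/InvokeEvent() names only echo VTK's vtkObject naming and are not VTK's actual implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subject: observers are stored per event name and invoked
// inline, on the calling thread, in registration order.
class Subject {
 public:
  using Callback = std::function<void(const std::string&)>;

  void AddObserver(const std::string& event, Callback cb) {
    observers_[event].push_back(std::move(cb));
  }

  // Synchronous dispatch: all callbacks have completed when this returns.
  void InvokeEvent(const std::string& event) {
    auto it = observers_.find(event);
    if (it == observers_.end()) {
      return;  // nobody is listening for this event
    }
    for (const auto& cb : it->second) {
      cb(event);
    }
  }

 private:
  std::map<std::string, std::vector<Callback>> observers_;
};
```

Because dispatch is a plain loop, the order in which observers fire is deterministic; any nondeterminism has to come from somewhere else.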
I also observed that registration happens when paraview.simple is imported, but also when vtkSMSession is called. This step appears to be the one that breaks and results in the error. It sits at the border between C++ and Python, and I am still tracking down the additional behavior (it involves the observer pattern on some events).
Additional attempts got me nowhere. One thing is clear: simply calling paraview.simple.Connect()/Disconnect() over and over does not reproduce the bug.
The connection between these steps and the InterpreterInitializer singleton is unclear.
I found the problem: it's a VTK/ParaView bug. A description (and hopefully a possible workaround) will follow.
The problem is as follows. ParaView uses a class called vtkClientServerInterpreter. This class is a factory for classes coming from external "plugin"-like sources. One of these classes is vtkSMTimeKeeper; it is just one of many, but it happens to appear often in the error.
The error occurs when the Interpreter is asked to instantiate one of these classes but the class is not registered.
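The factory behavior described above can be pictured as a name-to-constructor registry. The sketch below is a hypothetical, heavily simplified stand-in (MiniInterpreter, Object, TimeKeeper and NewTimeKeeper are invented for illustration) and not the real vtkClientServerInterpreter API. It only shows that asking for a class that was never registered yields a null result rather than a crash, consistent with the NULL returned by NewInstance().

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical base type standing in for a wrapped VTK object.
struct Object {
  virtual ~Object() = default;
  virtual std::string ClassName() const = 0;
};

struct TimeKeeper : Object {
  std::string ClassName() const override { return "TimeKeeper"; }
};

// Hypothetical interpreter reduced to a name -> constructor registry.
class MiniInterpreter {
 public:
  using NewInstanceFunction = Object* (*)();

  // Registers a constructor under a class name.
  void AddNewInstanceFunction(const std::string& name, NewInstanceFunction fn) {
    registry_[name] = fn;
  }

  // Returns a new object, or nullptr if nothing is registered under `name`.
  std::unique_ptr<Object> NewInstance(const std::string& name) const {
    auto it = registry_.find(name);
    if (it == registry_.end()) {
      return nullptr;  // the symptom: the class is simply unknown
    }
    return std::unique_ptr<Object>(it->second());
  }

 private:
  std::map<std::string, NewInstanceFunction> registry_;
};

// Hypothetical constructor function of the kind wrapper code would provide.
Object* NewTimeKeeper() { return new TimeKeeper(); }
```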
Why is it not registered? Registration happens through a complex mechanism involving automatically generated C++ files that get compiled. These generated files (e.g. ParaViewCore/ServerManager/vtkSMTimeKeeperProxyClientServer.cxx) have _Init routines that accept the interpreter (csi) and call csi->AddNewInstanceFunction() back into it, which effectively registers the C function in the interpreter itself.
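The callback shape of this registration can be sketched as follows, without any guard yet. All names here (MiniInterpreter, TimeKeeper_Init, ProxyManager_Init, ServerManager_Initialize) are hypothetical; ServerManager_Initialize only plays a role similar in spirit to vtkPVServerManager_Initialize, calling each per-class _Init routine in turn. If that aggregate call never happens, the interpreter stays empty.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical minimal interpreter that records registered constructors.
class MiniInterpreter {
 public:
  using NewInstanceFunction = void* (*)();

  void AddNewInstanceFunction(const std::string& name, NewInstanceFunction fn) {
    registry_[name] = fn;
  }

  bool HasNewInstanceFunction(const std::string& name) const {
    return registry_.count(name) != 0;
  }

 private:
  std::map<std::string, NewInstanceFunction> registry_;
};

// Placeholder "constructors"; real wrapper code would create VTK objects.
void* NewTimeKeeper() { static int instance = 0; return &instance; }
void* NewProxyManager() { static int instance = 0; return &instance; }

// Shape of a generated per-class _Init routine: it receives the interpreter
// and calls back into it to register the class.
void TimeKeeper_Init(MiniInterpreter* csi) {
  csi->AddNewInstanceFunction("TimeKeeper", NewTimeKeeper);
}

void ProxyManager_Init(MiniInterpreter* csi) {
  csi->AddNewInstanceFunction("ProxyManager", NewProxyManager);
}

// Hypothetical aggregate initializer: if it is skipped, nothing is registered.
void ServerManager_Initialize(MiniInterpreter* csi) {
  TimeKeeper_Init(csi);
  ProxyManager_Init(csi);
}
```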
For a while everything seems fine, until one considers that each of these _Init routines contains this part:
```cpp
static vtkClientServerInterpreter* last = NULL;
if (last != csi) {
  last = csi;
  // does the registration
}
```
That is, the registration step is skipped if the interpreter pointer equals the one stored in the static pointer last during the previous invocation.
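The guard can be reproduced in isolation. In this hypothetical sketch (MiniInterpreter, TimeKeeper_Init and registration_count are invented names, and registration is reduced to recording a class name), a counter shows that repeated calls with the same interpreter pointer register only once, while a different pointer triggers registration again.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical interpreter that only remembers registered class names.
class MiniInterpreter {
 public:
  void AddNewInstanceFunction(const std::string& name) { names_.insert(name); }
  bool IsRegistered(const std::string& name) const { return names_.count(name) != 0; }

 private:
  std::set<std::string> names_;
};

// Counts how many times the guarded registration body actually ran.
int registration_count = 0;

// Same shape as the generated guard: the static pointer remembers the last
// interpreter seen, and registration is skipped when csi matches it.
void TimeKeeper_Init(MiniInterpreter* csi) {
  static MiniInterpreter* last = nullptr;
  if (last != csi) {
    last = csi;
    ++registration_count;
    csi->AddNewInstanceFunction("TimeKeeper");
  }
}
```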
Unfortunately for us, when our tests (indirectly) deallocate and reallocate the interpreter, the fresh Interpreter can end up at exactly the same virtual memory address as the old one. The routine then compares this pointer with last, sees that it is the same, and assumes no initialization is needed. This is why the failure occurred randomly: it depended on the delete/new cycle of the Interpreter returning exactly the same memory address.
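With a real allocator the address reuse only happens some of the time, which is what made the failure intermittent. The hypothetical sketch below makes it deterministic by destroying one interpreter and constructing the next one with placement new in the very same storage, so the second interpreter is guaranteed to get the old address and its registration is silently skipped. The names are invented for illustration.

```cpp
#include <cassert>
#include <new>
#include <set>
#include <string>

// Hypothetical interpreter that only remembers registered class names.
class MiniInterpreter {
 public:
  void AddNewInstanceFunction(const std::string& name) { names_.insert(name); }
  bool IsRegistered(const std::string& name) const { return names_.count(name) != 0; }

 private:
  std::set<std::string> names_;
};

// Same pointer-based guard as in the generated _Init routines.
void TimeKeeper_Init(MiniInterpreter* csi) {
  static MiniInterpreter* last = nullptr;
  if (last != csi) {
    last = csi;
    csi->AddNewInstanceFunction("TimeKeeper");
  }
}

// Simulates a delete/new cycle that returns the same address.
// Returns whether the second (fresh) interpreter ended up registered.
bool RegisteredAfterReuse() {
  alignas(MiniInterpreter) unsigned char storage[sizeof(MiniInterpreter)];

  MiniInterpreter* first = new (storage) MiniInterpreter();
  TimeKeeper_Init(first);  // registers; last now equals this address
  first->~MiniInterpreter();

  MiniInterpreter* second = new (storage) MiniInterpreter();  // same address
  TimeKeeper_Init(second);  // last == second, so registration is skipped
  bool registered = second->IsRegistered("TimeKeeper");
  second->~MiniInterpreter();
  return registered;
}
```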
That said, I am unsure we can work around it. These are mostly internal classes, and given where the check is done, I don't see how we can force the interpreter to be initialized regardless.
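For completeness, the guard itself could avoid the problem by keying on something that is never reused, such as a per-instance serial number, instead of the object's address. This is a hypothetical sketch of a possible upstream-side change (Serial(), next_serial_ and last_serial are invented names), not a fix adopted by VTK or ParaView, and it gives us no workaround from the outside, since the check lives in the generated code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <new>
#include <set>
#include <string>

// Hypothetical interpreter carrying a process-wide unique serial number.
// Unlike an address, a serial number is never handed out twice, even when
// a new interpreter is placed where an old one used to live.
class MiniInterpreter {
 public:
  MiniInterpreter() : serial_(++next_serial_) {}

  std::uint64_t Serial() const { return serial_; }
  void AddNewInstanceFunction(const std::string& name) { names_.insert(name); }
  bool IsRegistered(const std::string& name) const { return names_.count(name) != 0; }

 private:
  static std::atomic<std::uint64_t> next_serial_;
  std::uint64_t serial_;
  std::set<std::string> names_;
};

std::atomic<std::uint64_t> MiniInterpreter::next_serial_{0};

// Guard keyed on the serial number instead of the pointer.
void TimeKeeper_Init(MiniInterpreter* csi) {
  static std::uint64_t last_serial = 0;  // 0 is never a valid serial
  if (last_serial != csi->Serial()) {
    last_serial = csi->Serial();
    csi->AddNewInstanceFunction("TimeKeeper");
  }
}
```

The static last_serial is still shared, unsynchronized state, exactly like the original static pointer; the sketch only addresses address reuse, not concurrent initialization.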
Opened an issue at https://gitlab.kitware.com/paraview/paraview/issues/16834; waiting for action.