trelau / SMESH

Mesh module from the Salome Platform
GNU Lesser General Public License v2.1

Tests not passing on OSX #37

Closed trelau closed 2 years ago

trelau commented 3 years ago

Summary:

Cc @looooo

looooo commented 3 years ago

In addition, here is the backtrace of the error we see in FreeCAD (using lldb with a release-mode build):

* thread #1, name = 'CrBrowserMain', queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x0000000127a59b7b libSMDS.dylib`SMDS_ElementChunk::~SMDS_ElementChunk() + 283
    frame #1: 0x000000011f5f51c8
    frame #2: 0x0000000127a57c49 libSMDS.dylib`SMDS_ElementFactory::Clear() + 121
    frame #3: 0x0000000127a62d9a libSMDS.dylib`SMDS_Mesh::Clear() + 42
    frame #4: 0x00000001112b4376 libSMESHDS.dylib`SMESHDS_Mesh::ClearMesh() + 54
    frame #5: 0x000000012d9aad97 libSMESH.dylib`SMESH_Mesh::~SMESH_Mesh() + 263
    frame #6: 0x000000012d9ab29e libSMESH.dylib`SMESH_Mesh::~SMESH_Mesh() + 14
    frame #7: 0x000000011fad1b7a MeshPart.so`MeshPart::Mesher::createMesh() const + 24970

@wwmayer maybe you have an idea what is wrong here.

wwmayer commented 3 years ago

@wwmayer maybe you have an idea what is wrong here.

Not really from the call stack alone. But what I can say is that it always was a pita to work with SMESH_Gen and SMESH_Mesh. The class MeshPart::Mesher holds a static pointer to a SMESH_Gen instance so that it won't create more than one instance of it. Inside Mesher::createMesh() SMESH_Gen is used to create an instance of SMESH_Mesh which will be destroyed when leaving Mesher::createMesh().

Possibly the newer SMESH version doesn't like it that the SMESH_Mesh instance is explicitly deleted. But from the current API I don't see how the system will clean up the SMESH_Mesh instances.
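For readers unfamiliar with the pattern described above, here is a minimal sketch of it. Gen and Mesh are hypothetical stand-ins for SMESH_Gen and SMESH_Mesh, not the real SMESH API, and the function-local static is an assumption in place of the static pointer mentioned in the comment.

```cpp
#include <memory>

// Hypothetical stand-ins for SMESH_Gen and SMESH_Mesh; NOT the real SMESH API.
struct Mesh {
    explicit Mesh(int meshId) : id(meshId) { ++alive; }
    ~Mesh() { --alive; }
    int id;
    static int alive;   // number of live Mesh objects, for illustration only
};
int Mesh::alive = 0;

struct Gen {
    // Hands out a new mesh; the caller owns it and is responsible for destroying it.
    std::unique_ptr<Mesh> createMesh() { return std::make_unique<Mesh>(nextId++); }
    int nextId = 0;
};

// One shared generator for the whole process, created on first use.
Gen& sharedGen() {
    static Gen gen;
    return gen;
}

// Mirrors the described flow: the mesh lives only for the duration of the call.
int createAndDiscardMesh() {
    std::unique_ptr<Mesh> mesh = sharedGen().createMesh();
    return mesh->id;    // mesh is destroyed when the function returns
}
```

Calling createAndDiscardMesh() repeatedly reuses the single generator while each mesh is created and destroyed within the call, which is the lifetime the crash seems to be sensitive to.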

looooo commented 3 years ago

thanks @wwmayer

Disable building Netgen and NETGENPlugin for OSX and split one StdMeshers test into its own exe and it still fails...

@trelau I can test on a mac in the next few days (compiling in debug mode). How can I build SMESH without netgen?

trelau commented 3 years ago

@looooo there is this branch https://github.com/trelau/SMESH/tree/osx_fix that has a CMake option ENABLE_NETGEN that you can set to OFF and build without.

Here is an example setting in the workflow https://github.com/trelau/SMESH/blob/917a678a874995266127fa0fea68e2888ff1fa22/ci/conda/build.sh#L11

edit: that branch seems to work (i.e., optionally building Netgen), I just haven't merged it yet. Figured I'd wait until others had a chance to take a crack at an osx_fix.

looooo commented 3 years ago

@trelau so the osx_fix branch should not have the "illegal instruction" issue?

trelau commented 3 years ago

@looooo no it does, just meant that is the branch where you can set ENABLE_NETGEN to off to try building without Netgen to see if the issue exists in Netgen or not. In my tests via github actions, I would still get that error even without Netgen.

looooo commented 3 years ago

testing smesh in debug mode I cannot reproduce the crash on osx:

./test_SMESH 
===============================================================================
test cases: 3 | 3 passed
assertions: - none -

trelau commented 3 years ago

Took a wild guess based on this https://github.com/pybind/pybind11/issues/1401.

Made SMDS_ElementChunk destructor virtual here https://github.com/trelau/SMESH/tree/another_osx_fix_attempt but it still failed.

Maybe there are other destructors that need to be made virtual?

edit: more general info here https://www.geeksforgeeks.org/virtual-destructor/
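As a quick illustration of why a virtual destructor matters, here is a generic sketch. Base and Derived are hypothetical types, not SMESH classes. Deleting a derived object through a base pointer is only well defined when the base destructor is virtual.

```cpp
#include <memory>

static int derivedDestroyed = 0;  // counts Derived destructor calls

struct Base {
    virtual ~Base() = default;    // virtual: deleting through Base* also runs ~Derived
};

struct Derived : Base {
    ~Derived() override { ++derivedDestroyed; }
};

// Destroys a Derived through a pointer to Base and returns how many
// Derived destructors have run so far.
int destroyThroughBase() {
    std::unique_ptr<Base> p = std::make_unique<Derived>();
    p.reset();                    // calls ~Derived, then ~Base
    return derivedDestroyed;
}
```

If the virtual keyword were dropped from Base, the same delete would be undefined behaviour, and that kind of bug can surface on only one platform or optimization level.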

looooo commented 3 years ago

testing smesh in debug mode I cannot reproduce the crash on osx:

./test_SMESH 
===============================================================================
test cases: 3 | 3 passed
assertions: - none -

@trelau if this is true, the crash might be related to optimization. Maybe we can try -O1 as the optimization level as a workaround?

trelau commented 3 years ago

maybe this has some hints: https://stackoverflow.com/questions/186237/program-only-crashes-as-release-build-how-to-debug

I'm trying a few things in a new branch, but tough to do since I don't work on OSX.

Perhaps there is a "variable initialization" issue somewhere (which this article claims is common: https://stackoverflow.com/questions/312312/what-are-some-reasons-a-release-build-would-run-differently-than-a-debug-build?rq=1)
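A minimal sketch of what guarding against that class of bug can look like, assuming a hypothetical SafeChunk type (the names are illustrative and not taken from SMESH). Default member initializers give every member a defined value, so the behaviour no longer depends on what happens to be in memory in a debug versus a release build.

```cpp
// Hypothetical chunk type; names are illustrative and not taken from SMESH.
struct SafeChunk {
    int* elements = nullptr;   // default member initializers: no indeterminate
    int  count    = 0;         // values, whatever the optimization level

    SafeChunk() = default;
    SafeChunk(const SafeChunk&) = delete;             // one owner per array
    SafeChunk& operator=(const SafeChunk&) = delete;
    ~SafeChunk() { delete[] elements; }               // delete[] on nullptr is a no-op

    void allocate(int n) {
        delete[] elements;
        elements = new int[n]();                      // zero-initialized array
        count = n;
    }
};

// True when a freshly constructed SafeChunk is in a well-defined empty state.
bool startsEmpty() {
    SafeChunk c;
    return c.elements == nullptr && c.count == 0;
}
```

Without the initializers, the destructor of a never-allocated chunk would call delete[] on an indeterminate pointer, which may appear to work in one build type and crash in another.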

also for ref: https://clang.llvm.org/docs/MemorySanitizer.html

trelau commented 3 years ago

There are a lot of warnings that show up for OSX that don't show up for Linux and Windows:

https://github.com/trelau/SMESH/runs/2705396821?check_suite_focus=true#step:8:273

https://github.com/trelau/SMESH/runs/2705396821?check_suite_focus=true#step:8:276

https://github.com/trelau/SMESH/runs/2705396821?check_suite_focus=true#step:8:311

https://github.com/trelau/SMESH/runs/2705396821?check_suite_focus=true#step:8:345

(these are just samples. see the log for all the various ones)

trelau commented 3 years ago

Tried a few things here, but still no luck https://github.com/trelau/SMESH/tree/osx_optimization_level

looooo commented 2 years ago

trying other optimization level here: https://github.com/conda-forge/smesh-feedstock/pull/49

looooo commented 2 years ago

@trelau I am trying to address this issue again. I can reproduce the illegal instruction that occurs in FreeCAD with the StdMeshers test as well. But I have issues creating a debug build of smesh. This is the output of lldb:

lldb test_StdMeshers
(lldb) target create "test_StdMeshers"
(lldb) run
Process 8312 launched: '/Users/lo/miniconda3/conda-bld/debug_1633193024057/work/SMESH/test/tests/test_StdMeshers' (x86_64)
libSMDS.dylib was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 8312 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x0000000101321815 libSMDS.dylib`SMDS_ElementChunk::~SMDS_ElementChunk() [inlined] SMDS_MeshElement::~SMDS_MeshElement(this=<unavailable>) at SMDS_MeshElement.hxx:55 [opt]
Target 2: (test_StdMeshers) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x0000000101321815 libSMDS.dylib`SMDS_ElementChunk::~SMDS_ElementChunk() [inlined] SMDS_MeshElement::~SMDS_MeshElement(this=<unavailable>) at SMDS_MeshElement.hxx:55 [opt]
    frame #1: 0x0000000101321815 libSMDS.dylib`SMDS_ElementChunk::~SMDS_ElementChunk(this=<unavailable>) at SMDS_ElementFactory.cxx:545 [opt]
    frame #2: 0x000000010131fae9 libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] SMDS_ElementChunk::~SMDS_ElementChunk(this=0x0000000104c30cf0) at SMDS_ElementFactory.cxx:544 [opt]
    frame #3: 0x000000010131fae1 libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] void boost::checked_delete<SMDS_ElementChunk const>(x=<unavailable>) at checked_delete.hpp:36 [opt]
    frame #4: 0x000000010131fadc libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] void boost::delete_clone<SMDS_ElementChunk>(r=<unavailable>) at clone_allocator.hpp:45 [opt]
    frame #5: 0x000000010131fadc libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] void boost::heap_clone_allocator::deallocate_clone<SMDS_ElementChunk>(r=<unavailable>) at clone_allocator.hpp:63 [opt]
    frame #6: 0x000000010131fadc libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] void boost::ptr_container_detail::reversible_ptr_container<boost::ptr_container_detail::sequence_config<SMDS_ElementChunk, std::__1::vector<void*, std::__1::allocator<void*> > >, boost::heap_clone_allocator>::remove<boost::void_ptr_iterator<std::__1::__wrap_iter<void**>, SMDS_ElementChunk> >(this=<unavailable>, i=<unavailable>) at reversible_ptr_container.hpp:237 [opt]
    frame #7: 0x000000010131fad9 libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] void boost::ptr_container_detail::reversible_ptr_container<boost::ptr_container_detail::sequence_config<SMDS_ElementChunk, std::__1::vector<void*, std::__1::allocator<void*> > >, boost::heap_clone_allocator>::remove<boost::void_ptr_iterator<std::__1::__wrap_iter<void**>, SMDS_ElementChunk> >(this=<unavailable>, first=<unavailable>, last=<unavailable>) at reversible_ptr_container.hpp:244 [opt]
    frame #8: 0x000000010131fac6 libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] boost::ptr_container_detail::reversible_ptr_container<boost::ptr_container_detail::sequence_config<SMDS_ElementChunk, std::__1::vector<void*, std::__1::allocator<void*> > >, boost::heap_clone_allocator>::remove_all(this=<unavailable>) at reversible_ptr_container.hpp:205 [opt]
    frame #9: 0x000000010131fac6 libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] boost::ptr_container_detail::reversible_ptr_container<boost::ptr_container_detail::sequence_config<SMDS_ElementChunk, std::__1::vector<void*, std::__1::allocator<void*> > >, boost::heap_clone_allocator>::~reversible_ptr_container(this=<unavailable>) at reversible_ptr_container.hpp:499 [opt]
    frame #10: 0x000000010131fac6 libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] boost::ptr_sequence_adapter<SMDS_ElementChunk, std::__1::vector<void*, std::__1::allocator<void*> >, boost::heap_clone_allocator>::~ptr_sequence_adapter(this=<unavailable>) at ptr_sequence_adapter.hpp:135 [opt]
    frame #11: 0x000000010131fac6 libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] boost::ptr_vector<SMDS_ElementChunk, boost::heap_clone_allocator, void>::~ptr_vector(this=<unavailable>) at ptr_vector.hpp:39 [opt]
    frame #12: 0x000000010131fac6 libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] boost::ptr_vector<SMDS_ElementChunk, boost::heap_clone_allocator, void>::~ptr_vector(this=<unavailable>) at ptr_vector.hpp:39 [opt]
    frame #13: 0x000000010131fac6 libSMDS.dylib`SMDS_ElementFactory::Clear() [inlined] void (anonymous namespace)::clearVector<boost::ptr_vector<SMDS_ElementChunk, boost::heap_clone_allocator, void> >(v=<unavailable>) at ObjectPool.hxx:34 [opt]
    frame #14: 0x000000010131faa6 libSMDS.dylib`SMDS_ElementFactory::Clear(this=<unavailable>) at SMDS_ElementFactory.cxx:270 [opt]
    frame #15: 0x000000010132beca libSMDS.dylib`SMDS_Mesh::Clear(this=<unavailable>) at SMDS_Mesh.cxx:1554 [opt]
    frame #16: 0x00000001011b9ce6 libSMESHDS.dylib`SMESHDS_Mesh::ClearMesh(this=<unavailable>) at SMESHDS_Mesh.cxx:1033 [opt]
    frame #17: 0x0000000100581a5d libSMESH.dylib`SMESH_Mesh::~SMESH_Mesh(this=<unavailable>) at SMESH_Mesh.cxx:192 [opt]
    frame #18: 0x0000000100581eee libSMESH.dylib`SMESH_Mesh::~SMESH_Mesh() [inlined] SMESH_Mesh::~SMESH_Mesh(this=<unavailable>) at SMESH_Mesh.cxx:178 [opt]
    frame #19: 0x0000000100581ee9 libSMESH.dylib`SMESH_Mesh::~SMESH_Mesh(this=<unavailable>) at SMESH_Mesh.cxx:178 [opt]
    frame #20: 0x000000010003f6d3 test_StdMeshers`____C_A_T_C_H____T_E_S_T____0() at StdMeshers.t.cpp:41 [opt]
    frame #21: 0x00000001000221f1 test_StdMeshers`Catch::RunContext::invokeActiveTestCase() [inlined] Catch::TestCase::invoke(this=<unavailable>) const at catch.hpp:14160 [opt]
    frame #22: 0x00000001000221e5 test_StdMeshers`Catch::RunContext::invokeActiveTestCase(this=<unavailable>) at catch.hpp:13020 [opt]
    frame #23: 0x000000010001fe4f test_StdMeshers`Catch::RunContext::runCurrentTest(this=<unavailable>, redirectedCout=<unavailable>, redirectedCerr=<unavailable>) at catch.hpp:12993 [opt]
    frame #24: 0x000000010001f403 test_StdMeshers`Catch::RunContext::runTest(this=<unavailable>, testCase=<unavailable>) at catch.hpp:12754 [opt]
    frame #25: 0x0000000100025d82 test_StdMeshers`Catch::Session::runInternal() at catch.hpp:13347 [opt]
    frame #26: 0x0000000100025c1c test_StdMeshers`Catch::Session::runInternal(this=<unavailable>) at catch.hpp:13553 [opt]
    frame #27: 0x0000000100025292 test_StdMeshers`Catch::Session::run(this=<unavailable>) at catch.hpp:13509 [opt]
    frame #28: 0x000000010003f2bf test_StdMeshers`main [inlined] int Catch::Session::run<char>(this=<unavailable>, argc=<unavailable>, argv=<unavailable>) at catch.hpp:13231 [opt]
    frame #29: 0x000000010003f295 test_StdMeshers`main(argc=<unavailable>, argv=<unavailable>) at catch.hpp:17526 [opt]
    frame #30: 0x00007fff68d78015 libdyld.dylib`start + 1

trelau commented 2 years ago

@looooo thanks. i took a wild guess here (with no luck) but in case it gives you any other ideas https://github.com/trelau/SMESH/commit/ae9240ddaad9a54e5f7c91338868817df2ffaaf9

main change there was trying to initialize a member variable with nullptr (like i said, just a wild guess, but didn't seem to make a difference)

if you can't make a debug build of FreeCAD, would it be possible (and helpful) to make a debug build of the test case in this repo? then see if you can trace it any better?

looooo commented 2 years ago

@trelau this is what I am trying to do, but somehow the message (... was compiled with optimization) never disappeared.

With conda you also have to switch the compiler flags like this:

export CXXFLAGS=${DEBUG_CXXFLAGS}
export CFLAGS=${DEBUG_CFLAGS}

we are also discussing the issue here: https://forum.freecadweb.org/viewtopic.php?f=42&t=51997&start=40

Btw.: I can reproduce the illegal instruction that I am seeing with FreeCAD with the test_StdMeshers test case, and the problem is clearly related to deleting the mesh / gen.

trelau commented 2 years ago

@looooo thanks. btw...i noticed a reference in that thread to vtk9 issues. note i had to add this include file with 9.7.0 and VTK 9 to get it to work https://github.com/trelau/SMESH/commit/36faf9f92dc891af7861066fe526a504b0305269#diff-4e164a66c9d23486e3a9463aab62b16d82231ff286704545cd2d1b7660f364b7R102 (in case it provides a hint for one issue in that thread)

looooo commented 2 years ago

thanks, I'll keep this in mind. Maybe @hobbes is interested in this too.

hobbes commented 2 years ago

@looooo I think that you meant @hobbes1069 :-)

looooo commented 2 years ago

sure, you are right @hobbes, I wanted to ping @hobbes1069 :)

I tried to add a patch for osx to check whether the tests run when a certain line is removed, but somehow the patch is not applied. @trelau can you see why the patch doesn't work? https://github.com/looooo/SMESH/blob/v9.7.0.0_patched/patch/mac_crash.patch

trelau commented 2 years ago

@looooo i added the patch here and it seemed to work (patching the file, tests haven't completed yet) https://github.com/trelau/SMESH/commit/634bd48cf8e12a2bcbfff82fddf03f789bd44de8

edit: seems like that made the osx tests pass, but that seems strange...

as a worst case hack....could we conditionally compile those lines? like add a preprocessor check for compiling those lines only on linux/windows? would rather fix it obvi....wish i could replicate on windows....maybe there is a more strict compiler setting to get it to error out?
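A rough sketch of such a platform guard, assuming the predefined __APPLE__ macro is used to detect macOS targets. The function cleanupEnabled is hypothetical and only marks where the guarded lines would sit. A compiler check such as __clang__ would also match Clang builds on Linux and Windows, so a platform macro is probably closer to the intent.

```cpp
// Returns true when the cleanup path is compiled in for this platform.
// __APPLE__ is predefined by compilers targeting Apple platforms.
constexpr bool cleanupEnabled() {
#if defined(__APPLE__)
    return false;   // hypothetical: skip the suspect lines on macOS only
#else
    return true;    // Linux and Windows keep the original behaviour
#endif
}
```

This only hides the symptom. Skipping the cleanup on macOS would leak whatever those lines were meant to free.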

looooo commented 2 years ago

It's possible the elements are deleted twice which leads to undefined behaviour and might crash.

Not sure if it works:

#ifndef __clang__
delete [] myElements;
#endif
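If a double delete really is the cause, one way to rule it out structurally is to let a smart pointer own the array. This is only a sketch under that assumption. Element and OwningChunk are hypothetical types, not the SMESH classes, and the real SMDS_ElementChunk may manage its memory differently.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical element type that counts its destructor calls.
struct Element {
    static int destroyed;
    ~Element() { ++destroyed; }
};
int Element::destroyed = 0;

// Hypothetical chunk that owns its array through unique_ptr, so the array
// is released exactly once no matter how many code paths call clear().
class OwningChunk {
public:
    explicit OwningChunk(std::size_t n) : elements_(new Element[n]) {}
    void clear() { elements_.reset(); }   // a second call is a no-op
private:
    std::unique_ptr<Element[]> elements_;
};

// Clears the same chunk twice, lets it go out of scope, and returns how
// many Element destructors ran in total.
int destroyedAfterDoubleClear(std::size_t n) {
    Element::destroyed = 0;
    {
        OwningChunk c(n);
        c.clear();
        c.clear();     // harmless: the pointer is already null
    }                  // the destructor finds nothing left to free
    return Element::destroyed;
}
```

With a raw pointer and a manual delete [] in both places, the second release would be undefined behaviour. Here each element is destroyed exactly once.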

looooo commented 2 years ago

@trelau is it possible to upload a osx-package so I can test it with freecad?

hobbes1069 commented 2 years ago

I read through everything but you guys are already getting in deeper than my knowledge base as a packager and occasional code contributor. :)

trelau commented 2 years ago

Per that FreeCAD thread, worth adding the -fsanitize=address compiler option?

trelau commented 2 years ago

tried that...not really sure what this tells me... https://github.com/trelau/SMESH/runs/3797974245?check_suite_focus=true

trelau commented 2 years ago

Fixed by #53