iLCSoft / LCIO

Linear Collider I/O
BSD 3-Clause "New" or "Revised" License

Pull #99 broke LCIO interface of WHIZARD #112

Closed. Romendakil closed this issue 3 years ago

Romendakil commented 3 years ago

===========================================================
There was a crash. This is the entire stack trace of all threads:

0 0x00007f45dfa8e6e7 in __GI___waitpid (pid=22174, stat_loc=stat_loc@entry=0x7fff0ea39ba8, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30

1 0x00007f45df9f9107 in do_system (line=) at ../sysdeps/posix/system.c:149

2 0x00007f45dda76643 in TUnixSystem::StackTrace() () from /home/reuter/local/lib/libCore.so

3 0x00007f45dda79034 in TUnixSystem::DispatchSignals(ESignals) () from /home/reuter/local/lib/libCore.so

4

5 0x00007f45e0573115 in SIO::SIOReader::readNextEvent (this=0x55e2f6595ed0, accessMode=0) at /home/reuter/local/packages/LCIO-02-15-03/src/cpp/src/SIO/SIOReader.cc:108

6 0x00007f45e2d57f56 in read_lcio_event (lcRdr=0x55e2f6595ed0) at ../../../src/lcio/LCIOWrap.cpp:123

7 0x00007f45e29ca669 in lcio_interface_MP_lcio_readevent (lcrdr=0x7f45dafc6f50, evt=0x7f45dafc6f60, ok=0x7fff0ea3c708) at lcio_interface.f90:937

8 0x00007f45e29f2e85 in eio_lcio_MP_eio_lcio_input_iprc (eio=0x7f45dafc6d50, eio_Sig=0x7f45e3120288 <eio_lcio_DT_eio_lcio_tHEADER+8>, eio_Dtp=0x0, iprc=0x7fff0ea3c770, iostat_=0x7fff0ea3c76c) at eio_lcio.f90:305

9 0x000055e2f57928dc in eio_lcio_uti_MP_eio_lcio2 (u=0x7fff0ea3ca30) at eio_lcio_uti.f90:265

10 0x00007f45e2d3932a in unit_tests_MP_test (testproc=0x55e2f57905e4 , name_=0x55e2f59f55e4 "eio_lcio2", description=0x55e2f59f55d0 "read event contents", ulog=0x7fff0ea3ca94, results_=0x55e2f5c27fd0 , name_Len=10, description_Len=19) at unit_tests.f90:196

11 0x000055e2f5790553 in eio_lcio_ut_MP_eio_lciotest (u=0x7fff0ea3ca94, results_=0x55e2f5c27fd0 ) at eio_lcio_ut.f90:46

12 0x000055e2f4fbf35e in main_ut_IP_whizardcheck (check=0x55e2f5c28000 , results_=0x55e2f5c27fd0 ) at main_ut.f90:676

13 0x000055e2f4fbc221 in mainut (__NAGf90_main_args=0x0) at main_ut.f90:306

14 0x000055e2f4fba488 in main (argc=4, argv=0x7fff0ea3d898) at main_ut.f90:29

===========================================================

The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a report at http://root.cern.ch/bugs
Please post the ENTIRE stack trace from above as an attachment in addition to anything else that might help us fixing this issue.

5 0x00007f45e0573115 in SIO::SIOReader::readNextEvent (this=0x55e2f6595ed0, accessMode=0) at /home/reuter/local/packages/LCIO-02-15-03/src/cpp/src/SIO/SIOReader.cc:108

6 0x00007f45e2d57f56 in read_lcio_event (lcRdr=0x55e2f6595ed0) at ../../../src/lcio/LCIOWrap.cpp:123

7 0x00007f45e29ca669 in lcio_interface_MP_lcio_readevent (lcrdr=0x7f45dafc6f50, evt=0x7f45dafc6f60, ok=0x7fff0ea3c708) at lcio_interface.f90:937

8 0x00007f45e29f2e85 in eio_lcio_MP_eio_lcio_input_iprc (eio=0x7f45dafc6d50, eio_Sig=0x7f45e3120288 <eio_lcio_DT_eio_lcio_tHEADER+8>, eio_Dtp=0x0, iprc=0x7fff0ea3c770, iostat_=0x7fff0ea3c76c) at eio_lcio.f90:305

9 0x000055e2f57928dc in eio_lcio_uti_MP_eio_lcio2 (u=0x7fff0ea3ca30) at eio_lcio_uti.f90:265

10 0x00007f45e2d3932a in unit_tests_MP_test (testproc=0x55e2f57905e4 , name_=0x55e2f59f55e4 "eio_lcio2", description=0x55e2f59f55d0 "read event contents", ulog=0x7fff0ea3ca94, results_=0x55e2f5c27fd0 , name_Len=10, description_Len=19) at unit_tests.f90:196

11 0x000055e2f5790553 in eio_lcio_ut_MP_eio_lciotest (u=0x7fff0ea3ca94, results_=0x55e2f5c27fd0 ) at eio_lcio_ut.f90:46

12 0x000055e2f4fbf35e in main_ut_IP_whizardcheck (check=0x55e2f5c28000 , results_=0x55e2f5c27fd0 ) at main_ut.f90:676

13 0x000055e2f4fbc221 in mainut (__NAGf90_main_args=0x0) at main_ut.f90:306

14 0x000055e2f4fba488 in main (argc=4, argv=0x7fff0ea3d898) at main_ut.f90:29

===========================================================

Romendakil commented 3 years ago

After the discussions with Jan Strube we came to the conclusion that both sides are trying to prevent potential memory leaks, which produces a clash because both sides delete the same memory. My conclusion for the moment would be to disallow using WHIZARD with LCIO newer than 2.14.

gaede commented 3 years ago

If you are reading from an LCIO file with the default LCReader, you should not delete anything (see section 3.8.3 in https://github.com/iLCSoft/LCIO/blob/master/doc/manual.pdf). It has always been like that, except in 2.14, where we failed to include the correct memory management after re-implementing the underlying I/O. So I really think you should fix the Whizard LCIO interface and use 2.15, rather than restricting the version to the buggy 2.14...

Romendakil commented 3 years ago

We have to discuss this internally, but since we now get a double free that we did not get before, this would mean that not only LCIO 2.14 was buggy but all versions back to presumably at least 2.05 or so. One thing we have to check is whether removing the finalizer for the LCEvent object in our interface would lead to memory leaks when writing or reading LCIO from Whizard.

gaede commented 3 years ago

As just explained via email: don't delete the LCIO event when reading, but do delete it after you have created it for writing. I do not understand how this problem would not have occurred in Whizard with any version older than 2.14, as there, too, we deleted the event before reading a new one (though at a different place in the code...)
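The ownership rule described here can be sketched with a small self-contained model. The types `Event`, `Reader` and `Writer` below are hypothetical stand-ins, not the LCIO classes, and `live_events` and `run_demo` are names invented for this illustration. The sketch only assumes what the comment states: a reader frees the event it handed out, while an event created for writing belongs to the caller.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins, NOT the LCIO classes: they only model who owns an event.
static int live_events = 0;  // number of Event objects currently alive

struct Event {
    int number;
    explicit Event(int n) : number(n) { ++live_events; }
    ~Event() { --live_events; }
};

// Models a reader that owns the event it hands out.
class Reader {
public:
    explicit Reader(int n_events) : remaining_(n_events) {}

    // Returns a non-owning pointer, or nullptr at end of input.
    // The previously returned event is destroyed here, so the caller
    // must never delete it (doing so would be a double free).
    Event* readNextEvent() {
        if (remaining_ == 0) {
            current_.reset();
            return nullptr;
        }
        current_ = std::make_unique<Event>(next_number_++);
        --remaining_;
        return current_.get();
    }

private:
    int remaining_;
    int next_number_ = 0;
    std::unique_ptr<Event> current_;
};

// Models a writer: it uses the event but does not take ownership of it.
struct Writer {
    int written = 0;
    void writeEvent(const Event&) { ++written; }
};

// Reads three events without deleting them, then creates, writes and
// frees one event. Returns the number of events read.
int run_demo() {
    int seen = 0;
    {
        Reader reader(3);
        while (Event* evt = reader.readNextEvent()) {
            (void)evt;  // use the event here, but do NOT delete it
            ++seen;
        }
    }
    {
        Writer writer;
        auto evt = std::make_unique<Event>(42);  // created by the caller
        writer.writeEvent(*evt);
    }  // caller-owned event is freed here, after writing
    return seen;
}
```

In this model, calling `delete` on the pointer returned by `readNextEvent()` would free the same object twice, once in the caller and once in the reader, which is the kind of clash discussed in this thread.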

Romendakil commented 3 years ago

In the Whizard-LCIO interface we always deleted LCEvent objects created by us, both when reading and when writing. This means that all LCIO versions before 2.15 were suffering from that memory leak, not only 2.14, or there was some accidental magic that kept it from cl(r)ashing before. If we have to distinguish between reading and writing, that definitely means a change in the API and an incompatible change in the LCIO interface, such that we have to veto all LCIO versions before 2.15.
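One way an interface could distinguish the two cases is to record ownership next to the pointer, so that a single finalizer is safe for both. This is a minimal sketch under that assumption, not the actual Whizard or LCIO code; `Event`, `EventHandle`, `make_event_for_writing`, `wrap_event_from_reader`, `finalize` and the `deletions` counter are all hypothetical names.

```cpp
#include <cassert>

// Hypothetical event type; the counter makes deletions observable.
static int deletions = 0;
struct Event {
    ~Event() { ++deletions; }
};

// Hypothetical handle a wrapper could pass across a language boundary:
// it remembers whether the wrapper itself allocated the event.
struct EventHandle {
    Event* evt;
    bool owned;  // true only if the wrapper allocated evt
};

// Event created for writing: the wrapper allocates it and must free it.
EventHandle make_event_for_writing() {
    return EventHandle{new Event, true};
}

// Event obtained from a reader: borrowed, the reader frees it.
EventHandle wrap_event_from_reader(Event* borrowed) {
    return EventHandle{borrowed, false};
}

// Finalizer that can be called on every handle, and more than once.
// It deletes only events the wrapper owns.
void finalize(EventHandle& h) {
    if (h.owned) {
        delete h.evt;
    }
    h.evt = nullptr;
    h.owned = false;
}
```

With such a flag the calling code can keep a single finalizer for all events, and the read/write distinction stays inside the wrapper.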

rete commented 3 years ago

I don't understand this last comment:

In the Whizard-LCIO interface we always deleted LCEvent objects created by us both when reading and writing.

When you read an LCEvent you don't need to allocate anything. The LCReader does the allocation and deletion for you. Do you allocate anything before reading an LCEvent?