mantidproject / mantid

Main repository for Mantid code
https://www.mantidproject.org
GNU General Public License v3.0

LoadNexusProcessed Is Unreliable on macOS #37499

Closed cailafinn closed 3 months ago

cailafinn commented 3 months ago

Describe the bug

This defect was introduced by https://github.com/mantidproject/mantid/pull/37273. It is an intermittent error that only happens on macOS. The exact error:

21:08:49 LoadNexusProcessed-[Error] Error in execution of algorithm LoadNexusProcessed:
21:08:49 LoadNexusProcessed-[Error] Attempt to load from an empty dataset /mantid_workspace_1/peaks_workspace/column_14
21:08:49 libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempt to load from an empty dataset /mantid_workspace_1/peaks_workspace/column_14
21:08:49 Subprocess aborted

To Reproduce

  1. Occurs occasionally on our macOS runners. Need to find a reliable way to reproduce locally.

Expected behavior

The test should pass every time, not just most of the time.

cailafinn commented 3 months ago

This isn't actually an unreliable test. Running the load twice in a script crashes on macOS; the test just revealed the failure. Tested on the nightly, mantid 6.9.1 and mantid 6.7: all three crash.
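
For anyone trying this locally, here is a minimal sketch of the "load it twice in one script" reproduction. `LoadNexusProcessed` with `Filename` and `OutputWorkspace` is the real algorithm call from `mantid.simpleapi`; the helper names `load_repeatedly` and `mantid_loader` and the `peaks_N` workspace names are made up for illustration, and the path to the data file is left for the caller to supply.

```python
def load_repeatedly(loader, filename, repeats=2):
    """Call loader(filename, output_name) `repeats` times in one process.

    Returns the list of output workspace names that were requested.
    """
    names = []
    for i in range(repeats):
        name = f"peaks_{i}"  # hypothetical workspace name
        loader(filename, name)
        names.append(name)
    return names


def mantid_loader(filename, output_name):
    """Load one processed NeXus file into a named workspace."""
    # Imported lazily so load_repeatedly can be used without Mantid installed.
    from mantid.simpleapi import LoadNexusProcessed

    LoadNexusProcessed(Filename=filename, OutputWorkspace=output_name)
```

Calling `load_repeatedly(mantid_loader, "/path/to/file.nxs")` from a plain Python script is the intended use. Because the crash may only show up on a later load or allocation, raising `repeats` may make it easier to trigger.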

It only seems to happen with the SingleCrystalLeanElasticPeakTable.nxs file, so it's now a question of working out what's wrong with the file or wrong with the loading process for those types of files.
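
One way to look at the file side of that question is to list every dataset in the file that has a zero-length dimension, since the original error complained about an empty dataset at `/mantid_workspace_1/peaks_workspace/column_14`. The sketch below is an assumption-laden diagnostic, not something that was run in this thread: it assumes the `.nxs` file is HDF5 and readable with `h5py`, and `find_empty_datasets` is a hypothetical helper. It is written against duck-typed objects (groups expose `items()`, datasets expose `shape`), which matches `h5py.Group` and `h5py.Dataset`.

```python
def find_empty_datasets(group, prefix=""):
    """Return paths of datasets under `group` that hold no elements.

    A node with a `shape` attribute is treated as a dataset; a node with
    `items()` but no `shape` is treated as a group and walked recursively.
    """
    empty = []
    for name, node in group.items():
        path = f"{prefix}/{name}"
        if hasattr(node, "shape"):
            shape = node.shape
            # shape is None for a null dataspace; a 0 in the shape means no data.
            if shape is None or 0 in shape:
                empty.append(path)
        elif hasattr(node, "items"):
            empty.extend(find_empty_datasets(node, path))
    return empty
```

With `h5py` installed, `with h5py.File(path, "r") as f: print(find_empty_datasets(f))` would print the empty dataset paths. Whether `column_14` is actually empty on disk is not established here; an empty result would point towards the loading code instead.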

When testing, the crash would often occur during the next test that's run. Interestingly, I think this is because the original load of the file corrupts something on the heap, and the second load then hits the corruption and crashes out. This is a guess based on the following error message:

python3.10(16416,0x70000f855000) malloc: Heap corruption detected, free list is damaged at 0x600002a851e0
*** Incorrect guard value: 0
python3.10(16416,0x70000f855000) malloc: *** set a breakpoint in malloc_error_break to debug

With a breakpoint set at malloc_error_break, the actual crash occurs in one of two places, inconsistently. Why this happens for this specific file, I'm not sure.

https://github.com/mantidproject/mantid/blob/647fe86a9e44e9e6b474999ec67f63200ad96eb5/Framework/Nexus/src/NexusClasses.cpp#L523
https://github.com/mantidproject/mantid/blob/647fe86a9e44e9e6b474999ec67f63200ad96eb5/Framework/Nexus/src/NexusClasses.cpp#L526

zjmorgan commented 3 months ago

Interesting. Thanks for tracking it down. Yes, I added the file. The context was to preserve the peak shape when saving/loading lean peaks workspaces.

zjmorgan commented 3 months ago

The file was created by loading SingleCrystalPeakTable.nxs and converting it to LeanElasticPeak.

github-actions[bot] commented 3 months ago

Closed by #37515.