root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern

Broken streaming of vector of enum with underlying type other than int #16312

Open ktf opened 3 weeks ago

ktf commented 3 weeks ago

Check duplicate issues.

Description

I need help understanding an issue we see when running on Linux on ARM and reading a file that was serialised on x86. Note that this platform is peculiar because plain char (without a signedness specifier) is unsigned rather than signed (the signedness of char is implementation-defined in the standard).

This is important because mPadSubset, which you will see below, is an enum PadSubset : char. When running in valgrind, the issue appears as dumped below.
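To check the platform difference described above on a given machine, a small sketch like the following may help. The enum here is a hypothetical stand-in for o2::tpc::PadSubset (its enumerator names are invented); only the underlying type char is taken from the report.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Hypothetical stand-in for o2::tpc::PadSubset; enumerator names are made up.
enum class PadSubset : char { A, B, C };

// True where plain char is signed (typical for x86-64 Linux),
// false where it is unsigned (typical for AArch64 Linux).
constexpr bool plainCharIsSigned() { return CHAR_MIN < 0; }

// The signedness of the enum follows its underlying type.
constexpr bool padSubsetIsSigned()
{
   return std::is_signed<std::underlying_type<PadSubset>::type>::value;
}

// The size, however, is one byte on every platform.
static_assert(sizeof(PadSubset) == 1, "enum with underlying type char occupies one byte");
static_assert(std::is_same<std::underlying_type<PadSubset>::type, char>::value,
              "underlying type is plain char");
```

Calling plainCharIsSigned() on the x86 writer and the ARM reader shows whether the two hosts disagree on signedness. The size of the enum is the same on both, so signedness alone does not change how many bytes each element occupies.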

What puzzles me and what I think is the culprit of the segmentation fault is the line:

[1965517:tpc-tracker]:    i= 2, mPadSubset      type= 23, offset= 56, len=2, method=0 [optimized]

as I would have expected it to be len=1. Can you explain to me what is going on?

[1965517:tpc-tracker]: ====>Rebuilding TStreamerInfo for class: o2::tpc::CalDet<o2::tpc::PadFlags>, version: 1
[1965517:tpc-tracker]: Creating StreamerInfo for class: o2::tpc::CalDet<o2::tpc::PadFlags>, version: 2
[1965517:tpc-tracker]:
[1965517:tpc-tracker]: StreamerInfo for class: o2::tpc::CalDet<o2::tpc::PadFlags>, version=2, checksum=0x93700773
[1965517:tpc-tracker]:   string         mName           offset=  0 type=300 ,stl=365, ctype=365, name of the object
[1965517:tpc-tracker]:   vector<o2::tpc::CalArray<o2::tpc::PadFlags> > mData           offset= 32 type=300 ,stl=1, ctype=61, internal CalArrays
[1965517:tpc-tracker]:   o2::tpc::PadSubset mPadSubset      offset= 56 type= 3 Pad subset granularity
[1965517:tpc-tracker]:    i= 0, mName           type=300, offset=  0, len=1, method=0
[1965517:tpc-tracker]:    i= 1, mData           type=300, offset= 32, len=1, method=0
[1965517:tpc-tracker]:    i= 2, mPadSubset      type=  3, offset= 56, len=1, method=0
[1965517:tpc-tracker]:
[1965517:tpc-tracker]: StreamerInfo for class: o2::tpc::CalDet<o2::tpc::PadFlags>, version=1, checksum=0x93700773
[1965517:tpc-tracker]:   string         mName           offset=  0 type=300 ,stl=365, ctype=365, name of the object
[1965517:tpc-tracker]:   vector<o2::tpc::CalArray<o2::tpc::PadFlags> > mData           offset= 32 type=300 ,stl=1, ctype=61, internal CalArrays
[1965517:tpc-tracker]:   o2::tpc::PadSubset mPadSubset      offset= 56 type= 3 Pad subset granularity
[1965517:tpc-tracker]:    i= 0, mName           type=300, offset=  0, len=1, method=0
[1965517:tpc-tracker]:    i= 1, mData           type=300, offset= 32, len=1, method=0
[1965517:tpc-tracker]:    i= 2, mPadSubset      type=  3, offset= 56, len=1, method=0
[1965517:tpc-tracker]:
[1965517:tpc-tracker]: ====>Rebuilding TStreamerInfo for class: o2::tpc::CalArray<o2::tpc::PadFlags>, version: 1
[1965517:tpc-tracker]:
[1965517:tpc-tracker]: StreamerInfo for class: o2::tpc::CalArray<o2::tpc::PadFlags>, version=1, checksum=0xb03d18c2
[1965517:tpc-tracker]:   string         mName           offset=  0 type=300 ,stl=365, ctype=365,
[1965517:tpc-tracker]:   vector<o2::tpc::PadFlags> mData           offset= 32 type=300 ,stl=1, ctype=3, calibration data
[1965517:tpc-tracker]:   o2::tpc::PadSubset mPadSubset      offset= 56 type= 3 Subset type
[1965517:tpc-tracker]:   int            mPadSubsetNumber offset= 60 type= 3 Number of the pad subset, e.g. ROC 0 is IROC A00
[1965517:tpc-tracker]:    i= 0, mName           type=300, offset=  0, len=1, method=0
[1965517:tpc-tracker]:    i= 1, mData           type=300, offset= 32, len=1, method=0
[1965517:tpc-tracker]:    i= 2, mPadSubset      type= 23, offset= 56, len=2, method=0 [optimized]
[1965517:tpc-tracker]: ==1965517== Invalid write of size 1
[1965517:tpc-tracker]: ==1965517==    at 0xF36E7A0: frombuf (Bytes.h:313)
[1965517:tpc-tracker]: ==1965517==    by 0xF36E7A0: frombuf (Bytes.h:442)
[1965517:tpc-tracker]: ==1965517==    by 0xF36E7A0: ReadFastArray (TBufferFile.cxx:1338)
[1965517:tpc-tracker]: ==1965517==    by 0xF36E7A0: TBufferFile::ReadFastArray(int*, int) (TBufferFile.cxx:1327)
[1965517:tpc-tracker]: ==1965517==    by 0xF3E580B: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1183)
[1965517:tpc-tracker]: ==1965517==    by 0xF36EC7B: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517==    by 0xF36EC7B: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) (TBufferFile.cxx:1616)
[1965517:tpc-tracker]: ==1965517==    by 0xF58C84B: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) (TStreamerInfoReadBuffer.cxx:1297)
[1965517:tpc-tracker]: ==1965517==    by 0xF45B81F: TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1883)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DAAB: operator() (TStreamerInfoActions.h:131)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DAAB: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) (TBufferFile.cxx:3736)
[1965517:tpc-tracker]: ==1965517==    by 0xF482A0F: TStreamerInfoActions::ReadSTLMemberWiseSameClass(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short) (TStreamerInfoActions.cxx:1155)
[1965517:tpc-tracker]: ==1965517==    by 0xF482C4F: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1405)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DE4B: operator() (TStreamerInfoActions.h:123)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DE4B: ApplySequence (TBufferFile.cxx:3670)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DE4B: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) (TBufferFile.cxx:3661)
[1965517:tpc-tracker]: ==1965517==    by 0xF376CEB: TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) (TBufferFile.cxx:3598)
[1965517:tpc-tracker]: ==1965517==    by 0xF3F4633: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517==    by 0xF3F4633: TKey::ReadObjectAny(TClass const*) (TKey.cxx:1120)
[1965517:tpc-tracker]: ==1965517==    by 0xF3B82E3: TDirectoryFile::GetObjectChecked(char const*, TClass const*) (TDirectoryFile.cxx:1111)
[1965517:tpc-tracker]: ==1965517==  Address 0x153fbb80 is 0 bytes after a block of size 1,440 alloc'd
[1965517:tpc-tracker]: ==1965517==    at 0x4868908: operator new(unsigned long) (vg_replace_malloc.c:483)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: allocate (new_allocator.h:137)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: allocate (allocator.h:188)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: allocate (alloc_traits.h:464)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: _M_allocate (stl_vector.h:378)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: _M_allocate (stl_vector.h:375)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: std::vector<o2::tpc::PadFlags, std::allocator<o2::tpc::PadFlags> >::_M_default_append(unsigned long) (vector.tcc:650)
[1965517:tpc-tracker]: ==1965517==    by 0xF3E5797: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1176)
[1965517:tpc-tracker]: ==1965517==    by 0xF36EC7B: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517==    by 0xF36EC7B: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) (TBufferFile.cxx:1616)
[1965517:tpc-tracker]: ==1965517==    by 0xF58C84B: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) (TStreamerInfoReadBuffer.cxx:1297)
[1965517:tpc-tracker]: ==1965517==    by 0xF45B81F: TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1883)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DAAB: operator() (TStreamerInfoActions.h:131)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DAAB: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) (TBufferFile.cxx:3736)
[1965517:tpc-tracker]: ==1965517==    by 0xF482A0F: TStreamerInfoActions::ReadSTLMemberWiseSameClass(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short) (TStreamerInfoActions.cxx:1155)
[1965517:tpc-tracker]: ==1965517==    by 0xF482C4F: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1405)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DE4B: operator() (TStreamerInfoActions.h:123)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DE4B: ApplySequence (TBufferFile.cxx:3670)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DE4B: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) (TBufferFile.cxx:3661)
[1965517:tpc-tracker]: ==1965517==    by 0xF376CEB: TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) (TBufferFile.cxx:3598)
[1965517:tpc-tracker]: ==1965517==    by 0xF3F4633: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517==    by 0xF3F4633: TKey::ReadObjectAny(TClass const*) (TKey.cxx:1120)
[1965517:tpc-tracker]: ==1965517==
[1965517:tpc-tracker]: ==1965517== Invalid write of size 1
[1965517:tpc-tracker]: ==1965517==    at 0xF36E7AC: frombuf (Bytes.h:314)
[1965517:tpc-tracker]: ==1965517==    by 0xF36E7AC: frombuf (Bytes.h:442)
[1965517:tpc-tracker]: ==1965517==    by 0xF36E7AC: ReadFastArray (TBufferFile.cxx:1338)
[1965517:tpc-tracker]: ==1965517==    by 0xF36E7AC: TBufferFile::ReadFastArray(int*, int) (TBufferFile.cxx:1327)
[1965517:tpc-tracker]: ==1965517==    by 0xF3E580B: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1183)
[1965517:tpc-tracker]: ==1965517==    by 0xF36EC7B: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517==    by 0xF36EC7B: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) (TBufferFile.cxx:1616)
[1965517:tpc-tracker]: ==1965517==    by 0xF58C84B: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) (TStreamerInfoReadBuffer.cxx:1297)
[1965517:tpc-tracker]: ==1965517==    by 0xF45B81F: TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1883)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DAAB: operator() (TStreamerInfoActions.h:131)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DAAB: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) (TBufferFile.cxx:3736)
[1965517:tpc-tracker]: ==1965517==    by 0xF482A0F: TStreamerInfoActions::ReadSTLMemberWiseSameClass(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short) (TStreamerInfoActions.cxx:1155)
[1965517:tpc-tracker]: ==1965517==    by 0xF482C4F: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) (TStreamerInfoActions.cxx:1405)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DE4B: operator() (TStreamerInfoActions.h:123)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DE4B: ApplySequence (TBufferFile.cxx:3670)
[1965517:tpc-tracker]: ==1965517==    by 0xF36DE4B: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) (TBufferFile.cxx:3661)
[1965517:tpc-tracker]: ==1965517==    by 0xF376CEB: TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) (TBufferFile.cxx:3598)
[1965517:tpc-tracker]: ==1965517==    by 0xF3F4633: Streamer (TClass.h:614)
[1965517:tpc-tracker]: ==1965517==    by 0xF3F4633: TKey::ReadObjectAny(TClass const*) (TKey.cxx:1120)
[1965517:tpc-tracker]: ==1965517==    by 0xF3B82E3: TDirectoryFile::GetObjectChecked(char const*, TClass const*) (TDirectoryFile.cxx:1111)
[1965517:tpc-tracker]: ==1965517==  Address 0x153fbb81 is 1 bytes after a block of size 1,440 alloc'd
[1965517:tpc-tracker]: ==1965517==    at 0x4868908: operator new(unsigned long) (vg_replace_malloc.c:483)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: allocate (new_allocator.h:137)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: allocate (allocator.h:188)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: allocate (alloc_traits.h:464)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: _M_allocate (stl_vector.h:378)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: _M_allocate (stl_vector.h:375)
[1965517:tpc-tracker]: ==1965517==    by 0x60E5D1F: std::vector<o2::tpc::PadFlags, std::allocator<o2::tpc::PadFlags> >::_M_default_append(unsigned long) (vector.tcc:650)
[1965517:tpc-tracker]: ==1965517==    by 0xF3E5797: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*) (TGenCollectionStreamer.cxx:1176)
[1965517:tpc-tracker]: ==1965517==    by 0xF36EC7B: Streamer (TClass.h:614)

Reproducer

I do not have one which does not involve running ALICE reconstruction on ARM.

ROOT version

6.32.02.

Installation method

aliBuild

Operating system

ALMA Linux 9 on ARM64 (Ampere Altra)

Additional context

No response

jblomer commented 3 weeks ago

Can you give us a bit more information? What would be useful, if possible:

Is it confirmed that the same data serialized on ARM does not cause a crash?

ktf commented 3 weeks ago

For the file:

https://cernbox.cern.ch/s/MXkLwJLm61rckhj

I cannot confirm if the same data serialised on ARM does not cause a crash.

ktf commented 3 weeks ago
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: handle_crash(int)
[1064949:tpc-tracker]:     linux-vdso.so.1:     ?? ??:0
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ReadFastArray(int*, int)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: void TGenCollectionStreamer::ReadBufferVectorPrimitives<int>(TBuffer&, void*, TClass const*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TStreamerInfoActions::ReadSTLMemberWiseSameClass(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*, short)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TKey::ReadObjectAny(TClass const*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TDirectoryFile::GetObjectChecked(char const*, TClass const*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::DataRefUtils::decodeCCDB(o2::framework::DataRef const&, std::type_info const&)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2GPUWorkflow.so: decltype(auto) o2::framework::InputRecord::get<o2::tpc::CalDet<o2::tpc::PadFlags>*, char const*>(char const*, int) const
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2GPUWorkflow.so: bool o2::gpu::GPURecoWorkflowSpec::fetchCalibsCCDBTPC<o2::gpu::GPUCalibObjectsTemplate<o2::gpu::ConstPtr> >(o2::framework::ProcessingContext&, o2::gpu::GPUCalibObjectsTemplate<o2::gpu::ConstPtr>&, o2::gpu::GPURecoWorkflowSpec::calibObjectStruct&)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2GPUWorkflow.so: o2::gpu::GPURecoWorkflowSpec::doCalibUpdates(o2::framework::ProcessingContext&, o2::gpu::GPURecoWorkflowSpec::calibObjectStruct&)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2GPUWorkflow.so: o2::gpu::GPURecoWorkflowSpec::run(o2::framework::ProcessingContext&)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so:     ?? ??:0
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::DataProcessingDevice::tryDispatchComputation(o2::framework::ServiceRegistryRef, std::vector<o2::framework::DataRelayer::RecordAction, std::allocator<o2::framework::DataRelayer::RecordAction> >&)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::DataProcessingDevice::doRun(o2::framework::ServiceRegistryRef)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::run_callback(uv_work_s*)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2Framework.so: o2::framework::DataProcessingDevice::Run()
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/FairMQ/v1.8.4-2/lib/libfairmq.so.1.8.4: fair::mq::Device::RunWrapper()
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/FairMQ/v1.8.4-2/lib/libfairmq.so.1.8.4: boost::detail::function::void_function_obj_invoker1<std::function<void (fair::mq::State)>, void, fair::mq::State>::invoke(boost::detail::function::function_buffer&, fair::mq::State)
[1064949:tpc-tracker]:     /root/src/sw/slc9_aarch64/FairMQ/v1.8.4-2/lib/libfairmq.so.1.8.4: boost::signals2::detail::signal_impl<void (fair::mq::State), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (fair::mq::State)>, boost::function<void (boost::signals2::connection const&, fair::mq::State)>, boost::signals2::mutex>::operator()(fair::mq::State)

The above is one of the stack traces. It actually dies in different ways; most likely there is some memory corruption going on.

ktf commented 3 weeks ago

For the ALICE environment, the easiest is probably sitting together. It's on a custom machine in my private area.

jblomer commented 3 weeks ago

Thanks. I'm not at CERN today but getting started with the information.

jblomer commented 3 weeks ago

(Side note: MakeProject does not reconstruct the enums with the correct underlying type)

ktf commented 3 weeks ago

Another stacktrace which seems to be related to this is:

[1500611:internal-dpl-ccdb-backend]: Executable is /root/src/sw/slc9_aarch64/O2/dev-local1/bin/o2-tpc-reco-workflow
[1500611:internal-dpl-ccdb-backend]:     linux-vdso.so.1:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     [0xfff3cae9b014]:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     [0xfff3cae9d7f0]:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so:     ?? ??:0
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: TCling::AutoParseImplRecurse(char const*, bool)
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: TCling::AutoParse(char const*)
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: TClingLookupHelper__AutoParse(char const*)
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCling.so: ROOT::TMetaUtils::TClingLookupHelper::GetPartiallyDesugaredNameWithScopeHandling(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool)
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCore.so.6.32: TClassEdit::GetNormalizedName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::basic_string_view<char, std::char_traits<char> >)
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libCore.so.6.32: TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long)
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TStreamerInfo::BuildCheck(TFile*, bool)
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TFile::ReadStreamerInfo()
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TFile::Init(bool)
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/ROOT/v6-32-02-alice1-1/lib/libRIO.so.6.32: TMemFile::TMemFile(char const*, char*, long long, char const*, char const*, int, long long)
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::loadFileToMemory(std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*) const
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::getFromSnapshot(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, int&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::navigateSourcesAndLoadFile(o2::ccdb::CcdbApi::RequestContext&, int&, unsigned long*) const
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::vectoredLoadFileToMemory(std::vector<o2::ccdb::CcdbApi::RequestContext, std::allocator<o2::ccdb::CcdbApi::RequestContext> >&) const
[1500611:internal-dpl-ccdb-backend]:     /root/src/sw/slc9_aarch64/O2/dev-local1/lib/libO2CCDB.so: o2::ccdb::CcdbApi::loadFileToMemory(std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::bas$

ktf commented 3 weeks ago

Interestingly enough, the actual array returned by backtrace can be decoded by GDB to:

$4 = {0xffffac196fb0 <handle_crash(int)+48>, 0xffffb2f727f0 <__kernel_rt_sigreturn>, 0xfff3ea6f5014, 0xfff3ea6f77f0,
  0xffff9e97b198 <(anonymous namespace)::GenericLLVMIRPlatformSupport::initialize(llvm::orc::JITDylib&)+2392>,
  0xffff9d4b0de0 <cling::IncrementalExecutor::runStaticInitializersOnce(cling::Transaction&)+272>, 0xffff9d435f78 <cling::Interpreter::executeTransaction(cling::Transaction&)+40>,
  0xffff9d4c0e30 <cling::IncrementalParser::commitTransaction(llvm::PointerIntPair<cling::Transaction*, 2u, cling::IncrementalParser::EParseResult, llvm::PointerLikeTypeTraits<cling::Transaction*>, llvm::PointerIntPairInfo<cling::Transaction*, 2u, llvm::PointerLikeTypeTraits<cling::Transaction*> > >&, bool)+768>,
  0xffff9d4c398c <cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&)+108>,
  0xffff9d433d80 <cling::Interpreter::parseForModule(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+176>, 0xffff9d36b5f8
     <ExecAutoParse(char const*, Bool_t, cling::Interpreter*)+568>, 0xffff9d36cf48 <TCling::AutoParseImplRecurse(char const*, bool)+1400>, 0xffff9d374de4 <TCling::AutoParse(char const*)+340>,
  0xffff9d355204 <TClingLookupHelper__AutoParse(char const*)+36>, 0xffff9d2c8b44
     <ROOT::TMetaUtils::TClingLookupHelper::GetPartiallyDesugaredNameWithScopeHandling(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool)+116>, 0xffffa7acf42c
     <TClassEdit::GetNormalizedName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::basic_string_view<char, std::char_traits<char> >)+540>, 0xffffa7aeab58
     <TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long)+1144>, 0xffffa7f852b4 <TStreamerInfo::BuildCheck(TFile*, bool)+148>, 0xffffa7f4751c <TFile::ReadStreamerInfo()+700>,
  0xffffa7f4fc40 <TFile::Init(bool)+1056>, 0xffffa7f74a60 <TMemFile::TMemFile(char const*, char*, long long, char const*, char const*, int, long long)+268>, 0xffffac4515b4
     <o2::ccdb::CcdbApi::loadFileToMemory(std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*) const+900>,
  0xffffac451f68 <o2::ccdb::CcdbApi::getFromSnapshot(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<char, boost::container::pmr::polymorphic_allocator<char> >&, int&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+936>,
  0xffffac452100 <o2::ccdb::CcdbApi::navigateSourcesAndLoadFile(o2::ccdb::CcdbApi::RequestContext&, int&, unsigned long*) const+192>,
  0xffffac4524d0 <o2::ccdb::CcdbApi::vectoredLoadFileToMemory(std::vector<o2::ccdb::CcdbApi::RequestContext, std::allocator<o2::ccdb::CcdbApi::RequestContext> >&) const+240>,
jblomer commented 3 weeks ago

Some more points gathered during a debug session:

jblomer commented 3 weeks ago

Further debugging revealed a deeper issue that seems to surface on ARM/Linux only by chance:

Writing or reading a vector of enums goes through the collection proxy. The collection proxy will use WriteFastArray / ReadFastArray of kInt_t, neglecting the actual underlying type of the enum. At some point in the read/write chain, this causes memory reads/writes beyond the limits of a memory array.
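The size mismatch described above can be put into numbers with a small sketch. It does not call ROOT; it only compares the bytes a vector owns with the bytes touched when every element is treated as a 32-bit int. PadFlags here is a hypothetical stand-in with a 16-bit underlying type (the minimal reproducer later in this thread reports a size of 2), and the element count of 720 is an assumption chosen so that the owned size matches the 1,440-byte block in the valgrind report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for o2::tpc::PadFlags with a 16-bit underlying type.
enum class PadFlags : std::uint16_t { kNone = 0 };

// Bytes that a std::vector<E> holding n elements actually owns.
template <typename E>
constexpr std::size_t bytesOwned(std::size_t n) { return n * sizeof(E); }

// Bytes touched when the same n elements are streamed as if each were `Assumed`.
template <typename Assumed>
constexpr std::size_t bytesTouched(std::size_t n) { return n * sizeof(Assumed); }

// Bytes read or written past the end of the vector's storage.
template <typename E, typename Assumed>
constexpr std::size_t overrun(std::size_t n)
{
   return bytesTouched<Assumed>(n) > bytesOwned<E>(n)
             ? bytesTouched<Assumed>(n) - bytesOwned<E>(n)
             : 0;
}
```

Under these assumptions, overrun<PadFlags, std::int32_t>(720) gives 1440: the int-based read would write as many bytes past the end of the allocation as the allocation itself holds, which is consistent with the "Invalid write ... 0 bytes after a block of size 1,440" entries in the valgrind log.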

jblomer commented 3 weeks ago

I think the cause is https://github.com/root-project/root/blob/master/io/io/src/TGenCollectionProxy.cxx#L404 (and similar lines further down), that hard-code the enum underlying type to int.

When fixing, I think we need to take care of what happens to files already written out with the wrong enum width.

ktf commented 3 weeks ago

Do I understand correctly this affects only scoped enums within a vector? Can I simply fix it on my side by moving to enum class Foo : int {}?

jblomer commented 3 weeks ago

Although: I'm not exactly sure if already existing files that were serialized with a shorter enum correctly read back. I think yes, but that needs to be tested.

ktf commented 3 weeks ago

Although: I'm not exactly sure if already existing files that were serialized with a shorter enum correctly read back. I think yes, but that needs to be tested.

This I can try on my side.

jblomer commented 3 weeks ago

I'm attaching a minimal reproducer.

minimalTestVectorOfEnums.tar.gz

This test returns (wrongly)

Size of PadFlags: 2
Enum underlying type: 12
mFlags size before writing: 2
mFlags size after reading: 4
0 0 23824 0

With a patch to TGenCollectionProxy::Value, the result is correct:

Size of PadFlags: 2
Enum underlying type: 12
mFlags size before writing: 2
mFlags size after reading: 2
0 0

I think the next steps should be discussed with @pcanal. In particular:

jblomer commented 3 weeks ago

AFAICT, neither TTree nor RNTuple I/O are affected by this issue.

pcanal commented 2 weeks ago

[1965517:tpc-tracker]: i= 2, mPadSubset type= 23, offset= 56, len=2, method=0 [optimized] as I would have expected it to be len=1. Can you explain me what is going on?

If the next data member is of the same type, TStreamerInfo collates the two into a single entry (note the [optimized] tag), so the next member (here mPadSubsetNumber) is not listed separately after it.
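The collation can be illustrated with a simplified model. This is only a sketch of the idea, not ROOT's actual code or data structures: the Entry struct, the collate function and the constant kArrayOffset are assumptions made for illustration. The value 20 is inferred from the dump above, where the type code changes from 3 to 23, and the member sizes are taken from the offsets shown in the dump.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of one compiled streamer entry; not ROOT's real structure.
struct Entry {
   std::string name;
   int type;   // type code as printed in the dump, e.g. 3 for int
   int offset; // byte offset of the member inside the object
   int size;   // size in bytes of a single value
   int len;    // number of consecutive values covered by this entry
};

// Assumed constant: added to the type code when members are merged (3 -> 23).
constexpr int kArrayOffset = 20;

// Merge adjacent basic-type members that have the same type and are contiguous in memory.
std::vector<Entry> collate(const std::vector<Entry> &members)
{
   std::vector<Entry> out;
   for (const Entry &m : members) {
      if (!out.empty()) {
         Entry &prev = out.back();
         const int prevBase = prev.len > 1 ? prev.type - kArrayOffset : prev.type;
         const bool basic = m.type > 0 && m.type < kArrayOffset;
         const bool sameType = prevBase == m.type && prev.size == m.size;
         const bool contiguous = prev.offset + prev.len * prev.size == m.offset;
         if (basic && sameType && contiguous) {
            if (prev.len == 1)
               prev.type += kArrayOffset; // mark the entry as covering several values
            ++prev.len;
            continue;
         }
      }
      out.push_back(m);
   }
   return out;
}
```

Feeding it the four members from the CalArray dump (mName and mData with type 300, then mPadSubset at offset 56 and mPadSubsetNumber at offset 60, both with type 3 and an assumed size of 4) yields three entries, the last one being mPadSubset with type 23 and len 2.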

pcanal commented 2 weeks ago

We should be able to fix the usage in regular I/O and TTree (which is also broken) when using a dictionary. Proper support in bare ROOT might be harder: the underlying size information is harder to find and in some cases might not be available (yet?), for example for a top-level vector of enums.

pcanal commented 2 weeks ago

In general, how do we correctly handle vectors of enums with underlying types different than int that are on disk, before and after the patch?

With dictionaries, it seems to work fine (for embedded vectors; probably not for standalone vectors) because the TStreamerInfo of the containing class records the underlying type and thus knows when a conversion is needed. The corollary is that a class version number must be updated (to allow schema evolution) if one of the enum types it uses changes its underlying type.

ktf commented 2 weeks ago

For the record, as you might have seen in https://github.com/AliceO2Group/AliceO2/pull/13464, simply changing the types breaks reading back old files (i.e. two shorts are read into an int). Could you comment on when you expect to have a fix for this on your side that applies to 6.32.2, and whether it will allow old code to still read new data (and vice versa, new code to read old data)?

pcanal commented 2 weeks ago

Side note for the record: the original valgrind report and crash happen in the case where the vector<EnumType> is itself held in a vector (of CalArray) held in an object (CalDet).

I have a workaround that solves the problem for the case in the minimal reproducer. It revolves around setting a read rule for the vector of enums:

template <typename E>
void LoadEnumCollection(/* const */ std::vector<E> &onfile, std::vector<E> &enums)
{
   // Number of enum values that fit into one int-sized slot.
   constexpr size_t delta = sizeof(int)/sizeof(E);
   // The on-file vector was read with delta times too many elements; keep only the leading ones.
   const size_t nvalues = onfile.size() / delta;
   onfile.resize(nvalues);
   std::swap(onfile, enums);
}
#pragma read sourceClass="Event" checksums="[0xa2558fd6]" targetClass="Event" source="std::vector<PadFlags> mFlags" target="mFlags" code="{ LoadEnumCollection(onfile.mFlags, mFlags); }"

However, it does not work yet for the actual/original problem :(. (In the minimal reproducer, the container ends up with double the size it should have but there is no overwrite or crash, while in the original the container ends up with the right size but with an overwrite and thus a crash.)
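To see what the helper does in isolation, it can be exercised without ROOT on a plain std::vector. The sketch below repeats the helper so that it is self-contained. The enum and its enumerators are hypothetical, and the four-element input mimics the "mFlags size after reading: 4" case of the minimal reproducer, on the assumption that the valid values sit in the leading half of the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical 16-bit enum standing in for PadFlags; enumerator names are made up.
enum class PadFlags : std::uint16_t { kNone = 0, kFlagA = 1, kFlagB = 2 };

// Same logic as the helper from the read rule above.
template <typename E>
void LoadEnumCollection(std::vector<E> &onfile, std::vector<E> &enums)
{
   constexpr std::size_t delta = sizeof(int) / sizeof(E);
   const std::size_t nvalues = onfile.size() / delta;
   onfile.resize(nvalues);
   std::swap(onfile, enums);
}

// Simulate a vector that came back from disk with twice as many elements as were written.
std::vector<PadFlags> demo()
{
   std::vector<PadFlags> onfile{PadFlags::kFlagA, PadFlags::kFlagB,
                                PadFlags::kNone, PadFlags::kNone};
   std::vector<PadFlags> enums;
   LoadEnumCollection(onfile, enums);
   return enums;
}
```

With a 4-byte int and a 2-byte enum, delta is 2, so demo() returns the first two elements and drops the trailing two.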

pcanal commented 2 weeks ago

The following custom Streamer works around the issue:

template <typename Flags>
inline void CalArray<Flags>::Streamer(TBuffer &R__b)
{
   // Stream an object of class CalArray<PadFlags>.

   if (R__b.IsReading()) {
      UInt_t R__s, R__c;
      Version_t R__v = R__b.ReadVersion(&R__s, &R__c);
      if (R__v <= 3) {
         {
            UInt_t start, count;
            Version_t vers = R__b.ReadVersion(&start, &count);

            std::vector<int> R__stl;
            R__stl.clear();
            int R__n;
            R__b >> R__n;
            R__stl.reserve(R__n);
            for (int R__i = 0; R__i < R__n; R__i++) {
               Int_t readtemp;
               R__b >> readtemp;
               R__stl.push_back(readtemp);
            }
            R__b.CheckByteCount(start, count, "stl collection of enums");

            mFlags.clear();
            auto data = reinterpret_cast<unsigned short*>(R__stl.data());
            constexpr size_t delta = sizeof(int)/sizeof(Flags);
            for(int i = 0; i < R__n; ++i)
               mFlags.push_back(static_cast<PadFlags>( data[i] ));
         }
         int tmp;
         R__b >> tmp;
         mPadSubset = static_cast<PadSubset>(tmp);

         R__b.CheckByteCount(R__s, R__c, CalArray::IsA());
      } else {
         R__b.ReadClassBuffer(CalArray<Flags>::Class(),this, R__v, R__s, R__c);
      }
   } else {
      R__b.WriteClassBuffer(CalArray<Flags>::Class(),this);
   }
}

[Call to ReadClassBuffer was corrected to add missing parameters]
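The Streamer above reads the collection as ints and then views the same storage as 16-bit values through a reinterpret_cast. As a sketch only, the same extraction can be written with std::memcpy, which sidesteps strict-aliasing questions. The function names below are hypothetical, and the assumption (taken from my reading of the Streamer) is that the leading bytes of the int storage hold the memory image of the enum values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy the first n 16-bit values out of storage that was filled as 32-bit ints.
std::vector<std::uint16_t> firstHalfWords(const std::vector<std::int32_t> &words, std::size_t n)
{
   std::vector<std::uint16_t> out(n);
   if (n == 0)
      return out;
   // The requested values must fit inside the int storage.
   assert(n * sizeof(std::uint16_t) <= words.size() * sizeof(std::int32_t));
   std::memcpy(out.data(), words.data(), n * sizeof(std::uint16_t));
   return out;
}

// Helper for experiments: int storage whose leading bytes are the memory image of `values`,
// with one int per value, mirroring how the Streamer above reads R__n ints.
std::vector<std::int32_t> imageAsWords(const std::vector<std::uint16_t> &values)
{
   std::vector<std::int32_t> words(values.size(), 0);
   if (!values.empty())
      std::memcpy(words.data(), values.data(), values.size() * sizeof(std::uint16_t));
   return words;
}
```

Because both helpers copy raw bytes on the same host, the round trip is independent of byte order. Whether the bytes read from an actual file line up this way depends on how the writer and the buffer's byte swapping interact, which this sketch does not model.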