root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern
Other
2.68k stars 1.27k forks source link

Fatal error during std::stable sort of TTreeReaderArray #13314

Closed tdietel closed 6 months ago

tdietel commented 1 year ago

Check duplicate issues.

Description

I tried to run std::stable_sort on a TTreeReaderArrray. The sorting works fine on macOS, but a colleague tried to run the same code on Ubuntu and reported a crash. I then managed to reproduce the crash on CentOS 7.

I investigated the error, and it seems that std::stable_sort assumes that it can decrement and invalid pointer to get a pointer to the last array of an element. However, TTreeReaderArray invalidates the iterator by setting fArray=0 when fIndex is incremented to fIndex>=fSize. This means that the information about the associated array is lost, and the subsequent decrement and dereference will fail. I would assume this could be fixed by marking an invalid iterator with a special value for fIndex (instead of setting fArray=0).

So my question would be if TTreeReaderArray is supposed to be sortable, or if I am trying something that's outside of the scope of the implementation?

I know my reproducer requires ALICE O2.

Reproducer

I encountered the bug with a tree that holds classes from the ALICE O2 framework. I wrote a minimal macro sortbug.C that fails when run in compiled mode in an O2 root environment. Data files are available from https://cernbox.cern.ch/s/ypjoEC6awvYpjwR. The command is root sortbug.C+. The macro content is:

#if !defined(__CLING__) || defined(__ROOTCLING__)

#include <iostream>
//std::cout << "including header files" << std::endl;

#include <DataFormatsTRD/Digit.h>
#include <DataFormatsTRD/Tracklet64.h>

#include <TFile.h>
#include <TTreeReader.h>
#include <TTreeReaderArray.h>

#endif

/// comparison function to order digits by det / row / MCM / -channel
bool comp_digit(const o2::trd::Digit& a, const o2::trd::Digit& b)
{
  if (a.getDetector() != b.getDetector())
    return a.getDetector() < b.getDetector();

  if (a.getPadRow() != b.getPadRow())
    return a.getPadRow() < b.getPadRow();

  if (a.getROB() != b.getROB())
    return a.getROB() < b.getROB();

  if (a.getMCM() != b.getMCM())
    return a.getMCM() < b.getMCM();

  // sort channels in descending order, to ensure ordering of pad columns
  if (a.getChannel() != b.getChannel())
    return a.getChannel() > b.getChannel();

  return true;
}
bool comp_tracklet(o2::trd::Tracklet64 a, o2::trd::Tracklet64 b)
{
  cout << "COMPARE tracklets: " << endl << a << endl << b << endl;

  // upper bits of hcid and padrow from Tracklet64 word
  const uint64_t det_row_mask = 0x0ffde00000000000;

  // lowest bit of hcid (side), MCM col and pos from Tracklet64 word
  const uint64_t col_pos_mask = 0x00011fff00000000;

  auto a_det_row = a.getTrackletWord() & det_row_mask;
  auto b_det_row = b.getTrackletWord() & det_row_mask;

  if (a_det_row != b_det_row)
    return a_det_row < b_det_row;

  auto a_col_pos = a.getTrackletWord() & col_pos_mask;
  auto b_col_pos = b.getTrackletWord() & col_pos_mask;

  return a_col_pos < b_col_pos;
};
void sortbug()
{

  TFile* mainFile = new TFile("trdtracklets.root");
  TTree* dataTree = 0;
  mainFile->GetObject("o2sim", dataTree);
  TTreeReader* dataReader = new TTreeReader(dataTree);

  // set up the branches we want to read
  TTreeReaderArray<o2::trd::Tracklet64>* tracklets = new TTreeReaderArray<o2::trd::Tracklet64>(*dataReader, "Tracklet");

  dataTree->AddFriend("o2sim", "trddigits.root");
  TTreeReaderArray<o2::trd::Digit>* digits = new TTreeReaderArray<o2::trd::Digit>(*dataReader, "TRDDigit");

  int tfno = 0;
  while(dataReader->Next()) {
    cout << "Next TF: " << tfno << endl;

    for (auto tracklet : *tracklets) {
      cout << tracklet << endl;
    }
    cout << dec << "Start to sort " << std::distance(tracklets->begin(), tracklets->end()) << " tracklets..." << endl;
    std::stable_sort(tracklets->begin(), tracklets->end(), comp_tracklet);
    cout << "Start to sort " << std::distance(digits->begin(), digits->end()) << " digits..." << endl;
    std::stable_sort(digits->begin(), digits->end(), comp_digit);
  }

}

The code creates the following stack trace:

Fatal: fArray && "invalid iterator!" violated at line 118 of `/scratch/tdietel/alisw/slc7_x86-64/ROOT/v6-28-04-14/include/TTreeReaderArray.h'
aborting
#0  0x00007fa77058860c in waitpid () from /lib64/libc.so.6
#1  0x00007fa770505f62 in do_system () from /lib64/libc.so.6
#2  0x00007fa7712b7b8c in TUnixSystem::Exec (shellcmd=<optimized out>, this=0x165c5b0) at /jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/unix/src/TUnixSystem.cxx:2104
#3  TUnixSystem::StackTrace (this=0x165c5b0) at /jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/unix/src/TUnixSystem.cxx:2395
#4  0x00007fa771198e69 in DefaultErrorHandler (level=<optimized out>, abort_bool=<optimized out>, location=<optimized out>, msg=<optimized out>) at /jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/base/src/TErrorDefaultHandler.cxx:177
#5  0x00007fa77124a7d5 in ErrorHandler(Int_t, const char *, const char *, typedef __va_list_tag __va_list_tag *) (level=6000, location=0x7fa750f2eabe "", fmt=<optimized out>, ap=0x7fff0771d888) at /jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/foundation/src/TError.cxx:153
#6  0x00007fa77124b302 in Fatal (location=<optimized out>, fmt=<optimized out>) at /jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/foundation/src/TError.cxx:251
#7  0x00007fa750f2a19e in void std::__merge_adaptive<TTreeReaderArray<o2::trd::Tracklet64>::Iterator_t<TTreeReaderArray<o2::trd::Tracklet64> >, long, o2::trd::Tracklet64*, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(o2::trd::Tracklet64, o2::trd::Tracklet64)> >(TTreeReaderArray<o2::trd::Tracklet64>::Iterator_t<TTreeReaderArray<o2::trd::Tracklet64> >, TTreeReaderArray<o2::trd::Tracklet64>::Iterator_t<TTreeReaderArray<o2::trd::Tracklet64> >, TTreeReaderArray<o2::trd::Tracklet64>::Iterator_t<TTreeReaderArray<o2::trd::Tracklet64> >, long, long, o2::trd::Tracklet64*, long, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(o2::trd::Tracklet64, o2::trd::Tracklet64)>) () from /home/tdietel/alice/data/ole/sortbug_C.so
#8  0x00007fa750f27d24 in void std::__stable_sort_adaptive<TTreeReaderArray<o2::trd::Tracklet64>::Iterator_t<TTreeReaderArray<o2::trd::Tracklet64> >, o2::trd::Tracklet64*, long, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(o2::trd::Tracklet64, o2::trd::Tracklet64)> >(TTreeReaderArray<o2::trd::Tracklet64>::Iterator_t<TTreeReaderArray<o2::trd::Tracklet64> >, TTreeReaderArray<o2::trd::Tracklet64>::Iterator_t<TTreeReaderArray<o2::trd::Tracklet64> >, o2::trd::Tracklet64*, long, __gnu_cxx::__ops::_Iter_comp_iter<bool (*)(o2::trd::Tracklet64, o2::trd::Tracklet64)>) [clone .constprop.0] () from /home/tdietel/alice/data/ole/sortbug_C.so
#9  0x00007fa750f288f4 in sortbug() () from /home/tdietel/alice/data/ole/sortbug_C.so
#10 0x00007fa7683ff036 in ?? ()
#11 0x00007fff0771e170 in ?? ()
#12 0x00007fa76b0e4bbe in cling::IncrementalExecutor::executeWrapper(llvm::StringRef, cling::Value*) const () from /home/tdietel/alice/sw/slc7_x86-64/ROOT/v6-28-04-14/lib/libCling.so
#13 0x00007fa76b066cda in cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) () from /home/tdietel/alice/sw/slc7_x86-64/ROOT/v6-28-04-14/lib/libCling.so
#14 0x00007fa76b06739f in cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) () from /home/tdietel/alice/sw/slc7_x86-64/ROOT/v6-28-04-14/lib/libCling.so
#15 0x00007fa76b0675ea in cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) () from /home/tdietel/alice/sw/slc7_x86-64/ROOT/v6-28-04-14/lib/libCling.so
#16 0x00007fa76b142c97 in cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) () from /home/tdietel/alice/sw/slc7_x86-64/ROOT/v6-28-04-14/lib/libCling.so
#17 0x00007fa76af8a0bc in HandleInterpreterException (metaProcessor=0x243df60, input_line=0x7fff0771e079 "sortbug()", compRes=
0x7fff0771e03c: cling::Interpreter::kSuccess, result=0x7fff0771e170) at /jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/metacling/src/TCling.cxx:2460
#18 0x00007fa76afa4749 in TCling::ProcessLine (this=0x16e7bc0, line=<optimized out>, error=0x7fff0771f04c) at /jenkins/workspace/build-any-ib/sw/20156516
/1/slc7_x86-64/GCC-Toolchain/v12.2.0-alice1-5/include/c++/12.2.0/bits/unique_ptr.h:191
#19 0x00007fa76afa4b02 in TCling::ProcessLineSynch (this=0x16e7bc0, line=0x16e8930 ".X  /home/tdietel/alice/data/ole/./sortbug.C+", error=0x7fff0771f04c)
 at /jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/metacling/src/TCling.cxx:3554
#20 0x00007fa7711777e7 in TApplication::ExecuteFile (file=<optimized out>, error=0x7fff0771f04c, keep=<optimized out>) at /jenkins/workspace/build-any-ib
/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/base/src/TApplication.cxx:1661
#21 0x00007fa77172040c in TRint::ProcessLineNr (this=0x16b4f60, filestem=<optimized out>, line=0x7fff0771f060 ".x sortbug.C+", error=0x7fff0771f04c) at /
jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/rint/src/TRint.cxx:820
#22 0x00007fa771721fc6 in TRint::Run (this=this
entry=0x16b4f60, retrn=retrn
entry=false) at /jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-04/core/rint/src/TRint.cxx:461
#23 0x00000000004011dd in main (argc=<optimized out>, argv=0x7fff077211e8) at /jenkins/workspace/build-any-ib/sw/20156516/1/SOURCES/ROOT/v6-28-04/v6-28-0
4/main/src/rmain.cxx:84

I assume the bug can be triggered with other data sets in plain ROOT. If needed, I could try to strip things down, but I don't know if this is the most efficient use of my time.

ROOT version

   ------------------------------------------------------------------
  | Welcome to ROOT 6.28/04                        https://root.cern |
  | (c) 1995-2022, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Jul 10 2023, 10:38:00                 |
  | From tags/v6-28-04@v6-28-04                                      |
  | With c++ (GCC) 12.2.0                                            |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------

Installation method

Built with alibuild

Operating system

CentOS 7, Ubuntu

Additional context

No response

guitargeek commented 1 year ago

Hi! I think @Axel-Naumann is able to tell authoritatively if this is within the scope of the TTreeReaderArray or not.

I think it's not. You'll have to copy the elements to an STL collection and then sort the new collection.

Axel-Naumann commented 1 year ago

Thanks for the report!

I would love to mark the TTreeReaderArray iterators const but that will break the code of people who don't mark their member functions const... So indeed while sorting on the iterators (and manipulation of the buffer TTreeReaderArray operates on) is not supported I don't know how to signal that.

Any ideas?

dpiparo commented 1 year ago

I thought about this and unfortunately I did not have any good idea yet. However, if the problem to be solved is sorting the entries in TTreeReaderArrays without additional copies, a solution could be working on indices (apologies if this is not fitting this particular use case in ALICE). For example

template <class MyCollection>
std::vector<size_t> ArgStableSort(const MyCollection &mycoll)
{
   std::vector<size_t> sortedIndices(mycoll.size());
   std::iota(sortedIndices.begin(),sortedIndices.end(),0);
   std::stable_sort( sortedIndices.begin(), sortedIndices.end(), [&](int i,int j){return mycoll[i]<mycoll[j];}); 
   return sortedIndices;
}
// [....]
std::vector<char> myvec{'c', 'e', 'd', 'a', 'b', 'f'};
const auto myvec_sidx = ArgStableSort(myvec);
for (size_t idx : myvec_sidx) {
   std::cout << myvec[idx] << std::endl;
}

This is an approach used in many libraries, including ROOT's VecOps

guitargeek commented 6 months ago

That's a nice demo, @dpiparo, and how it's commonly done. Or you can just copy the collection, if it's not too expensive.

As for the actual issue: it should be clear that reader views should not be sortable. We can't make this explicitly clear with const because ROOT and it's users are not very consistent with that keyword, as @Axel-Naumann said. So there is no solution here - ROOT and C++ just allow you to shoot yourself in the foot :slightly_smiling_face: