root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern

Compatibility issue: File written with root 6.32/02 cannot be read back with root 6.10/06 #15964

Open wlampl opened 5 days ago

wlampl commented 5 days ago

Description

While trying to update to LCG_106_ATLAS_3 (root 6.32/02) we encountered a test failure. An intermediate file produced with this release could not be read back with an older release (6.10/06, 6.08.06): we encounter a segfault when the file is closed.

Background: ATLAS Trigger simulation of run 2 uses the release that was used for data-taking during run 2.

Reproducer

I copied the intermediate file and the reproducer script to /afs/cern.ch/work/w/wlampl/public/ATEAM-1001. The script is quite simple:

from ROOT import TFile
f=TFile.Open("tmp.RDO")
f.ls()
t=f.Get("CollectionTree")
n=t.GetEntries()
for i in range(n):
    s=t.GetEntry(i)
    print(s)
f.Close()

For root versions back to about 6.16.00 it works as expected. Running with 6.08.06 and 6.10.06 (in a centos7 container), I encounter a segfault at the end. A log can be found in /afs/cern.ch/work/w/wlampl/public/ATEAM-1001/log.22.0.0

ROOT version

Writing: 6.32/02 Reading: 6.10/06 or 6.08.06

Installation method

SFT/LCG

Operating system

CentOS7

Additional context

No response

Nowakus commented 5 days ago

Let me add a reproducer where you only need to open the file and try to exit:

% setupATLAS -c centos7 --pwd /afs/cern.ch/work/w/wlampl/public/ATEAM-1001
% asetup Athena,21.0,latest
% root -b tmp.RDO

| Welcome to ROOT 6.08/06 http://root.cern.ch |
Attaching file tmp.RDO as _file0...
Warning in : no dictionary for class ROOT::TIOFeatures is available
(TFile *) 0x29cf190
root [1] .q

Break segmentation violation
This is the entire stack trace of all threads:

0 0x00007f6cdd6c560c in waitpid () from /lib64/libc.so.6

1 0x00007f6cdd642f62 in do_system () from /lib64/libc.so.6

2 0x00007f6cdecce102 in TUnixSystem::StackTrace() () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

3 0x00007f6cdecd061c in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

4

5 0x0000000001209080 in ?? ()

6 0x00007f6cdec52005 in TList::FindObject(TObject const*) const () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

7 0x00007f6cdec5237c in TList::Clear(char const*) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

8 0x00007f6cdec50a01 in THashTable::Clear(char const*) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

9 0x00007f6cdec504dd in THashList::Clear(char const*) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

10 0x00007f6cdec9d1a7 in TListOfDataMembers::Unload() () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

11 0x00007f6cdec7f2d0 in TClass::SetUnloaded() () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

12 0x00007f6cdec4a574 in ROOT::RemoveClass(char const*) () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

13 0x00007f6cdec9926e in ROOT::TGenericClassInfo::~TGenericClassInfo() () from /cvmfs/atlas-nightlies.cern.ch/repo/sw/21.0_Athena_x86_64-centos7-gcc62-opt/sw/lcg/releases/ROOT/6.08.06-d7e12/x86_64-centos7-gcc62-opt/lib/libCore.so

14 0x00007f6cdd639ce9 in __run_exit_handlers () from /lib64/libc.so.6
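
Since the crash happens in the exit handlers, after the script's own output has already been printed, a test harness has to look at how the process ended rather than at the printed results. Below is a minimal sketch of such a check. The helper name `exits_cleanly` is hypothetical, and because the thread does not show which exit status ROOT reports after catching the signal, the sketch treats either a non-zero status or the "segmentation violation" text from the log above as a failure.

```python
import subprocess
import sys


def exits_cleanly(cmd, marker="segmentation violation"):
    """Run cmd; return True only if it exits with status 0 and prints no crash marker."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the stack trace may go to stderr
        universal_newlines=True,
    )
    crashed = result.returncode != 0 or marker in result.stdout
    return not crashed


if __name__ == "__main__":
    # Stand-in child process so the sketch runs anywhere; in the real setup
    # the command would be something like ["root", "-b", "-q", "tmp.RDO"]
    # inside the centos7 container (an assumption, not taken from the thread).
    ok = exits_cleanly([sys.executable, "-c", "print('hello')"])
    print("clean exit:", ok)
```

Checking the output text as well as the status keeps the check useful even if the signal handler converts the crash into a normal-looking exit.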

jcatmore commented 5 days ago

Hi @martamaja10 ,

thanks for looking at this. We see you've assigned @dpiparo but we understand that he's away for a couple of weeks, and ideally we'd like this to be addressed sooner if possible. Is there someone else in the team who could look at this before then?

The problem is, this issue prevents us from using LCG106 and so it holds up several developments.

Thanks!

James

martamaja10 commented 5 days ago

Hi @jcatmore,

sure, I'll find another person in the team to take a look at this ASAP.

Cheers, Marta

pcanal commented 5 days ago

Most likely, backporting this commit will fix the problem: https://github.com/root-project/root/commit/08b34d72a800bd48ea4655f17075de0ef3ca72cb

pcanal commented 5 days ago

See https://github.com/root-project/root/pull/15968 and https://github.com/root-project/root/pull/15969

jblomer commented 5 days ago

This issue is most likely due to a change that inadvertently broke forward compatibility: https://github.com/root-project/root/issues/14793

You should have seen this already with 6.30 though. Is there an explanation why 6.30 did not trigger the error?

There are two ways to proceed (if the issue is what we think it is):

The second option would be useful to run at least once to confirm that we identified the right cause.

Nowakus commented 5 days ago

Is there any drawback in doing SetBit(TFile::k630forwardCompatibility) for every file we produce now?

pcanal commented 5 days ago

> Is there any drawback in doing SetBit(TFile::k630forwardCompatibility) for every file we produce now?

The main drawback is forgetting to eventually remove it :). The technical drawback is slightly worse and unstable compression (see for example https://github.com/root-project/root/issues/12438).
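
One way to reduce the risk of forgetting the workaround is to keep the `SetBit` call in a single helper, so it can later be removed in one place. The sketch below is only an illustration: `enable_630_forward_compat` is a hypothetical name, the output file name is a placeholder, and setting the bit right after opening the file (before anything is written) is an assumption rather than something stated in this thread.

```python
def enable_630_forward_compat(tfile):
    """Set TFile::k630forwardCompatibility on tfile if this ROOT build defines it.

    Returns True if the bit was set, False if the constant is unavailable
    (for example, a ROOT release that predates the bit).
    """
    bit = getattr(type(tfile), "k630forwardCompatibility", None)
    if bit is None:
        return False
    tfile.SetBit(bit)
    return True


# Intended use with PyROOT (not executed here; "out.root" is a placeholder):
#   import ROOT
#   f = ROOT.TFile.Open("out.root", "RECREATE")
#   enable_630_forward_compat(f)   # assumed: set before objects are written
#   ... fill and write trees ...
#   f.Close()
```

Looking the constant up on the file's class, instead of hard-coding its value, lets the same writer code run on releases that do not define the bit.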

jcatmore commented 5 days ago

> You should have seen this already with 6.30 though. Is there an explanation why 6.30 did not trigger the error?

Just to comment about 6.30: we didn't look at this release apart from doing a compilation test, so most likely the issue is there as well, as you expect.
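
The version boundaries mentioned in this thread can be collected into a small check for workflows that mix releases. This is a rough sketch, not a statement about ROOT's actual compatibility guarantees: the helper names are hypothetical, and both thresholds are approximate values taken from the comments above (the problem is expected from 6.30 on for writing, and reading worked "back to about 6.16.00"). The parser accepts both spellings used in the report, `6.32/02` and `6.10.06`.

```python
import re


def parse_root_version(text):
    """Turn '6.32/02' or '6.10.06' into a comparable tuple such as (6, 32, 2)."""
    parts = re.split(r"[./]", text.strip())
    return tuple(int(p) for p in parts)


# Boundaries taken from the thread, both approximate:
WRITER_CHANGE = (6, 30, 0)   # the problem is expected from 6.30 on
READER_OK_FROM = (6, 16, 0)  # reading worked "back to about 6.16.00"


def may_crash_on_read(writer, reader):
    """Heuristic: True if a file from `writer` may break an old `reader`."""
    w = parse_root_version(writer)
    r = parse_root_version(reader)
    return w >= WRITER_CHANGE and r < READER_OK_FROM


if __name__ == "__main__":
    for reader in ("6.10/06", "6.08.06", "6.16.00"):
        print(reader, may_crash_on_read("6.32/02", reader))
```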