key4hep / key4hep-spack

A Spack recipe repository of Key4hep software.

LCFIPlus broken in nightly builds and latest release with CentOS7 #558

Closed BrieucF closed 5 months ago

BrieucF commented 9 months ago

The CLD reconstruction seems to be broken with CentOS 7 but works with the Alma9 flavor.

Reproducer:

git clone https://github.com/key4hep/CLDConfig.git
cd CLDConfig/CLDConfig
source /cvmfs/sw-nightlies.hsf.org/key4hep/setup.sh
ddsim --compactFile $K4GEO/FCCee/CLD/compact/CLD_o2_v05/CLD_o2_v05.xml \
      --enableGun \
      --gun.distribution uniform \
      --gun.energy "10*GeV" \
      --gun.particle mu- \
      --numberOfEvents 100 \
      --outputFile Step1_edm4hep.root
k4run CLDReconstruction.py --inputFiles Step1_edm4hep.root

Stack trace: stack_trace_CLD_reco_COS7.txt

tmadlener commented 9 months ago

This looks oddly similar to what we see in the k4MarlinWrapper CI at the moment: https://github.com/key4hep/k4MarlinWrapper/actions/runs/7785263794/job/21264263282#step:3:6809

Do the Alma9 and CentOS7 nightlies have the same date at the moment, or is one of them lagging behind due to other issues? Can you easily check that?

BrieucF commented 9 months ago

They are both from today (2024-02-06)

tmadlener commented 9 months ago

OK, thanks for checking. I will have a look at the k4MarlinWrapper one. I think fixing that should also solve this, since CLD reco ultimately uses it as well.

tmadlener commented 9 months ago

Quick update on the little I have figured out so far using CLIC reco (I would assume CLD reco is the same or similar). By commenting out a few algorithms, I think I have narrowed it down to LCFIPlus being the offender here. That would also make sense, as it uses some ROOT things internally and the segmentation violation is inside ROOT. I haven't been able to debug it much further, and I don't yet know why it only happens on CentOS7.

edit: As a side note: LCFIPlus hasn't changed in 2-ish years.
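
The "comment out a few algorithms" approach described above can be sketched as a leave-one-out search. This is only an illustrative sketch, not the procedure actually used in this thread: the function `find_offenders`, the `crashes` callback, and the algorithm names in the demo are all hypothetical placeholders. In practice `crashes` would run the reconstruction with the given algorithm list and report whether it segfaulted.

```python
from typing import Callable, List, Sequence


def find_offenders(
    algorithms: Sequence[str],
    crashes: Callable[[List[str]], bool],
) -> List[str]:
    """Return every algorithm whose removal alone makes the crash disappear.

    `crashes` is a hypothetical callback: it receives the list of enabled
    algorithms and returns True if the job still crashes with that list.
    """
    offenders = []
    for name in algorithms:
        # Disable exactly one algorithm and re-run with the rest.
        remaining = [a for a in algorithms if a != name]
        if not crashes(remaining):
            offenders.append(name)
    return offenders


# Demo with placeholder names and a stand-in for the real job:
# here the "job" crashes whenever the placeholder "FlavorTagging" is enabled.
sequence = ["Tracking", "ParticleFlow", "FlavorTagging", "Output"]
suspects = find_offenders(sequence, lambda enabled: "FlavorTagging" in enabled)
print(suspects)
```

This needs one run per algorithm, which is fine for a short sequence. It only finds a single algorithm that is sufficient to trigger the crash on its own; if two algorithms crash independently, removing either one alone will not help and the result will be empty.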

tmadlener commented 9 months ago

It also seems to have been present for at least three weeks now: I can find CI runs from three weeks ago that show the issue, e.g. https://github.com/key4hep/k4MarlinWrapper/actions/runs/7519772256/job/20473078519

jmcarcell commented 9 months ago

Some hints from comparing against a working version. It doesn't work when:

If it is LCFIPlus, then judging by the dates it is not due to https://github.com/iLCSoft/LCFIVertex/pull/9, and the other dependencies remain unchanged.

jmcarcell commented 5 months ago

Since there aren't going to be any more releases or nightlies with CentOS 7, and it's unlikely anyone will look into this and fix it, I'm going to close this.