legend-exp / legend-dataflow

LEGEND data flow management

Segmentation fault when running `build_raw.py` #52

Open slwatkins opened 2 weeks ago

slwatkins commented 2 weeks ago

While testing data production, I ran into a segmentation fault in the build_raw.py script that prevented the raw LH5 files from being written in full. I tracked it down to the build_raw call at the end of the script:

https://github.com/legend-exp/legend-dataflow/blob/a98f20783bfbd7208347dfccd6dcb5751667f70a/scripts/build_raw.py#L87

My "fix" was to change the line slightly to:

build_raw(args.input, out_spec=temp_output, **settings)

It appears that the original version loads a large JSON file into out_spec and passes the temporary output name to filekey while the file is being built. Judging from the docstrings, both the original call and my version should be supported, so it's not clear why the original triggers a segfault while my change doesn't.

EDIT: I see now that my change would not respect the channel numbers, so the original version should work... But I'll see if I can track down why the segmentation fault is happening.
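One way to narrow down where the crash happens is Python's standard-library faulthandler module, which prints the Python traceback when the interpreter receives a fatal signal such as SIGSEGV. The following is a minimal sketch, not part of the current script; the helper name enable_segfault_traceback is hypothetical, and placing it before the build_raw call is an assumption about where it would be most useful.

```python
import faulthandler
import sys


def enable_segfault_traceback(stream=None):
    """Dump the traceback of every thread on SIGSEGV and similar fatal signals.

    Returns True if the fault handler is active afterwards.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_segfault_traceback()
    # Hypothetical placement: the script's existing build_raw(...) call
    # would follow here, so a crash inside it reports the Python frames.
```

The same effect is available without editing the script by running it with `python -X faulthandler` or by setting the PYTHONFAULTHANDLER environment variable.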

slwatkins commented 1 week ago

It appears to be a versioning error. I reverted my software packages back to what was used for the data production, and I no longer hit the segfault.

Versions I was using when the segfault occurred:

"pkg_versions": {
        "pygama": "pygama==2.0.1",
        "pylegendmeta": "pylegendmeta==0.10.2",
        "dspeed": "dspeed==1.4.0a1",
        "legend-pydataobj": "legend-pydataobj==1.7.0",
        "legend-daq2lh5": "legend-daq2lh5==1.2.2"
}

Versions that build raw successfully without segfaulting:

"pkg_versions": {
        "pygama": "pygama==1.4.3",
        "pylegendmeta": "pylegendmeta==0.8.2",
        "dspeed": "dspeed==1.2.0",
        "legend-pydataobj": "legend-pydataobj==1.4.2",
        "legend-daq2lh5": "legend-daq2lh5==1.1.0"
}
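For anyone trying to reproduce this, it may help to confirm which versions are actually installed and which pins differ between the two sets. A rough sketch using only the standard library follows; the version numbers are copied from the two lists above, while the helper names changed_pins and installed_versions are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the two "pkg_versions" blocks above
FAILING = {
    "pygama": "2.0.1",
    "pylegendmeta": "0.10.2",
    "dspeed": "1.4.0a1",
    "legend-pydataobj": "1.7.0",
    "legend-daq2lh5": "1.2.2",
}
WORKING = {
    "pygama": "1.4.3",
    "pylegendmeta": "0.8.2",
    "dspeed": "1.2.0",
    "legend-pydataobj": "1.4.2",
    "legend-daq2lh5": "1.1.0",
}


def changed_pins(old, new):
    """Return {package: (old_version, new_version)} for every pin that differs."""
    return {
        name: (old[name], new.get(name))
        for name in sorted(old)
        if old[name] != new.get(name)
    }


def installed_versions(names):
    """Return {package: installed version, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, (old, new) in changed_pins(WORKING, FAILING).items():
        print(f"{name}: {old} -> {new}")
    print(installed_versions(FAILING))
```

Since all five packages changed between the two sets, stepping them forward one at a time from the working set would be one way to find which upgrade introduces the crash.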

So perhaps there was a change downstream that now triggers this segfault? It would be nice if someone could reproduce this error.

FYI, from the logs, the segfaults happen directly after the DSP processing chain that creates whatever is in dsp_config, so perhaps something changed in how those generated parameters are saved between the old versions and the current versions of each package.