hdante closed this issue 2 weeks ago.
I did not manage to reproduce the stack trace above (can you show us how you enabled the sanitizer?), but a bug has clearly been spotted. See #190.
Hello, to enable the sanitizer I used:
CXXFLAGS="$CXXFLAGS -fsanitize=address"
LDFLAGS="$LDFLAGS -fsanitize=address"
LD_PRELOAD=libasan.so
Thanks for the patch.
OK, and what exactly did you run?
I ran a RAIL estimation script (rail-estimate from https://github.com/linea-it/pz-compute), dispatched by Slurm on the LineA cluster. Here is the full log:
(base) [henrique.almeida@loginapl01 testlephare]$ cat slurm-26171.out
2024-08-23 19:11:22.303955: Starting: slurm-dispatch
Number of slots: 1
2024-08-23 19:11:22.304359: Starting: finding program paths
/usr/bin/srun
2024-08-23 19:11:22.305345: Finished: finding program paths
FIXME: setting LD_PRELOAD=libasan.so
2024-08-23 19:11:22.305435: Starting: /lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-estimate -a lephare data/objectTable_tract_5065_DC2_2_2i_runs_DP0_2_v23_0_1_PREOPS-905_step3_29_20220314T204515Z-part35.hdf5 output5.hdf5 id=0
Estimator algorithm: lephare
Estimator calibration file: estimator_lephare.pkl
Bins: 301
HDF5 group name: ""
Column template for magnitude data: "mag_{band}"
Column template for error data: "magerr_{band}"
Extra parameter file: "None"
Starting setup.
Loading all program modules...
Configuring estimator...
Loading input file...
Setup done.
Starting estimate.
#######################################
# PHOTOMETRIC REDSHIFT with OPTIONS #
# Config file :
# CAT_IN : bidon
# CAT_OUT : zphot.out
# CAT_LINES : 0 1000000000
# PARA_OUT : /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare-data/examples/output.para
# INP_TYPE : M
# CAT_FMT[0:MEME 1:MMEE] : 0
# CAT_MAG : AB
# ZPHOTLIB : LSST_STAR_MAG LSST_GAL_MAG LSST_QSO_MAG
# FIR_LIB :
# FIR_LMIN : 7.000000
# FIR_CONT : -1.000000
# FIR_SCALE : -1.000000
# FIR_FREESCALE : YES
# FIR_SUBSTELLAR : NO
# ERR_SCALE : 0.020000 0.020000 0.020000 0.020000 0.020000 0.020000
# ERR_FACTOR : 1.500000
# GLB_CONTEXT : 63
# FORB_CONTEXT : -1
# DZ_WIN : 1.000000
# MIN_THRES : 0.020000
# MAG_ABS : -24.000000 -5.000000
# MAG_ABS_AGN : -30.000000 -10.000000
# MAG_REF : 2
# NZ_PRIOR : -2 -2
# Z_INTERP : YES
# Z_METHOD : BEST
# PROB_INTZ : 0.000000
# MABS_METHOD : 1
# MABS_CONTEXT : 63
# MABS_REF : 0
# AUTO_ADAPT : NO
# ADAPT_BAND : 4
# ADAPT_LIM : 1.500000 23.000000
# ADAPT_ZBIN : 0.010000 6.000000
# ZFIX : NO
# SPEC_OUT : NO
# CHI_OUT : NO
# PDZ_OUT : test
#######################################
Reading input librairies ...
Read lib
Number of keywords to be read in the doc: 14
Number of keywords read at the command line (excluding -c config): 0
Reading keywords from /lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/var/cache/lephare/runs/train/lib_mag/LSST_QSO_MAG.doc
Number of keywords read in the config file: 17
Keyword NUMBER_ROWS not provided
Keyword NUMBER_SED not provided
Keyword Z_FORM not provided
Reading library: /lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/var/cache/lephare/runs/train/lib_mag/LSST_QSO_MAG.bin
Done with the library reading with 16856 SED read.
Number of keywords to be read in the doc: 14
Number of keywords read at the command line (excluding -c config): 0
Reading keywords from /lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/var/cache/lephare/runs/train/lib_mag/LSST_GAL_MAG.doc
Number of keywords read in the config file: 17
Keyword NUMBER_ROWS not provided
Keyword NUMBER_SED not provided
Keyword Z_FORM not provided
Reading library: /lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/var/cache/lephare/runs/train/lib_mag/LSST_GAL_MAG.bin
Done with the library reading with 429226 SED read.
Number of keywords to be read in the doc: 14
Number of keywords read at the command line (excluding -c config): 0
Reading keywords from /lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/var/cache/lephare/runs/train/lib_mag/LSST_STAR_MAG.doc
Number of keywords read in the config file: 17
Keyword NUMBER_ROWS not provided
Keyword NUMBER_SED not provided
Keyword Z_FORM not provided
Reading library: /lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/var/cache/lephare/runs/train/lib_mag/LSST_STAR_MAG.bin
Done with the library reading with 429480 SED read.
Read lib out
Read filt
# NAME IDENT Lbda_mean Lbeff(Vega) FWHM AB-cor VEGA CALIB Fac_corr
total_u.pb 1 0.3664 0.3720 0.0457 0.7023 -20.9200 0 1.0000
total_g.pb 2 0.4842 0.4744 0.1395 -0.0845 -20.7200 0 1.0000
total_r.pb 3 0.6251 0.6162 0.1340 0.1519 -21.5300 0 1.0000
total_i.pb 4 0.7562 0.7497 0.1297 0.3697 -22.1600 0 1.0000
total_z.pb 5 0.8693 0.8670 0.1010 0.5172 -22.6200 0 1.0000
total_y3.pb 6 1.0080 1.0050 0.0577 0.5864 -23.0100 0 1.0000
=================================================================
==13500==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7ffde392ba88 at pc 0x2b5c4dc64a76 bp 0x7ffde392abc0 sp 0x7ffde392abb8
READ of size 8 at 0x7ffde392ba88 thread T0
#0 0x2b5c4dc64a75 in onesource::generatePDF(std::vector<SED*, std::allocator<SED*> >&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<int, std::allocator<int> >, int, bool) /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare/src/lib/onesource.cpp:1050
#1 0x2b5c4dd0d779 in PhotoZ::run_photoz(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&) /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare/src/lib/photoz_lib.cpp:1608
#2 0x2b5c4dfb75a0 in pybind11::cpp_function::cpp_function<void, PhotoZ, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (PhotoZ::*)(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&)#1}::operator()(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&) const /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare/extern/pybind11/include/pybind11/pybind11.h:154
#3 0x2b5c4dfb75a0 in void pybind11::detail::argument_loader<PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&>::call_impl<void, pybind11::cpp_function::cpp_function<void, PhotoZ, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (PhotoZ::*)(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&)#1}&, 0ul, 1ul, 2ul, 3ul, pybind11::detail::void_type>(pybind11::cpp_function::cpp_function<void, PhotoZ, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (PhotoZ::*)(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul>, pybind11::detail::void_type&&) && /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare/extern/pybind11/include/pybind11/cast.h:1506
#4 0x2b5c4dfb75a0 in std::enable_if<std::is_void<void>::value, pybind11::detail::void_type>::type pybind11::detail::argument_loader<PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&>::call<void, pybind11::detail::void_type, pybind11::cpp_function::cpp_function<void, PhotoZ, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (PhotoZ::*)(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&)#1}&>(pybind11::cpp_function::cpp_function<void, PhotoZ, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (PhotoZ::*)(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&)#1}&) && /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare/extern/pybind11/include/pybind11/cast.h:1480
#5 0x2b5c4dfb75a0 in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, PhotoZ, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (PhotoZ::*)(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&)#1}, void, PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<void, PhotoZ, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (PhotoZ::*)(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&)#1}&&, void (*)(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling 
const&)::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare/extern/pybind11/include/pybind11/pybind11.h:297
#6 0x2b5c4dfb75a0 in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, PhotoZ, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (PhotoZ::*)(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&)#1}, void, PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<void, PhotoZ, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (PhotoZ::*)(std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&)#1}&&, void (*)(PhotoZ*, std::vector<onesource*, std::allocator<onesource*> >, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling 
const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare/extern/pybind11/include/pybind11/pybind11.h:267
#7 0x2b5c4debe5d5 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare/extern/pybind11/include/pybind11/pybind11.h:989
#8 0x528766 in cfunction_call /usr/local/src/conda/python-3.11.9/Objects/methodobject.c:542
#9 0x5041ab in _PyObject_MakeTpCall /usr/local/src/conda/python-3.11.9/Objects/call.c:214
#10 0x5116e6 in _PyEval_EvalFrameDefault /usr/local/src/conda/python-3.11.9/Python/ceval.c:4769
#11 0x5cbed9 in _PyEval_EvalFrame /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#12 0x5cbed9 in _PyEval_Vector /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#13 0x5cb5ae in PyEval_EvalCode /usr/local/src/conda/python-3.11.9/Python/ceval.c:1148
#14 0x5ec6a6 in run_eval_code_obj /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:1741
#15 0x5e823f in run_mod /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:1762
#16 0x5fd191 in pyrun_file /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:1657
#17 0x5fc55e in _PyRun_SimpleFileObject /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:440
#18 0x5fc282 in _PyRun_AnyFileObject /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:79
#19 0x5f6efd in pymain_run_file_obj /usr/local/src/conda/python-3.11.9/Modules/main.c:360
#20 0x5f6efd in pymain_run_file /usr/local/src/conda/python-3.11.9/Modules/main.c:379
#21 0x5f6efd in pymain_run_python /usr/local/src/conda/python-3.11.9/Modules/main.c:601
#22 0x5f6efd in Py_RunMain /usr/local/src/conda/python-3.11.9/Modules/main.c:680
#23 0x5bbc78 in Py_BytesMain /usr/local/src/conda/python-3.11.9/Modules/main.c:734
#24 0x2b5b9b8d0554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#25 0x5bbac2 (/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/bin/python3.11+0x5bbac2)
Address 0x7ffde392ba88 is located in stack of thread T0
SUMMARY: AddressSanitizer: dynamic-stack-buffer-overflow /lustre/t1/cl/lsst/tmp/henrique.almeida/lephare/src/lib/onesource.cpp:1050 in onesource::generatePDF(std::vector<SED*, std::allocator<SED*> >&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<int, std::allocator<int> >, int, bool)
Shadow bytes around the buggy address:
0x10003c71d700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003c71d710: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003c71d720: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003c71d730: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003c71d740: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10003c71d750: 00[cb]cb cb cb cb cb cb 00 00 00 00 ca ca ca ca
0x10003c71d760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003c71d770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003c71d780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003c71d790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003c71d7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==13500==ABORTING
srun: error: apl01: task 0: Exited with exit code 1
=================================================================
=================================================================
==13486==ERROR: LeakSanitizer: detected memory leaks
==13487==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 342 byte(s) in 7 object(s) allocated from:
Direct leak of 86 byte(s) in 3 object(s) allocated from:
#0 0x2ac68a0c87c7 in __interceptor_calloc /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#0 0x2ac68a0c87c7 in __interceptor_calloc /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x2ac68af648e2 in slurm_xmalloc /home/abuild/rpmbuild/BUILD/slurm-18.08.8/src/common/xmalloc.c:87
#1 0x2ac68af648e2 in slurm_xmalloc /home/abuild/rpmbuild/BUILD/slurm-18.08.8/src/common/xmalloc.c:87
SUMMARY: AddressSanitizer: 342 byte(s) leaked in 7 allocation(s).
SUMMARY: AddressSanitizer: 86 byte(s) leaked in 3 allocation(s).
Traceback (most recent call last):
File "/var/spool/slurm/d/job26171/slurm_script", line 113, in <module>
if __name__ == '__main__': main()
^^^^^^
File "/var/spool/slurm/d/job26171/slurm_script", line 111, in main
parallel(tasks, slots, args)
File "/var/spool/slurm/d/job26171/slurm_script", line 86, in parallel
raise RuntimeError('Error: child process returned failure (%d).'
RuntimeError: Error: child process returned failure (256).
Hello, I'm currently trying to track down a memory leak when running estimations. When I compile the lephare C++ library with AddressSanitizer enabled, I get the following error:
The following line is reported at the top of the stack trace:
However, size(PDFcol2loc) == pdfmap[7].size() (see line 870).
lephare version: 0.1.11.dev4+gff46ff8