ATLAS instead?
On Tue, 3 Aug 2021, 22:03 Oleg Smirnov, @.***> wrote:
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Bad memory unallocation! : 128 0x7f252fe6d000
BLAS : Bad memory unallocation! : 128 0x7f24a4dd1000
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Bad memory unallocation! : 128 0x7f32fa3e6000
BLAS : Bad memory unallocation! : 128 0x7f2ae0750000
Segmentation fault (core dumped)
BLAS is clearly a maths library, not an English library: you shouldn't start a sentence with a conjunction. Admittedly, that's a lesser sin than gluttony, which is what trying to allocate too many memory regions amounts to.
This was on a 32-antenna, 1k-channel MeerKAT MS with 128 dask threads. I reduced the threads to 64 and now it runs (at a steady and modest ~80G of memory), so I find it odd that having twice the threads caused this gluttony.
input_ms:
  path: ../msdir/1627405250_sdp_l0-J2009_2026-corr.ms
  data_column: DATA
  weight_column: WEIGHT_SPECTRUM
  time_chunk: '128s'
  freq_chunk: '1GHz'
  select_fields: []
  select_ddids: []
input_model:
  recipe: MODEL_DATA:DIR1_DATA
  apply_p_jones: true
solver:
  terms: [G,dE]
  iter_recipe: [25,25,25,25,25]
output:
  gain_dir: gains.qc
  products: [corrected_data, corrected_residual]
  columns: [CORR_DATA, RES_DATA]
  net_gain: true
mad_flags:
  enable: false
  threshold_bl: 10
  threshold_global: 12
dask:
  threads: 64
  scheduler: distributed
G:
  type: delay
  direction_dependent: false
  time_interval: '8s'
  freq_interval: '1GHz'
  load_from:
  interp_mode: reim
  interp_method: 2dlinear
dE:
  type: complex
  direction_dependent: true
  time_interval: '128s'
  freq_interval: '50MHz'
This is a tricky one, and it is definitely related to the number of threads in use. These are probably relevant: https://github.com/xianyi/OpenBLAS/issues/1882 and https://stackoverflow.com/questions/45086246/too-many-memory-regions-error-with-dask.
@o-smirnov Can you please try doing

export MKL_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 OMP_NUM_THREADS=1

and rerunning with 128 threads? If that runs through, then the culprit is nested parallelism in some of the numpy/numba routines. I will get round to disabling this from the get-go at some point, as it is detrimental to performance.
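If exporting the variables at launch is inconvenient, the same cap can be applied from inside Python. This is just an illustrative sketch using the third-party threadpoolctl package (my suggestion, not something QuartiCal itself does):

# Minimal sketch, assuming threadpoolctl is installed (pip install threadpoolctl).
# It caps every BLAS/OpenMP thread pool that numpy has loaded to one thread,
# which has the same effect as exporting *_NUM_THREADS=1 before launch.
import numpy as np
from threadpoolctl import threadpool_limits

with threadpool_limits(limits=1):
    a = np.random.rand(512, 512)
    b = np.random.rand(512)
    x = np.linalg.solve(a, b)  # BLAS restricted to a single thread here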
Note that the nested parallelism that QuartiCal itself uses shouldn't have this problem. This is specifically every dask thread trying to use 128 threads when invoking parallel numpy-like functions (I suspect that in your case the culprit is the np.linalg.solve in the delay solver).
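To make the failure mode concrete, here is a small self-contained sketch (not QuartiCal code, just an assumed illustration of the pattern described above): many worker threads each calling a BLAS-backed routine. Without pinning the BLAS pools to one thread before numpy is imported, each of the 128 workers would try to spin up its own pool of BLAS threads, and OpenBLAS can run out of the memory regions it preallocates at build time:

# Pin the BLAS/OpenMP pools *before* numpy is imported; OpenBLAS sizes
# its internal buffers at import time.
import os
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def solve_chunk(seed):
    # Stand-in for a per-chunk solver step that hits BLAS, e.g. an
    # np.linalg.solve inside a dask task.
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((64, 64)) + 64 * np.eye(64)  # well-conditioned
    b = rng.standard_normal(64)
    return np.linalg.solve(a, b)

# 128 concurrent callers, but each BLAS call stays single-threaded, so the
# total thread count stays at 128 rather than 128 x (cores per BLAS pool).
with ThreadPoolExecutor(max_workers=128) as pool:
    results = list(pool.map(solve_chunk, range(128)))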
Closing for now - there is no longer an np.linalg.solve call. Please reopen if you encounter this issue again.