ratt-ru / CubiCal

A fast radio interferometric calibration suite.
GNU General Public License v2.0

fixes-459 #460

Closed bennahugo closed 2 years ago

bennahugo commented 2 years ago

This fixes #459.

As near as I can tell the issue is in the use of threads inside numba. TBB fails to import in the new numba, even if I follow their instructions and install it from pip (@JSKenyon).

I still can't track down this super annoying error -- that part of the code is a bit cryptic; maybe @o-smirnov can be of some assistance here:

INFO      19:19:39 - data_handler       [x01] [0.3/2.2 2.6/23.0 0.9Gb] reading BITFLAG
INFO      19:19:39 - main               [0.2/2.9 2.1/25.8 0.9Gb] WARNING: unrecognized worker process name 'Process-6'. Please inform the developers.

At least it runs through again now
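
The "unrecognized worker process name" warning presumably comes from name-based identification of worker processes. A minimal illustration (not CubiCal code; names are hypothetical) of why a child shows up as 'Process-6' unless the parent names it explicitly:

    import multiprocessing as mp

    def report():
        # a child carries whatever name it was given at construction time
        print("running in", mp.current_process().name)

    if __name__ == "__main__":
        unnamed = mp.Process(target=report)              # gets a default "Process-N" name
        named = mp.Process(target=report, name="io.0")   # explicit, recognisable name
        for p in (unnamed, named):
            p.start()
        for p in (unnamed, named):
            p.join()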

Running with

gocubical --sol-jones g,dd --data-ms msdir/1491291289.1ghz.1.1ghz.4hrs.ms --data-column CORRECTED_DATA --data-time-chunk 8 --data-freq-chunk 0 --model-list "MODEL_DATA+-output/deep2.DicoModel@msdir/tag.reg:output/deep2.DicoModel@msdir/tag.reg" --model-ddes auto --weight-column WEIGHT --flags-apply FLAG --flags-auto-init legacy --madmax-enable 0 --madmax-global-threshold 0,0 --madmax-threshold 0,0,10 --sol-stall-quorum 0.95 --sol-term-iters 50,90,50,90 --sol-min-bl 110.0 --sol-max-bl 0 --dist-max-chunks 4 --out-name output/deep2cal --out-overwrite 1 --out-mode sr --out-column DE_DATA --out-subtract-dirs 1:  --g-time-int 8 --g-freq-int 0 --g-clip-low 0 --g-clip-high 0 --g-type complex-diag --g-update-type phase-diag --g-max-prior-error 0.35 --g-max-post-error 0.35 --g-max-iter 100 --dd-dd-term 1 --dd-time-int 8 --dd-freq-int 32 --dd-clip-low 0 --dd-clip-high 0 --dd-type complex-diag --dd-fix-dirs 0 --dd-max-prior-error 0.35 --dd-max-post-error 0.35 --dd-max-iter 200 --degridding-OverS 11 --degridding-Support 7 --degridding-Nw 100 --degridding-wmax 0 --degridding-Padding 1.7 --degridding-NDegridBand 15 --degridding-MaxFacetSize 0.15 --degridding-MinNFacetPerAxis 1 --dist-nthread 4 --dist-nworker 4 --dist-ncpu 4
bennahugo commented 2 years ago

Yup. As discussed: https://numba.pydata.org/numba-doc/latest/user/threading-layer.html says TBB can be enabled by installing the Python package from pip. That is not my experience though -- numba seems to be picking up the system version. This is therefore only a workaround.

This used to work with earlier versions of numba though, so I'm not sure what has changed to make things execute "unsafely". Perhaps they switched their default threading layer.

JSKenyon commented 2 years ago

Just putting my two cents here. Note these issues on Numba: https://github.com/numba/numba/issues/6108 and https://github.com/numba/numba/issues/7148. It seems to have something to do with discovery of the .so files. It is possible to work around this by setting LD_LIBRARY_PATH to wherever pip put the file; in my case this was something like path/to/venv/lib.
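
A quick way to check where the pip wheel actually put its TBB library before exporting LD_LIBRARY_PATH (paths are illustrative; sys.prefix points at the active venv):

    import glob
    import os
    import sys

    # the pip "tbb" wheel drops libtbb under the environment's lib directory
    venv_lib = os.path.join(sys.prefix, "lib")
    print("candidate TBB libraries:", glob.glob(os.path.join(venv_lib, "libtbb*")))

    # LD_LIBRARY_PATH has to be set before the interpreter starts for the
    # dynamic loader to prefer these over the system copies, e.g. in the
    # launching shell: export LD_LIBRARY_PATH="$VIRTUAL_ENV/lib:$LD_LIBRARY_PATH"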

bennahugo commented 2 years ago

Ok, I tried exporting LD_LIBRARY_PATH (technically it should not be needed when the virtualenv is expressly activated, though).

numba -s reports a successful import:

__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : True
+-->Vendor: GNU
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

I do have

Requirement already satisfied: tbb in ./venvddf/lib/python3.6/site-packages (2021.5.1)

However, when I try to execute the basic function

    import numba
    import numpy as np
    # `log` here is CubiCal's coloured logger, defined elsewhere in the module

    def set_numba_threading(nthread):
        try:
            # "safe" requires the TBB threading layer, so this probes whether TBB loads
            numba.config.THREADING_LAYER = "safe"
            @numba.njit(parallel=True)
            def foo(a, b):
                return a + b
            foo(np.arange(5), np.arange(5))
            return nthread
        except Exception:
            numba.config.THREADING_LAYER = "default"
            print("Cannot use TBB threading (check your installation). Dropping the number of solver threads to 1", file=log(0, "red"))
            return 1

I get

/home/hugo/workspace/venvddf/lib/python3.6/site-packages/numba/np/ufunc/parallel.py:365: NumbaWarning: The TBB threading layer requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found TBB_INTERFACE_VERSION = 9107. The TBB threading layer is disabled.
  warnings.warn(problem)
INFO      10:59:42 - main               [0.2 2.0 1.0Gb] Cannot use TBB threading (check your installation). Dropping the number of solver threads to 1

So it is still picking up the older system library version. Unfortunately I can't uninstall that version -- it will break several system packages.
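
One way to confirm which layer numba actually ends up using at runtime is the documented numba.threading_layer() call, which is only meaningful after a parallel-compiled function has run; a minimal probe:

    import numba
    import numpy as np

    @numba.njit(parallel=True)
    def probe(a, b):
        return a + b

    probe(np.arange(5.0), np.arange(5.0))
    # reports "tbb", "omp" or "workqueue", whichever was actually loaded
    print("active threading layer:", numba.threading_layer())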

bennahugo commented 2 years ago

Ok hang on... found another place where OMP is being used -- the degridder.

It may run into trouble when forks and threads are mixed. I suggest we switch to workqueue if TBB fails to load, and on top of that set the environment variables accordingly -- if workers.py sets nthread > 1 then the degridder needs to go to OMP_NUM_THREADS == 1.
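
A minimal sketch of that fallback (hypothetical helper, not the actual workers.py code): prefer TBB, drop to numba's fork-safe built-in workqueue layer if TBB is unusable, and pin the degridder's OpenMP threads to one whenever solver threads are in use:

    import os
    import numba
    import numpy as np

    def select_threading(nthread):
        try:
            numba.config.THREADING_LAYER = "tbb"
            @numba.njit(parallel=True)
            def probe(a, b):
                return a + b
            probe(np.arange(5.0), np.arange(5.0))
        except Exception:
            # TBB missing or too old: fall back to numba's built-in workqueue layer
            numba.config.THREADING_LAYER = "workqueue"
        if nthread > 1:
            # keep the OMP-based degridder single-threaded so forked workers
            # don't trip over an already-initialised OpenMP runtime
            os.environ["OMP_NUM_THREADS"] = "1"
        return nthread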

bennahugo commented 2 years ago

I'm not sure how this did not give issues before. My best guess is that numpy now initialises OpenMP before we fork, after which it becomes unsafe to use.

bennahugo commented 2 years ago

Nope, workqueue doesn't completely solve the issue -- things still go to pot if threads > 1 is used with workqueue -- and it is not a memory issue. I'm testing this on com08.

Edit: at least on 18.04 it looks like one would need to compile the packages at system level (possibly just to pick up the TBB headers?). I have no idea how to get things working in a venv. With these changes I've traced the problem down to the numba code.

bennahugo commented 2 years ago

Alright, I'm happy this now works as advertised. I can't believe nobody picked up the issue with the previously used small-angle approximation for computing the RA/Dec of the facet centre for beam application. Predicted flux far off axis, with the beam (E-Jones) evaluated almost at the source:

[278:293,1833:1850] min -0.01585, max 0.05009, mean 0.002003, std 0.01133, sum 0.5107, np 255

Original convolved model flux (subject to a slightly different beam evaluation due to the regular facets used in DDF, so a small difference is to be expected):

[270:297,1831:1857] min -1.119e-09, max 0.04783, mean 0.00147, std 0.005706, sum 1.032, np 702

Apparent peak convolved flux of the source:

[270:297,1832:1856] min -0.0004496, max 0.02002, mean 0.0006666, std 0.002545, sum 0.4319, np 648

@viralp please take note -- you previously tried using the beam within CubiCal and did not get decent subtraction when peeling sources from intrinsic models.
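
For reference, a sketch of the geometry at issue (illustrative only, not the actual CubiCal/DDFacet code): the facet-centre offsets (l, m) should go through the exact inverse SIN/orthographic projection rather than a small-angle approximation, which drifts noticeably far off axis, especially at high declination:

    import numpy as np

    def lm_to_radec_exact(l, m, ra0, dec0):
        """Exact inverse SIN (orthographic) projection; all angles in radians."""
        n = np.sqrt(1.0 - l**2 - m**2)
        dec = np.arcsin(m * np.cos(dec0) + n * np.sin(dec0))
        ra = ra0 + np.arctan2(l, n * np.cos(dec0) - m * np.sin(dec0))
        return ra, dec

    def lm_to_radec_small_angle(l, m, ra0, dec0):
        """Small-angle approximation: fine near the phase centre, biased far out."""
        return ra0 + l / np.cos(dec0), dec0 + m

    # a facet a couple of degrees from a southern pointing centre (dec0 = -80 deg)
    ra0, dec0 = np.deg2rad(63.36), np.deg2rad(-80.0)
    l, m = np.deg2rad(2.0), np.deg2rad(2.0)
    print(np.rad2deg(lm_to_radec_exact(l, m, ra0, dec0)))
    print(np.rad2deg(lm_to_radec_small_angle(l, m, ra0, dec0)))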

Example usage (the pointing centre may be set to the phase centre if you did not mosaic via 'DataPhaseDir'):

gocubical --sol-jones g,dd --data-ms msdir/1491291289.1ghz.1.1ghz.4hrs.ms --data-column CORRECTED_DATA --data-time-chunk 8 --data-freq-chunk 0 --model-list "output/deep2.DicoModel@msdir/tag2.reg" --model-ddes auto --weight-column WEIGHT --flags-apply FLAG --flags-auto-init legacy --madmax-enable 0 --madmax-global-threshold 0,0 --madmax-threshold 0,0,10 --sol-stall-quorum 0.95 --sol-term-iters 50 --sol-min-bl 110.0 --sol-max-bl 0 --dist-max-chunks 4 --out-name output/deep2cal --out-overwrite 1 --out-mode sr --out-column DE_DATA --out-subtract-dirs 0  --g-time-int 8 --g-freq-int 0 --g-clip-low 0 --g-clip-high 0 --g-type complex-diag --g-update-type phase-diag --g-max-prior-error 0.35 --g-max-post-error 0.35 --g-max-iter 100 --degridding-OverS 11 --degridding-Support 7 --degridding-Nw 100 --degridding-wmax 0 --degridding-Padding 1.7 --degridding-NDegridBand 15 --degridding-MaxFacetSize 0.15 --degridding-MinNFacetPerAxis 1 --dist-nthread 1 --dist-nworker 16 --dist-ncpu 4 --degridding-NProcess 8 --degridding-BeamModel FITS --degridding-FITSFile 'input/meerkat_pb_jones_cube_95channels_$(corr)_$(reim).fits' --out-model-column MODEL_OUT --sel-field 2 --degridding-PointingCenterAt j2000,4h13m26.40,-80d00m00s

This work is done in preparation for SKA-MID. I will next port heterogeneous beams to this package.

bennahugo commented 2 years ago

@JSKenyon please review