bennahugo closed this 2 years ago
Yup. As discussed: https://numba.pydata.org/numba-doc/latest/user/threading-layer.html says TBB can be enabled by installing the Python package from pip. That is not my experience though -- it seems to be picking up the system version. This is therefore only a workaround.
This used to work with earlier versions of numba though, so I'm not sure what has changed to make things execute "unsafely". Perhaps they switched their default threading model.
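Incidentally, numba also lets you pin the layer up front via the `NUMBA_THREADING_LAYER` environment variable, so autodetection never gets a chance to pick the wrong one -- a sketch, assuming the tbb wheel is already installed in the active venv:

```shell
# Force numba's threading-layer choice instead of relying on autodetection.
# Assumes `pip install tbb` has already been run in the active venv.
export NUMBA_THREADING_LAYER=tbb
```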
Just putting my two cents here. Note these issues on Numba: https://github.com/numba/numba/issues/6108 and https://github.com/numba/numba/issues/7148. It seems that it has something to do with discovery of the .so files. It is possible to work around this by setting LD_LIBRARY_PATH to wherever pip put the file. In my case this was something like path/to/venv/lib.
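Concretely, the workaround above amounts to something like this (a sketch; `path/to/venv` stands in for the actual virtualenv location):

```shell
# Prepend the venv's lib directory so the dynamic loader resolves the
# pip-installed libtbb before the (older) system copy.
export LD_LIBRARY_PATH="path/to/venv/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```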
Ok, I tried exporting LD_LIBRARY_PATH (technically it should not be needed when the virtualenv is expressly activated, though).
`numba -s` reports a successful import:

```
__Threading Layer Information__
TBB Threading Layer Available : True
+-->TBB imported successfully.
OpenMP Threading Layer Available : True
+-->Vendor: GNU
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.
```
I do have:

```
Requirement already satisfied: tbb in ./venvddf/lib/python3.6/site-packages (2021.5.1)
```
However, when I try to execute this basic check function:

```python
import numba
import numpy as np

def set_numba_threading(nthread):
    try:
        numba.config.THREADING_LAYER = "safe"

        @numba.njit(parallel=True)
        def foo(a, b):
            return a + b

        foo(np.arange(5), np.arange(5))
        return nthread
    except Exception:
        numba.config.THREADING_LAYER = "default"
        print("Cannot use TBB threading (check your installation). "
              "Dropping the number of solver threads to 1",
              file=log(0, "red"))
        return 1
```
I get:

```
/home/hugo/workspace/venvddf/lib/python3.6/site-packages/numba/np/ufunc/parallel.py:365: NumbaWarning: The TBB threading layer requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found TBB_INTERFACE_VERSION = 9107. The TBB threading layer is disabled.
  warnings.warn(problem)
INFO 10:59:42 - main [0.2 2.0 1.0Gb] Cannot use TBB threading (check your installation). Dropping the number of solver threads to 1
```
So it is still picking up the older system library version. Unfortunately I can't uninstall that version -- it will break several system packages.
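One way to see exactly which libtbb the loader resolves (and whether it is new enough for numba) is to query the library's exported `TBB_runtime_interface_version` symbol directly via ctypes -- a sketch; it returns None if no usable libtbb is found at all:

```python
import ctypes
import ctypes.util

def tbb_interface_version():
    """Return the TBB_INTERFACE_VERSION of whichever libtbb the dynamic
    loader resolves first (honouring LD_LIBRARY_PATH), or None if no
    usable libtbb is found. Numba wants >= 11005 here."""
    path = ctypes.util.find_library("tbb")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        lib.TBB_runtime_interface_version.restype = ctypes.c_int
        return int(lib.TBB_runtime_interface_version())
    except (OSError, AttributeError):
        return None

print(tbb_interface_version())
```

If this prints 9107 even with the venv's lib directory on LD_LIBRARY_PATH, the loader is still finding the system copy first.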
Ok hang on... found another place where OMP is being used -- the degridder.
It may run into trouble when forks and threads are mixed. I suggest we switch to workqueue if TBB fails to load, and on top of that set the environment variables accordingly -- if workers.py sets nthread > 1 then the degridder needs to run with OMP_NUM_THREADS == 1.
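A minimal sketch of that fallback policy (the function name and the nthread plumbing are assumptions, not the actual workers.py interface):

```python
import os

def threading_env(tbb_ok: bool, nthread: int) -> dict:
    """Pick a numba threading layer and matching OpenMP settings before
    any worker processes are forked (env vars must be set pre-fork)."""
    env = {}
    # Fall back to the fork-safe workqueue layer if TBB is unusable.
    env["NUMBA_THREADING_LAYER"] = "tbb" if tbb_ok else "workqueue"
    # If the solver runs more than one thread per worker, the degridder's
    # OpenMP pool must be pinned to a single thread.
    env["OMP_NUM_THREADS"] = "1" if nthread > 1 else str(max(nthread, 1))
    return env

os.environ.update(threading_env(tbb_ok=False, nthread=4))
```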
I'm not sure how this did not give issues before. My best guess is that numpy now invokes OMP before we fork, and then it becomes unsafe to use.
Nope, workqueue doesn't completely solve the issue -- things still go to pot if threads > 1 is used on workqueue -- and it is not a memory issue. I'm testing this on com08.
Edit: at least on 18.04 it looks like one would need to compile packages at system level (possibly just to include the headers for TBB?). I have no idea how to get things working in a venv. I've traced it down to stemming from the numba code with these changes.
Alright, I'm happy this now works as advertised -- can't believe nobody picked up the issue with the previously used small-angle approximation for getting the RA/Dec of the facet centre for beam application. Predicted flux far off axis, with E evaluated almost at the source:

```
[278:293,1833:1850] min -0.01585, max 0.05009, mean 0.002003, std 0.01133, sum 0.5107, np 255
```

Original convolved model flux (subject to a slightly different beam evaluation due to the regular facets used in DDF, so a small difference is to be expected):

```
[270:297,1831:1857] min -1.119e-09, max 0.04783, mean 0.00147, std 0.005706, sum 1.032, np 702
```

Apparent peak convolved flux of the source:

```
[270:297,1832:1856] min -0.0004496, max 0.02002, mean 0.0006666, std 0.002545, sum 0.4319, np 648
```
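For reference, this is the difference at stake: the exact inverse SIN (orthographic) projection of the facet-centre direction cosines versus the first-order small-angle version -- a standalone sketch, not the actual DDF/CubiCal code; all angles in radians:

```python
import math

def lm_to_radec_exact(l, m, ra0, dec0):
    """Exact inverse SIN projection: direction cosines (l, m) about the
    pointing centre (ra0, dec0) back to (ra, dec)."""
    n = math.sqrt(1.0 - l * l - m * m)
    ra = ra0 + math.atan2(l, n * math.cos(dec0) - m * math.sin(dec0))
    dec = math.asin(m * math.cos(dec0) + n * math.sin(dec0))
    return ra, dec

def lm_to_radec_small_angle(l, m, ra0, dec0):
    """First-order approximation: fine near the pointing centre, but
    increasingly wrong far off axis, especially at high |dec0|."""
    return ra0 + l / math.cos(dec0), dec0 + m
```

At a -80d declination like the field here, the approximate facet centre drifts well away from the true position only a degree or two off axis, which would make the beam be evaluated at the wrong spot for outlying facets.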
@viralp please take note -- you previously tried using the beam within CubiCal and did not get decent subtraction when peeling sources from intrinsic models.
Example usage (the pointing centre may be set to the phase centre if you did not mosaic via 'DataPhaseDir'):

```
gocubical --sol-jones g,dd --data-ms msdir/1491291289.1ghz.1.1ghz.4hrs.ms --data-column CORRECTED_DATA \
  --data-time-chunk 8 --data-freq-chunk 0 --model-list "output/deep2.DicoModel@msdir/tag2.reg" --model-ddes auto \
  --weight-column WEIGHT --flags-apply FLAG --flags-auto-init legacy \
  --madmax-enable 0 --madmax-global-threshold 0,0 --madmax-threshold 0,0,10 \
  --sol-stall-quorum 0.95 --sol-term-iters 50 --sol-min-bl 110.0 --sol-max-bl 0 \
  --dist-max-chunks 4 --out-name output/deep2cal --out-overwrite 1 --out-mode sr --out-column DE_DATA --out-subtract-dirs 0 \
  --g-time-int 8 --g-freq-int 0 --g-clip-low 0 --g-clip-high 0 --g-type complex-diag --g-update-type phase-diag \
  --g-max-prior-error 0.35 --g-max-post-error 0.35 --g-max-iter 100 \
  --degridding-OverS 11 --degridding-Support 7 --degridding-Nw 100 --degridding-wmax 0 --degridding-Padding 1.7 \
  --degridding-NDegridBand 15 --degridding-MaxFacetSize 0.15 --degridding-MinNFacetPerAxis 1 \
  --dist-nthread 1 --dist-nworker 16 --dist-ncpu 4 --degridding-NProcess 8 \
  --degridding-BeamModel FITS --degridding-FITSFile 'input/meerkat_pb_jones_cube_95channels_$(corr)_$(reim).fits' \
  --out-model-column MODEL_OUT --sel-field 2 --degridding-PointingCenterAt j2000,4h13m26.40,-80d00m00s
```
This work is done in preparation for SKA-MID. I will next port heterogeneous beams to this package.
@JSKenyon please review
This fixes #459.
As near as I can tell the issue is in the use of threads inside numba... TBB fails to import in the new numba (even if I follow their instructions and install from pip... @JSKenyon)
I still can't track down this super annoying error -- that part of the code is a bit cryptic; maybe @o-smirnov can be of some assistance here.
At least it runs through again now
Running with