Problem with running the code in parallel

MohamedNedal commented 3 weeks ago

Hello, I was running the code as a Python script with lots of AIA images on a server, and it was occupying all the cores and making the server very slow. I've been told that the red part is how much time is spent in the kernel rather than in the Python code itself. Could it be too many parallel jobs? Is there a way to run the code more efficiently so that it doesn't slow down the server?

alasdairwilson commented 3 weeks ago

I assume you mean you are running parallel copies of the dem code, as in you have more than one set of AIA images and you are dispatching multiple jobs to make DEM maps from each set?

If that is not correct then let me know, but assuming it is: demregpy is already parallelised. If you have enough pixels for it to be worth it, your data arrays are chunked into 100-pixel blocks and these are sent to the CPU cores. It attempts to utilise all the system cores in this manner.
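A minimal sketch of the chunk-and-dispatch pattern described above. This is not demregpy's actual code: `split_into_chunks`, `process_chunk` and `process_all` are hypothetical names, the per-block work is a placeholder, and a thread pool stands in for whatever worker pool the library really uses, just to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_PIXELS = 100  # block size quoted in the comment above


def split_into_chunks(pixels, chunk=CHUNK_PIXELS):
    """Cut a flat sequence of pixels into consecutive blocks of at most `chunk` items."""
    return [pixels[i:i + chunk] for i in range(0, len(pixels), chunk)]


def process_chunk(block):
    """Hypothetical stand-in for the per-pixel DEM solve: it only doubles each value."""
    return [2 * value for value in block]


def process_all(pixels, n_workers):
    """Send each block to a worker and stitch the results back together in order."""
    chunks = split_into_chunks(pixels)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(process_chunk, chunks)  # map preserves input order
        return [value for block in results for value in block]


pixels = list(range(250))
print(len(split_into_chunks(pixels)))        # 3 blocks: 100 + 100 + 50
print(process_all(pixels, n_workers=4)[:3])  # [0, 2, 4]
```

With fewer blocks than workers, some workers sit idle, which is why small images gain little from this kind of parallelism.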

Since the occupancy is already very high, when you add more jobs they also try to dispatch their blocks to cores that are already busy, and this leads to far slower behaviour than a single job. This would manifest as a lot of time spent on system tasks (which is what is seen here). It would be the same if you tried to parallelise a bunch of scipy or numpy matrix maths, which is already parallelised.

Since you have 64 cores and I chunk the data in 100-pixel blocks, 6,400 pixels (64 × 100) will be processed at any given time. If your images contain significantly more than this (e.g. 2-4x as many pixels), then you are probably already very efficient, with more pixels being more efficient up to a limit. If they are smaller than that, then you might want to run multiple jobs at the same time.
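The arithmetic can be checked with a small sketch. It assumes one block per core at a time, as described above; `pixels_in_flight` and `n_rounds` are hypothetical helper names, and the 512 × 512 image size is only an illustration.

```python
import math

CHUNK_PIXELS = 100  # block size quoted in the comment above


def pixels_in_flight(n_cores, chunk=CHUNK_PIXELS):
    """Pixels being processed at once if every core holds one block."""
    return n_cores * chunk


def n_rounds(n_pixels, n_cores, chunk=CHUNK_PIXELS):
    """Number of full passes over the cores needed to get through all blocks."""
    n_chunks = math.ceil(n_pixels / chunk)
    return math.ceil(n_chunks / n_cores)


print(pixels_in_flight(64))      # 6400
print(n_rounds(512 * 512, 64))   # 41 (2622 blocks spread over 64 cores)
print(n_rounds(3000, 64))        # 1 (only 30 blocks, so 34 cores stay idle)
```

An image that needs many rounds keeps all cores busy; one that fits in a single partial round is the case where running several jobs side by side may pay off.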

To do so, you need to limit the number of threads available to demregpy manually via threadpoolctl, e.g. if you want to run 4 jobs at a time then you should limit each to 16 threads (64/4 = 16). Otherwise, just leave it be and work through a serial list of observations one at a time.
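A rough sketch of that split, assuming (as stated above) that a threadpoolctl limit is what caps demregpy's workers. `threads_per_job` and `run_limited` are hypothetical helpers, and `job` is any callable that runs one DEM calculation. Note that threadpoolctl acts on the thread pools of native libraries such as BLAS and OpenMP, so how far the limit reaches depends on how the work is dispatched.

```python
def threads_per_job(total_cores: int, n_jobs: int) -> int:
    """Split cores evenly across concurrent jobs, never going below one thread."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return max(1, total_cores // n_jobs)


def run_limited(job, total_cores: int, n_jobs: int):
    """Run job() with native thread pools capped at this job's share of the cores."""
    # threadpoolctl is a third-party package; it is imported here so that
    # threads_per_job stays usable without it.
    from threadpoolctl import threadpool_limits

    with threadpool_limits(limits=threads_per_job(total_cores, n_jobs)):
        return job()


print(threads_per_job(64, 4))  # 16
```

Each of the 4 concurrent scripts would then wrap its own DEM call in `run_limited(..., total_cores=64, n_jobs=4)`.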

MohamedNedal commented 2 weeks ago

Hi @alasdairwilson, I was running only one script, doing DEM analysis for around 600 frames of 12-second cadence AIA images in 6 channels over a 2-hour duration. Is there a way to make the code use fewer CPU cores fully rather than using all the cores partially? For example, is either of these two approaches (or both together) correct?

1.
```python
import os

os.environ['OMP_NUM_THREADS'] = '16'
```

2.
```python
from threadpoolctl import threadpool_limits

with threadpool_limits(limits=16):
    # DEM analysis code here
    ...
```
