Poor CPU Utilisation under Linux

trichter / qopen

Qopen: Separation of intrinsic and scattering Q by envelope inversion

MIT License


Poor CPU Utilisation under Linux #7

Open gmfricke opened 1 week ago

gmfricke commented 1 week ago

Hello,

I am supporting a user, @andresp-wave, at our HPC center who is using qopen. He is allocating 32 CPUs through Slurm and setting njobs in the qopen config file to either null or 32. On his Mac he sees essentially 100% CPU usage across all 28 cores. On our Rocky Linux systems he sees only 5% CPU usage: the CPUs run at 100% in bursts of 2-3 seconds and are then idle, except for one CPU, for 10 or more seconds.

Can you provide any insight into this CPU usage pattern? All the best,

Matthew

gmfricke commented 1 week ago

When a single CPU and a single job are chosen, CPU utilisation is 60%.
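
One quick check on the Linux side is whether the Python process can actually run on all 32 allocated CPUs. The sketch below is a generic diagnostic, not part of qopen: `usable_cpus` is a hypothetical helper name, and it relies on `os.sched_getaffinity`, which is only available on Linux.

```python
import os


def usable_cpus():
    """Return the number of CPUs this process may actually run on."""
    # os.sched_getaffinity exists on Linux only; it reflects the CPU set
    # granted to the process (e.g. by a Slurm allocation with CPU binding).
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback (e.g. macOS): total number of logical CPUs on the machine.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("logical CPUs on node:", os.cpu_count())
    print("CPUs usable by this process:", usable_cpus())
    # Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested.
    print("SLURM_CPUS_PER_TASK:", os.environ.get("SLURM_CPUS_PER_TASK", "not set"))
```

If the usable count is much smaller than 32 inside the job, the worker processes are sharing a few cores, which would be a scheduler configuration issue rather than a qopen one.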

trichter commented 1 week ago

Hi!

Which Qopen command are you running, go or fixed?

Parallel processing is only applied to the inversion in different frequency bands. Therefore, the number of cores that can be used is limited by the number of frequency bands plus the main thread. Among other things, response removal and plotting are handled by the main thread alone. I have noticed similar behavior on my PC, though not as extreme as what you have observed on your HPC.
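
To put rough numbers on this limit, here is a small back-of-the-envelope sketch. The function names and the example values (8 frequency bands, a 50% parallel fraction) are assumptions for illustration only, not measurements from qopen.

```python
def max_utilisation(n_bands, n_cores):
    """Upper bound on the fraction of allocated cores that can be busy."""
    # One worker per frequency band plus the main thread, capped at n_cores.
    busy = min(n_bands + 1, n_cores)
    return busy / n_cores


def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's law: overall speedup when only part of the run is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # Hypothetical example: 8 frequency bands on a 32-core allocation.
    print(f"peak utilisation: {max_utilisation(8, 32):.0%}")
    # Hypothetical example: half of the runtime is serial (main thread only).
    print(f"speedup with 8 workers: {amdahl_speedup(0.5, 8):.2f}x")
```

The average utilisation is lower than the peak value whenever the serial part (response removal, plotting) takes a large share of the runtime, which matches the burst-then-idle pattern described above.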

As a work-around, the time spent solely in the main thread could be reduced by turning off unnecessary plots in the configuration file.

An alternative parallelization scheme could be implemented that distributes the processing across different events. I guess this would solve the issue completely.

andresp-wave commented 1 week ago

Hi Tom,

I am running qopen source. This step takes about 3 days for ~35,000 events on my desktop (all plots off), so I am looking into how to speed up the process using the HPC.

I will implement what you mentioned in your last sentence. Thanks for your reply.

Best, Andres.

trichter commented 1 week ago

I see. For qopen source, the current parallelization scheme is not much help, because little is done in parallel apart from a single fit. If you have broad-band data and use full response removal, you could switch to removing only the sensitivity.

If you want to implement the parallelization by event, you could load the stations etc. in your own script and call the invert function in parallel, once per event.
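
A minimal sketch of such a per-event driver, using only the Python standard library, could look like the following. `process_event` is a hypothetical placeholder: the real script would call Qopen's invert function there, and the exact arguments are not shown in this thread, so a dummy computation stands in for it. `run_events` and its parameters are likewise assumptions for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def process_event(event_id):
    """Hypothetical placeholder for the per-event work.

    A real script would select the data for this event and call
    Qopen's invert function here; a dummy computation stands in for it.
    """
    return event_id, event_id ** 2  # dummy result


def run_events(event_ids, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Process events in parallel and collect results keyed by event id."""
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches events per task to reduce inter-process overhead
        # (it has no effect when a thread pool is used).
        for event_id, result in pool.map(process_event, event_ids, chunksize=16):
            results[event_id] = result
    return results


if __name__ == "__main__":
    out = run_events(range(100), max_workers=4)
    print(len(out), "events processed")
```

With one event per worker, it is probably sensible to keep qopen's own njobs at 1 inside each worker so that the two levels of parallelism do not compete for the same cores. The `executor_cls` argument is only there so that a thread pool can be swapped in for quick testing.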

Good luck!