kevinkovalchik / RawTools

RawTools is an open-source and freely available package designed to perform scan data parsing and quantification, and quality control analysis of Thermo Orbitrap raw mass spectrometer files from data-dependent acquisition experiments.
Apache License 2.0

Limit number of CPUs #53

Closed: sorenwacker closed this issue 4 months ago

sorenwacker commented 4 years ago

Hi,

Is it possible to limit the number of CPUs? We are running a queuing system for analysis, but when RawTools starts, it uses all the CPUs for each run. I am not sure whether such a command-line argument exists.

Cheers

kevinkovalchik commented 4 years ago

Hello,

Oops. It looks like I coded the parallel parts to let the user limit the number of threads, but apparently I never added this to the command line arguments. I'll see if I can get it fixed on Monday.

Kevin

(Sent by email in reply to Sören's message of Fri, Jul 10, 2020, 1:59 AM.)

kevinkovalchik commented 4 years ago

I added a new argument, -k, that sets the maximum number of simultaneous processes. Can you try it out before I release it? Attachment: RawTools-v2.0.3beta.zip

Kevin

sorenwacker commented 4 years ago

It seems to work with -k 1. The process is not staying on a single CPU but jumps around a bit, which should be fine. However, when I use -k 2, more than two processes are started and essentially all CPUs are used.

The command I used was:

mono RawTools-v2.0.3beta/RawTools.exe -q -k 2 -r TMT11 -d /data/proteomics_storage

chrishuges commented 4 years ago

Maybe this should be specifying threads rather than CPUs (if this is not already the case)?

@soerendip, what is your system configuration?

kevinkovalchik commented 4 years ago

What I have set is the maximum number of operations that can run concurrently, not specific CPUs, so it is specifying threads. Here is the logic controlling this parameter:

public static ParallelOptions Options(int MaxThreads)
{
    ParallelOptions options = new ParallelOptions();

    if (MaxThreads > Environment.ProcessorCount)
    {
        options.MaxDegreeOfParallelism = Environment.ProcessorCount;
    }
    else
    {
        options.MaxDegreeOfParallelism = MaxThreads;
    }

    return options;
}

This seems safe to me, so something else is going on. On my system I see a CPU spike during the "Extracting scan indices" step that goes above the percentage I would expect. Is that when it happens for you, or does it last for the whole run?
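To illustrate what a cap like the one in `Options` does and does not control, here is a minimal C# sketch. `ParallelCapDemo` and `MeasurePeakConcurrency` are hypothetical names, not part of RawTools, and the sketch assumes the cap is applied by passing the options to `Parallel.For`. It counts how many loop iterations are actually running at the same moment.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; not part of RawTools.
public static class ParallelCapDemo
{
    // Same clamping rule as the Options method quoted above, written with Math.Min.
    // Assumes maxThreads >= 1: the MaxDegreeOfParallelism setter throws
    // ArgumentOutOfRangeException for 0 and for values below -1.
    public static ParallelOptions Options(int maxThreads)
    {
        return new ParallelOptions
        {
            MaxDegreeOfParallelism = Math.Min(maxThreads, Environment.ProcessorCount)
        };
    }

    // Runs a Parallel.For over `items` iterations with the given cap and
    // returns the highest number of iterations observed running at once.
    public static int MeasurePeakConcurrency(int maxThreads, int items)
    {
        int running = 0;
        int peak = 0;

        Parallel.For(0, items, Options(maxThreads), i =>
        {
            int now = Interlocked.Increment(ref running);

            // Record a new peak with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                if (Interlocked.CompareExchange(ref peak, now, seen) == seen)
                    break;
            }

            Thread.Sleep(5); // stand-in for per-scan work
            Interlocked.Decrement(ref running);
        });

        return peak;
    }
}
```

Under these assumptions, `MeasurePeakConcurrency(1, 50)` should return 1. Note that `MaxDegreeOfParallelism` only constrains the loop it is passed to: other `Parallel` calls, PLINQ queries, or tasks started without these options, as well as runtime threads such as the garbage collector, are not limited by it. One unconfirmed possibility is that a code path like the "Extracting scan indices" step does not receive the options. Even with a cap of 1, the OS scheduler is free to move the single worker thread between cores, which would match the "jumps around a bit" observation.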

sorenwacker commented 4 years ago

I tested it on a notebook with an Intel i7 (4 double cores).

kevinkovalchik commented 4 years ago

Hi again. Sorry for disappearing on this. I've been moving to a new house over the last week.

Do you need more than one process, or is one fine? I can just make -k limit the number to 1, since that seems to work for you.

If you would like me to try to get -k working for values other than 1, the following would help:

  1. What OS is the notebook running?
  2. When you say "essentially all CPUs are used," do you mean at 100%?
  3. Are all the CPUs being used throughout the whole run, or only at a certain step? If it is a single step, what does the console output say it is doing at that point?

sorenwacker commented 4 years ago

Ideally, one could set the maximum number of parallel processes, but for my purposes, switching parallelism on and off would be enough. I am using Linux (Ubuntu). Yes, even with -k 1, all CPUs are used at some point. It is hard to say when that happens, and I am not sure about the percentage.