ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

[BUG] --cpus flag doesn't get passed to docker #77

Closed bjreisman closed 3 years ago

bjreisman commented 4 years ago

I'm trying to run PGAP on our local machine with 16 cores and 32 GB of RAM, which for some reason comes out to 1.9 GiB of RAM per core. The MG37 dataset completed just fine, but I'm running into problems on my own genomes (~10 MB). I thought the issue might be a mismatch between RAM and cores that could be solved by requesting fewer cores, but when I set the --cpus option to 10 (for example), it still seems to be using all 16 cores.

Expected behavior: I'd like to allocate a specific number of cores to the PGAP docker container. It's possible (likely) there's another way to do this that I missed, but I thought the --cpus option would do the trick.


Log Files (first few lines of cwltool.log, happy to share the rest if needed)

Original command: pgap.py -r -o /mnt/5E5008A55008864D/Users/Nanopore/assemblies/xxx-flye/Unicycler-polish/ann-pgap-results /mnt/5E5008A55008864D/Users/Nanopore/assemblies/xxx-flye/Unicycler-polish/ann-pgap/pgap_input.yaml --verbose --cpus 10

Docker command: /usr/bin/docker run -i --rm --user 1001:1001 --volume /mnt/5E5008A55008864D/Users/Nanopore/pgap/input-2020-03-30.build4489:/pgap/input:ro,z --volume /mnt/5E5008A55008864D/Users/Nanopore/assemblies/xxx-flye/Unicycler-polish/ann-pgap:/pgap/user_input:z --volume /mnt/5E5008A55008864D/Users/Nanopore/assemblies/xxx-flye/Unicycler-polish/ann-pgap/pgap_input_1pqiu95d.yaml:/pgap/user_input/pgap_input.yaml:ro,z --volume /mnt/5E5008A55008864D/Users/Nanopore/assemblies/xxx-flye/Unicycler-polish/ann-pgap-results:/pgap/output:rw,z --cpus 10 ncbi/pgap:2020-03-30.build4489 cwltool --timestamps --disable-color --preserve-entire-environment --outdir /pgap/output pgap/pgap.cwl /pgap/user_input/pgap_input.yaml

--- Start YAML Input ---
fasta:
  class: File
  location: 019_final_polish.fasta
submol:
  class: File
  location: submol.yaml
supplemental_data: { class: Directory, location: /pgap/input }
report_usage: true
--- End YAML Input ---

--- Start Runtime Report ---
{
    "CPU cores": 16,
    "Docker image": "ncbi/pgap:2020-03-30.build4489",
    "cpu flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities",
    "cpu model": "Intel(R) Core(TM) i9-9900 CPU @ 3.10GHz",
    "max user processes": 127378,
    "memory (GiB)": 31.2,
    "memory per CPU core (GiB)": 1.9,
    "open files": 1024,
    "tmp disk space (GiB)": 61.8,
    "virtual memory": "unlimited",
    "work disk space (GiB)": 3070.2


azat-badretdin commented 4 years ago

Internal ticket PGAPX-786 opened; we will look at this soon.

azat-badretdin commented 4 years ago

when I set the --cpus option to 10 (for example), it still seems to be using all 16 cores.

FYI: if this is based on the "CPU cores": 16 output line, that value is the real number of CPUs in the system, not the number of CPUs requested. As your report shows, the docker command line that was actually executed contains the --cpus 10 parameter.

Please let me know if this clears up the problem.
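(Aside: one way to confirm that the limit really was applied is to inspect the running container. This is only a sketch; the container ID below is a placeholder for whatever docker ps reports for the ncbi/pgap image. Docker stores the --cpus value as HostConfig.NanoCpus, in units of 10^-9 CPUs.)

    docker ps --format '{{.ID}} {{.Image}}'                              # find the running pgap container
    docker inspect --format '{{.HostConfig.NanoCpus}}' <container_id>    # prints 10000000000 for --cpus 10
    docker stats --no-stream <container_id>                              # CPU% should stay below ~1000% with --cpus 10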

bjreisman commented 4 years ago

Hmm... that appears to fix the CPU usage problem, but it doesn't fix the larger problem, which seems to be an I/O bottleneck at the cluster_blastp_wnode stage. It looks like there are still 16 instances of cluster_blastp_wnode attempting to run. I assumed that it was one per core. Is that not the case? [screenshot attached]

azat-badretdin commented 4 years ago

I assumed that it was one per core. Is that not the case?

Yes, it's supposed to work this way. I will check.

azat-badretdin commented 4 years ago

Could you please post relevant parts of cwltool.log as before?

bjreisman commented 4 years ago

Certainly, it's running now, but I've attached it below: cwltool.log

whlavina commented 4 years ago

It appears that Docker's --cpus option does not alter what is presented via APIs such as /proc/cpuinfo. As a result, PGAP does run slower, since it gets throttled by cgroups resource limits, but this does not reduce the number of threads or the memory pressure as we expected. We'll have to investigate an appropriate fix.
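(Aside: this behavior is easy to reproduce outside PGAP. A minimal sketch, using the ubuntu image purely as an example of something that ships nproc: --cpus sets a CFS quota that throttles the container but leaves the visible CPU count unchanged, while --cpuset-cpus restricts scheduler affinity, which affinity-aware tools such as nproc do observe.)

    docker run --rm --cpus 2 ubuntu nproc            # still prints the host core count (e.g. 16)
    docker run --rm --cpuset-cpus 0-1 ubuntu nproc   # prints 2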

azat-badretdin commented 4 years ago

Thanks, I see you are using --cpus 8.

bjreisman commented 4 years ago

Yup! I tried dropping from 12 to 8 cores to see if that would help. I've included a snapshot of the CPU stats from vmstat below:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- -----timestamp-----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st                 CDT
 5  3   1027    290   7346  22466    0    0   155    76   40  141 31  3 56  9  0 2020-06-01 13:49:27
dsommer commented 4 years ago

PGAP also ignores the --cpus flag when using Singularity instead of Docker. Using all available CPUs causes the workflow to run out of memory. Should this be a separate issue?

azat-badretdin commented 4 years ago

Should this be a separate issue?

Yes. Thanks for reporting!

whlavina commented 4 years ago

As status update: we are actively exploring a solution for this issue, but we are not yet ready to commit to a timeline for releasing a fix.

ybdong919 commented 4 years ago

I have a similar issue: the --cpus flag does not work. Is there any other way to control how many CPUs are used?

marieleoz commented 3 years ago

Hello Azat,

This is Marie again :)

I think I may have the same issue, but I am not sure (I'm not comfortable with Docker yet). After you helped me with issue #129, I tried to run without the --cpus flag and got:

Original command: ./pgap.py -r -o TemS.CL/results TemS.CL/TemS_S96.generic.yaml

Docker command: /usr/bin/docker run -i --rm --user 1000:1000 --volume /home/adm-loc/Tools/pgap/input-2021-01-11.build5132:/pgap/input:ro,z --volume /home/adm-loc/Tools/pgap/TemS.CL:/pgap/user_input:z --volume /home/adm-loc/Tools/pgap/TemS.CL/pgap_input_b9bnz7y3.yaml:/pgap/user_input/pgap_input.yaml:ro,z --volume /tmp:/tmp:rw,z --volume /home/adm-loc/Tools/pgap/TemS.CL/results:/pgap/output:rw,z ncbi/pgap:2021-01-11.build5132 cwltool --timestamps --debug --disable-color --preserve-entire-environment --outdir /pgap/output pgap/pgap.cwl /pgap/user_input/pgap_input.yaml

STDOUT/STDERR:
PGAP version 2021-01-11.build5132 is up to date.
Output will be placed in: /home/adm-loc/Tools/pgap/TemS.CL/results
WARNING: memory per CPU core (GiB) is less than the recommended value of 2
PGAP failed, docker exited with rc = 1
Unable to find error in log file.

Runtime Report from the cwltool.log:

{
    "CPU cores": 32,
    "Docker image": "ncbi/pgap:2021-01-11.build5132",
    "cpu flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear flush_l1d",
    "cpu model": "Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz",
    "max user processes": "unlimited",
    "memory (GiB)": 15.6,
    "memory per CPU core (GiB)": 0.5,
    "open files": 1048576,
    "tmp disk space (GiB)": 1235.0,
    "virtual memory": "unlimited",
    "work disk space (GiB)": 1235.0
}

I thus tried to only use 6 cores by running: ./pgap.py -r -o TemS.CL/results TemS.CL/TemS_S96.generic.yaml --cpus 6

But I got the same STDOUT/STDERR, the same Runtime report, and (if I'm correct) the same Docker command, which I paste just in case:

Docker command: /usr/bin/docker run -i --rm --user 1000:1000 --volume /home/adm-loc/Tools/pgap/input-2021-01-11.build5132:/pgap/input:ro,z --volume /home/adm-loc/Tools/pgap/TemS.CL:/pgap/user_input:z --volume /home/adm-loc/Tools/pgap/TemS.CL/pgap_input__zxn00sh.yaml:/pgap/user_input/pgap_input.yaml:ro,z --volume /tmp:/tmp:rw,z --volume /home/adm-loc/Tools/pgap/TemS.CL/results:/pgap/output:rw,z ncbi/pgap:2021-01-11.build5132 /bin/taskset -c 0-5 cwltool --timestamps --debug --disable-color --preserve-entire-environment --outdir /pgap/output pgap/pgap.cwl /pgap/user_input/pgap_input.yaml

Could you confirm that I got the same issue and just have to wait for the next release? If so, any idea when this would be?

Thanks a lot!

Best, Marie

azat-badretdin commented 3 years ago

Could you confirm that I got the same issue and just have to wait for the next release?

No. Not without the actual log.

In our environment we test, for historical reasons, on computers with 4 GB/core (that's the AWS setting). Since you have only 16 GB, could you please try --cpus 4?
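(Aside: with the command used earlier in this thread, that would be, for example:)

    ./pgap.py -r -o TemS.CL/results TemS.CL/TemS_S96.generic.yaml --cpus 4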

P.S. 16 GB is pretty low memory nowadays for Windows 10, especially for running heavy computations.

azat-badretdin commented 3 years ago

Are any of the sequences marked as plasmids in the FASTA headers in your test case, Marie?

whlavina commented 3 years ago

Regarding confirmation that only 6 CPUs are requested, note the /bin/taskset -c 0-5 in the logged command line. That restricts execution to CPUs 0 through 5 (6 CPUs total). There are other, more detailed log files (in debug mode) that also report the number of CPUs being used, which should further confirm the setting.

In contrast, the runtime report at the top shows how many are available in total (we should probably amend the warning message to be less confusing).
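(Aside: the effect of taskset can be checked directly, independently of PGAP. A small sketch, assuming coreutils and Python 3 are available on the host:)

    taskset -c 0-5 nproc                                                          # prints 6
    taskset -c 0-5 python3 -c 'import os; print(len(os.sched_getaffinity(0)))'    # prints 6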

marieleoz commented 3 years ago

Thanks for your answers!

I will comment on each point:

azat-badretdin commented 3 years ago

The cwltool.log says:

[2021-03-01 14:11:59] INFO Resolved 'pgap/pgap.cwl' to 'file:///pgap/pgap/pgap.cwl'
[2021-03-01 14:11:59] ERROR I'm sorry, I couldn't load this CWL file.

and later:

found duplicate key "report_usage" with value "True"

Most likely your input YAML file contains a report_usage: setting. If so, please try removing that setting and running again.
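(Aside: as the duplicate-key error suggests, pgap.py appears to write report_usage into the YAML it passes to cwltool, based on the -r/-n flag, so a second copy in the user's generic YAML collides with it. A minimal check and fix, using the file name from this thread purely as an example:)

    grep -n 'report_usage' TemS.CL/TemS_S96.generic.yaml       # is the key set in the user's file?
    sed -i '/^report_usage:/d' TemS.CL/TemS_S96.generic.yaml   # remove it; use -r / -n on pgap.py instead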

marieleoz commented 3 years ago

Hey Azat,

I did have a report_usage: true setting in the generic.yaml file.

I removed it and still got:
WARNING: memory per CPU core (GiB) is less than the recommended value of 2
But now it is running :)

Thanks for helping again!

Best, Marie

azat-badretdin commented 3 years ago

You are welcome, Marie!