ksahlin opened this issue 2 years ago
Thanks for the heads up. What does /bin/time report for CPU usage? The segfault is most likely a bug; I'm happy to take a look if you can post data which reproduces it. Please try to reduce the data size if you do, e.g. repeatedly halve the input and check whether the first or the second half causes the segfault.
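A minimal sketch of that halving step, assuming standard 4-line FASTQ records and placeholder file names:
# Split a paired FASTQ in half on whole-record (4-line) boundaries.
n=$(wc -l < reads_L.fq)
half=$(( n / 2 / 4 * 4 ))
head -n "$half" reads_L.fq > half1_L.fq
head -n "$half" reads_R.fq > half1_R.fq
tail -n +"$(( half + 1 ))" reads_L.fq > half2_L.fq
tail -n +"$(( half + 1 ))" reads_R.fq > half2_R.fq
# Re-run urmap on half1_* and half2_* to see which side still segfaults.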
For the jobs that had 10%-90% CPU usage (based on /usr/bin/time -v), I was setting -threads 1. Unfortunately, I overwrote the logfiles of those jobs with my latest attempt, in which I omitted the -threads parameter. For the new jobs without the -threads parameter, I observed a CPU usage of 92-97%. I have pasted an example job below. URMAP is given a node with 20 cores and 128Gb memory.
I also now noticed the WARNING: Max OMP threads 1 message below. Probably highly relevant?
urmap v1.0.1480_i86linux64, 131Gb RAM, 20 cores
(C) Copyright 2019 Robert C. Edgar
https://drive5.com/urmap
00:20 30.1Gb 100% Reading index index.ufi
00:20 30.1Gb CPU has 20 cores, defaulting to 10 threads
WARNING: Max OMP threads 1
36:16 30.1Gb 100% Mapping (paired) reads_L.fq
20 Seconds to load index
35.9 Minutes in mapper
20,000,000 Reads (20.0M)
9276.4 Reads/sec. (1 threads)
19,328,546 Mapped Q>=10 (96.6%)
592,350 Mapped Q< 10 (3.0%)
79,104 Unmapped (0.4%)
36:16 30.1Gb Exiting
Command being timed: "urmap -map2 /scratch/23690412/reads_L.fq -reverse /scratch/23690412/reads_R.fq -ufi /scratch/23690412/index.ufi -samout /scratch/23690412/aligned.sam"
User time (seconds): 1849.33
System time (seconds): 280.65
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 36:19.22
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 29427296
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3
Minor (reclaiming a frame) page faults: 326673019
Voluntary context switches: 25204
Involuntary context switches: 5843
Swaps: 0
File system inputs: 6449864
File system outputs: 21951392
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
As for the segfault, I ran URMAP on the first 100 paired reads of an instance where it segfaulted. Output and reads pasted below.
Command
urmap -map2 ten_reads_L.fq -reverse ten_reads_R.fq -ufi Homo_sapiens.GRCh38.dna.fa.ufi -samout aligned.sam
Output
urmap v1.0.1480_i86linux64, 131Gb RAM, 20 cores
(C) Copyright 2019 Robert C. Edgar
https://drive5.com/urmap
00:19 30.1Gb 100% Reading index Homo_sapiens.GRCh38.dna.primary_assembly_modified_headers.fa.ufi
00:19 30.1Gb CPU has 20 cores, defaulting to 10 threads
00:19 30.1Gb 0.00% Mapping (paired) test_reads_L.fq/var/spool/slurmd/job23716540/slurm_script: line 15: 9693 Segmentation fault (core dumped) urmap -map2 test_reads_L.fq -reverse tenst_reads_R.fq -ufi /proj/snic2020-16-138/ultra_eval/genomes/Homo_sapiens.GRCh38.dna.primary_assembly_modified_headers.fa.ufi -samout aligned.sam
"noticed the warning WARNING: Max OMP threads 1 below. Probably highly relevant?" Yes :-) This means that there is a maximum of 1 thread imposed by your OMP configuration, usually this happens because you complied without linking to the OMP library. Second possibility is that the OMP_NUM_THREADS environment variable is set to 1, but this would be a strange default setting. You can try setting OMP_NUM_THREADS to 99 or something like that to see if it fixes the warning. This will probably make a huge difference to your benchmark so I'd appreciate it if you posed an update / correction once you figure this out. Thanks for the zip file, I'll take a look.
Figured so :) Thanks!
I never compiled URMAP; I downloaded the precompiled Linux binary urmap available here. Do you think the binary needs an update?
I don't think the OMP_NUM_THREADS environment variable is set to 1, although I'm not sure how to verify that. The reason I don't think so is that I can compile my own aligner that uses OMP with the -fopenmp flag in the same environment, and multithreading works well.
I will try installing from source instead, perhaps tomorrow, and I will definitely rerun and update the plot once I update the installation.
The binary https://github.com/rcedgar/urmap/raw/master/bin/urmap supports OMP; I'm running test_reads.zip with it right now, and so far it has worked several times without segfaulting. Did you build the ufi on the same machine that you ran the mapping test on? You can check OMP environment variables by running:
env | grep OMP
from the same shell that you run urmap.
32-bit usearch uses similar OMP code; making a small usearch test could be a quicker and simpler probe of your OMP settings compared to running urmap (no long load time for the index).
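Another hedged option, assuming the binary is linked against an OpenMP 4.0+ runtime: ask the runtime itself to print its effective settings at startup.
# OMP_DISPLAY_ENV is a standard OpenMP environment variable; if honored, the
# runtime prints its version and thread limits before urmap starts mapping.
OMP_DISPLAY_ENV=TRUE urmap -map2 reads_L.fq -reverse reads_R.fq -ufi index.ufi -samout aligned.sam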
Thanks for this! I will compile from source tomorrow and also make sure the index and alignment job is run on the same computing node (currently I copy the index to the computing node of the alignment job).
If it is because of copying the index, it is strange that all the 300nt read instances segfault, while all the 100, 150, 200, and 250nt instances complete without a segfault. Anyway, I will have a look tomorrow.
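If the copied index is the suspect, a quick integrity check is possible (a hedged sketch; both paths are placeholders for the original and the copied index):
md5sum /path/to/original/index.ufi /scratch/$SLURM_JOB_ID/index.ufi
# Identical checksums would rule out corruption during the copy.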
I don't recommend recompiling from source -- I recommend using the precompiled binary, because I just verified (1) that it supports OMP multi-threading and (2) that if I build the index and then run the mapping test on your reads, it does not segfault. If you get different results with the same binary, then it must be something different about your machine, in which case I would probably need ssh access to troubleshoot. I agree it's puzzling that the shorter reads don't crash, but I need to reproduce a bug to be able to analyze it.
I verified that I have the same binary. I do
wget https://github.com/rcedgar/urmap/raw/master/bin/urmap
chmod +x urmap
$ ./urmap
urmap v1.0.1480_i86linux64, 264Gb RAM, 40 cores
(C) Copyright 2019 Robert C. Edgar
https://drive5.com/urmap
OMP environment:
$ env | grep OMP
CONDA_PROMPT_MODIFIER=(base)
I start a job on a node as:
urmap -make_ufi Homo_sapiens.GRCh38.dna.primary_assembly.fa -output tmp_index.ufi
urmap -map2 300_L.fq -reverse 300_R.fq -ufi tmp_index.ufi -samout aligned.sam
This is the output (I run all jobs through the Snakemake workflow).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 urmap_align
1
[Mon Dec 13 09:25:58 2021]
rule urmap_align:
input: /proj/snic2020-16-138/strobemap_eval/reads_PE/SIM3/300_L.fq, /proj/snic2020-16-138/strobemap_eval/reads_PE/SIM3/300_R.fq, /proj/snic2020-16-138/ultra_eval/genomes/Homo_sapiens.GRCh38.dna.primary_assembly_modified_headers.fa.ufi
output: /proj/snic2020-16-138/strobemap_eval/time_and_mem/urmap/16/300/aligning_time_and_mem.txt, /proj/snic2020-16-138/strobemap_eval/alignments/urmap/16/300.sam
jobid: 0
wildcards: n_threads=16, read_length=300
Copied reads to: 23738026
urmap v1.0.1480_i86linux64, 131Gb RAM, 20 cores
(C) Copyright 2019 Robert C. Edgar
https://drive5.com/urmap
00:00 3.1Mb
00:00 3.1Mb Genome size 3.2G
00:00 3.1Mb Slots 5392814809 (5.4G)
00:00 3.1Mb Load factor 0.58
00:00 3.1Mb Word length 24
00:00 3.1Mb Max abund. 32
00:00 3.1Mb
00:24 3.5Gb 100% Reading Homo_sapiens.GRCh38.dna.primary_assembly_modified_headers.fa
00:32 3.4Gb Converting to upper case
00:35 3.4Gb Initialize seqs
00:57 30.4Gb Init vectors
01:00 35.8Gb Init slot counts
03:39 35.8Gb Count slots
03:42 41.2Gb Init minus slot counts
06:23 41.2Gb Count slots minus
15:51 41.2Gb Index
15:51 41.2Gb 0 slots truncated
18:08 41.2Gb Writing /scratch/23738026/urmap_index.ufi
19:14 41.2Gb Exiting
Command being timed: "urmap -make_ufi /proj/snic2020-16-138/ultra_eval/genomes/Homo_sapiens.GRCh38.dna.primary_assembly_modified_headers.fa -output /scratch/23738026/urmap_index.ufi"
User time (seconds): 851.02
System time (seconds): 139.41
Percent of CPU this job got: 85%
Elapsed (wall clock) time (h:mm:ss or m:ss): 19:16.73
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 40147516
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3
Minor (reclaiming a frame) page faults: 143158456
Voluntary context switches: 102794
Involuntary context switches: 1297
Swaps: 0
File system inputs: 1053640
File system outputs: 58719936
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Created index in: 23738026
urmap v1.0.1480_i86linux64, 131Gb RAM, 20 cores
(C) Copyright 2019 Robert C. Edgar
https://drive5.com/urmap
00:18 30.1Gb 100% Reading index urmap_index.ufi
00:18 30.1Gb CPU has 20 cores, defaulting to 10 threads
WARNING: Max OMP threads 1
00:18 30.1Gb 0.00% Mapping (paired) reads_L.fqCommand terminated by signal 11
Command being timed: "urmap -map2 /scratch/23738026/reads_L.fq -reverse /scratch/23738026/reads_R.fq -ufi /scratch/23738026/urmap_index.ufi -samout /scratch/23738026/aligned.sam"
User time (seconds): 0.02
System time (seconds): 19.45
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:19.48
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 29426964
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 7371003
Voluntary context switches: 20
Involuntary context switches: 49
Swaps: 0
File system inputs: 0
File system outputs: 104
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
[Mon Dec 13 09:45:49 2021]
Error in rule urmap_align:
jobid: 0
output: /proj/snic2020-16-138/strobemap_eval/time_and_mem/urmap/16/300/aligning_time_and_mem.txt, /proj/snic2020-16-138/strobemap_eval/alignments/urmap/16/300.sam
RuleException:
CalledProcessError in line 262 of /domus/h1/kris/source/alignment_evaluation/evaluation_multithread/Snakefile:
Command 'set -o pipefail; /usr/bin/time -v urmap -map2 /scratch/$SLURM_JOB_ID/reads_L.fq -reverse /scratch/$SLURM_JOB_ID/reads_R.fq -ufi /scratch/$SLURM_JOB_ID/urmap_index.ufi -samout /scratch/$SLURM_JOB_ID/aligned.sam 2>&1 | tee /scratch/$SLURM_JOB_ID/aligning_time_and_mem.txt' returned non-zero exit status 11.
File "/proj/snic2020-15-201/anaconda3/envs/scisoseq-eval/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2318, in run_wrapper
File "/domus/h1/kris/source/alignment_evaluation/evaluation_multithread/Snakefile", line 262, in __rule_urmap_align
File "/proj/snic2020-15-201/anaconda3/envs/scisoseq-eval/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 560, in _callback
File "/proj/snic2020-15-201/anaconda3/envs/scisoseq-eval/lib/python3.9/concurrent/futures/thread.py", line 52, in run
File "/proj/snic2020-15-201/anaconda3/envs/scisoseq-eval/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 546, in cached_or_run
File "/proj/snic2020-15-201/anaconda3/envs/scisoseq-eval/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2349, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
creating /proj/snic2020-16-138/strobemap_eval/alignments/urmap/16/
As for USEARCH32, I also see that it is limited in CPU usage, but I don't get the OMP warning:
$ wget https://drive5.com/downloads/usearch11.0.667_i86linux32.gz
$ gunzip usearch11.0.667_i86linux32.gz
$ chmod +x usearch11.0.667_i86linux32
$ ./usearch11.0.667_i86linux32
usearch v11.0.667_i86linux32, 4.0Gb RAM (264Gb total), 40 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch
License: personal use only
Clustering of 50k 300nt (short read) sequences:
usearch v11.0.667_i86linux32, 4.0Gb RAM (264Gb total), 20 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch
License: personal use only
00:00 73Mb 100.0% Reading 50k_reads_R.fq
00:00 40Mb CPU has 20 cores, defaulting to 10 threads
00:00 76Mb 100.0% DF
00:00 77Mb 50000 seqs, 50000 uniques, 50000 singletons (100.0%)
00:00 77Mb Min size 1, median 1, max 1, avg 1.00
00:00 75Mb 100.0% DB
00:16 189Mb 100.0% 48364 clusters, max size 32, avg 1.0
00:16 189Mb 100.0% Writing centroids to nr.fasta
Seqs 50000 (50.0k)
Clusters 48364 (48.4k)
Max size 32
Avg size 1.0
Min size 1
Singletons 47850 (47.9k), 95.7% of seqs, 98.9% of clusters
Max mem 189Mb
Time 16.0s
Throughput 3125.0 seqs/sec.
Command being timed: "./usearch11.0.667_i86linux32 -cluster_fast 50k_reads_R.fq -id 0.9 -centroids nr.fasta"
User time (seconds): 16.01
System time (seconds): 0.27
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.67
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 145868
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 4
Minor (reclaiming a frame) page faults: 121336
Voluntary context switches: 1368
Involuntary context switches: 51
Swaps: 0
File system inputs: 66728
File system outputs: 30520
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I will put the full 300nt read dataset on my Dropbox so you can see whether you can reproduce the segfault with urmap. I will contact you over email with the link.
You should now have received a dropbox link to a dataset for which URMAP fails on the server I'm running experiments on.
Worked OK for me; I re-built the index with your FASTA, and mapping then ran to completion with 96.9% of reads mapped at Q>=10. How much RAM does your server have?
Very strange. I reran the job, not as part of the Snakemake pipeline but as a single job submitted to a node. The nodes have 20 cores and 131Gb of memory (as can be seen in the specs below). It again gives a segmentation fault. I have pasted both the script and the output below.
I'm running the jobs on UPPMAX, a centralized server for academic computing jobs. Getting an account set up here, and additionally applying for membership in my project, is not a quick process from what I remember of setting up my own account. Up to you. I will send my project number in an email if you decide to go down this route.
Perhaps I can try it on some other server through a collaboration (in the future), and in the meantime we can let this rest and see if it comes up for someone else?
As for the results I tweeted about and, perhaps more importantly, the upcoming analysis in a paper: I could keep the accuracy, memory usage, and runtime for one core (an analysis I also have for all aligners) but exclude the multithreading results and add a note referring to this issue; I would also add a note about the 300nt read-length instances referring to this issue. Alternatively, I could leave URMAP out of the analysis completely. Which one do you prefer?
Script:
$ cat test_urmap.sh
#!/bin/sh
#SBATCH -A snic2020-5-651
#SBATCH --time=02:00:00
#SBATCH -p node
#SBATCH -n 1
#SBATCH -C mem128GB
# Memory per node specification is in MB. It is optional.
# The default limit is 3000MB per core.
#SBATCH --job-name="test_urmap"
#SBATCH --mail-user=XXX
#SBATCH --mail-type=ALL
#conda init bash
set -o errexit
tmp_index="/scratch/$SLURM_JOB_ID/urmap_index.ufi"
genome="/proj/snic2020-16-138/ultra_eval/genomes/Homo_sapiens.GRCh38.dna.primary_assembly_modified_headers.fa"
tmp_folder="/scratch/$SLURM_JOB_ID/"
urmap -make_ufi $genome -output $tmp_index
urmap -map2 /proj/snic2020-16-138/strobemap_eval/reads_PE/SIM3/300_L.fq -reverse /proj/snic2020-16-138/strobemap_eval/reads_PE/SIM3/300_R.fq -ufi $tmp_index -samout $tmp_folder/aligned.sam
cp $tmp_folder/aligned.sam aligned_urmap.sam
Output:
$ cat slurm-23745045.out
urmap v1.0.1480_i86linux64, 131Gb RAM, 20 cores
(C) Copyright 2019 Robert C. Edgar
https://drive5.com/urmap
00:00 3.1Mb
00:00 3.1Mb Genome size 3.2G
00:00 3.1Mb Slots 5392814809 (5.4G)
00:00 3.1Mb Load factor 0.58
00:00 3.1Mb Word length 24
00:00 3.1Mb Max abund. 32
00:00 3.1Mb
00:24 3.5Gb 100% Reading Homo_sapiens.GRCh38.dna.primary_assembly_modified_headers.fa
00:31 3.4Gb Converting to upper case
00:33 3.4Gb Initialize seqs
00:51 30.4Gb Init vectors
00:54 35.8Gb Init slot counts
03:27 35.8Gb Count slots
03:30 41.2Gb Init minus slot counts
06:03 41.2Gb Count slots minus
15:09 41.2Gb Index
15:09 41.2Gb 0 slots truncated
16:44 41.2Gb Writing /scratch/23745045/urmap_index.ufi
17:48 41.2Gb Exiting
urmap v1.0.1480_i86linux64, 131Gb RAM, 20 cores
(C) Copyright 2019 Robert C. Edgar
https://drive5.com/urmap
00:17 30.1Gb 100% Reading index urmap_index.ufi
00:17 30.1Gb CPU has 20 cores, defaulting to 10 threads
00:18 30.8Gb 0.54% Mapping (paired) 300_L.fq/var/spool/slurmd/job23745045/slurm_script: line 22: 5882 Segmentation fault (core dumped) urmap -map2 /proj/snic2020-16-138/strobemap_eval/reads_PE/SIM3/300_L.fq -reverse /proj/snic2020-16-138/strobemap_eval/reads_PE/SIM3/300_R.fq -ufi $tmp_index -samout $tmp_folder/aligned.sam
And here are the details on the computing nodes and processors:
Rackham consists of 334 compute servers (nodes) where each compute server
consists of two 10-core Xeon E5-2630 V4 processors running at 2.2 GHz (turbo 3.1 GHz).
We provide 272 nodes with 128 GB memory (r33-r304) and 32 nodes with 256 GB (r1-r32).
All nodes are interconnected with a 2:1 oversubscribed FDR (56 Gb/s) Infiniband fabric.
In total Rackham provides 6080 CPU cores in compute nodes.
Some job info for the last job (it seems to have reached a peak memory of 38Gb, probably during indexing):
2021-12-13 22:58:42 jobid=23745045 jobstate=FAILED nodes=r53 procs=20 partition=node
qos=normal jobname=test_urmap maxmemory_in_GiB=38.3 maxmemory_node=r53
submit_time=2021-12-13T22:27:56 runtime=00:18:12
Hello all,
I can help to double check. I do not have any problems with a few datasets, except the NM tag problem. Maybe you could also help with that (no NM tag for identity estimation)? Also, I do not see how many cores you request in the sbatch script. Maybe also try in an interactive mode?
jianshu.zhao@gatech.edu
thanks,
Jianshu
Thanks for helping out @jianshu93! I have now shared the link to the data set on google drive.
I specified
#SBATCH -p node
#SBATCH -n 1
This requests the partition given with -p (node), which allocates a whole node, and the node has 20 cores. To request only one core, one does
#SBATCH -p core
#SBATCH -n 1
Details are under 'Details about the "core" and "node" partitions' here.
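To double-check what was actually allocated, one could also print the standard SLURM environment variables from inside the job script (a hedged sketch; these variable names are standard SLURM, nothing cluster-specific):
echo "Allocated CPUs on this node: $SLURM_CPUS_ON_NODE, tasks: $SLURM_NTASKS"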
Hi @rcedgar,
I ran URMAP in my benchmark on short-read aligners and noticed that, while it is very fast, it has low CPU usage (between 10% and 90%) regardless of -threads N, with N = 1, 4, 8, or 16. I ran URMAP as follows (an illustrative command is sketched just below). I also copied the reads and the index to the local compute node before aligning to minimize disk latency.
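Illustrative command only, not a verbatim copy of the benchmark invocation (file names are placeholders; the flags are those used elsewhere in this thread):
/usr/bin/time -v urmap -threads 16 -map2 reads_L.fq -reverse reads_R.fq -ufi index.ufi -samout aligned.sam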
Some details in this twitter thread: https://twitter.com/krsahlin/status/1469327435103621127
Do you have an idea of why that is?
Also, I noticed that URMAP fails with a segfault on all my instances with 2x300nt PE reads.
Best, Kristoffer