sebhtml closed this issue 10 years ago
Sequences
sebhtml@titan-ext1:~/lsc005/projects/human-1-hour> cat HiSeq-2500-NA12878-demo-2x150-3/FilePartition.txt
#File Name FirstSequence LastSequence NumberOfSequences
0 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R1_001.fastq.gz 0 143818692 143818693
1 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R2_001.fastq.gz 143818693 287637385 143818693
2 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R1_002.fastq.gz 287637386 437610805 149973420
3 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R2_002.fastq.gz 437610806 587584225 149973420
4 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R1_001.fastq.gz 587584226 731879531 144295306
5 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R2_001.fastq.gz 731879532 876174837 144295306
6 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R1_002.fastq.gz 876174838 1023766068 147591231
7 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R2_002.fastq.gz 1023766069 1171357299 147591231
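A quick sanity check of the partition table above, as a minimal sketch: each file should start right after the previous one ends, and LastSequence - FirstSequence + 1 should equal NumberOfSequences. The rows are copied from FilePartition.txt; the function name `check_partition` is only illustrative and is not part of Ray.

```python
# Rows copied from FilePartition.txt: (file index, first, last, count).
PARTITION = [
    (0, 0, 143818692, 143818693),
    (1, 143818693, 287637385, 143818693),
    (2, 287637386, 437610805, 149973420),
    (3, 437610806, 587584225, 149973420),
    (4, 587584226, 731879531, 144295306),
    (5, 731879532, 876174837, 144295306),
    (6, 876174838, 1023766068, 147591231),
    (7, 1023766069, 1171357299, 147591231),
]


def check_partition(rows):
    """Return the total sequence count; raise ValueError on a gap or a miscount."""
    expected_first = 0
    for index, first, last, count in rows:
        if first != expected_first:
            raise ValueError(f"file {index}: starts at {first}, expected {expected_first}")
        if last - first + 1 != count:
            raise ValueError(f"file {index}: range does not match count {count}")
        expected_first = last + 1
    return expected_first


total = check_partition(PARTITION)
print(total)
```

The total agrees with the summary that Ray writes in NumberOfSequences.txt for this dataset.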
Each Titan node has ( https://www.olcf.ornl.gov/support/system-user-guides/titan-user-guide/ ):
16 cores, 32 GiB RAM
NVIDIA Kepler GPU
313 nodes * 16 ranks per node = 5008 MPI ranks
32 GiB / 16 cores = 2 GiB per core.
Latency is very high:
sebhtml@titan-ext1:~/lsc005/projects/human-1-hour> head HiSeq-2500-NA12878-demo-2x150-3/NetworkTest.txt
# average and mode round trip latency in microseconds (10^-6 seconds) when requesting a reply for a message of 4000 bytes
# MessagePassingInterfaceRank Name ModeLatencyInMicroseconds AverageLatencyInMicroseconds NumberOfExchanges
# AverageForAllRanks: 299.679
# StandardDeviation: 31.685
0 nid12147 30 116 1000
1 nid12147 34 313 1000
2 nid12147 148 279 1000
3 nid12147 184 298 1000
4 nid12147 28 287 1000
5 nid12147 30 301 1000
Memory usage is already above 3 GiB per rank when Ray starts (?):
sebhtml@titan-ext1:~/lsc005/projects/human-1-hour> grep memory HiSeq-2500-NA12878-demo-2x150-3.o1732882|head
Rank 77: assembler memory usage: 3251836 KiB
Rank 78: assembler memory usage: 3251836 KiB
Rank 77: assembler memory usage: 3317568 KiB
Rank 78: assembler memory usage: 3317568 KiB
Rank 63: assembler memory usage: 3251836 KiB
Rank 51: assembler memory usage: 3251836 KiB
Rank 3861: assembler memory usage: 3251836 KiB
Rank 1645: assembler memory usage: 3251836 KiB
Rank 1639: assembler memory usage: 3251836 KiB
Rank 51: assembler memory usage: 3317568 KiB
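To put the reported figure next to the per-core budget, here is a small arithmetic sketch. It assumes 16 ranks per node and 32 GiB per node as noted earlier in this log; whether Ray's "assembler memory usage" is virtual or resident memory is not stated here, so the comparison is only indicative.

```python
KIB_PER_GIB = 1024 * 1024

NODE_RAM_GIB = 32        # per-node RAM noted in this log
RANKS_PER_NODE = 16      # one rank per core
REPORTED_KIB = 3251836   # "assembler memory usage" printed by each rank

budget_gib = NODE_RAM_GIB / RANKS_PER_NODE      # fair share per rank
reported_gib = REPORTED_KIB / KIB_PER_GIB       # what a rank reports
node_total_gib = reported_gib * RANKS_PER_NODE  # if every rank really held that much

print(f"budget per rank: {budget_gib:.2f} GiB")
print(f"reported per rank: {reported_gib:.2f} GiB")
print(f"16 ranks combined: {node_total_gib:.1f} GiB")
```

Sixteen ranks at that size would not fit in 32 GiB if the figure were all resident memory, which is one reason a resident-set number such as VmRSS would be more telling.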
Every machine has 16 MPI ranks (5008 ranks over the 313 distinct nodes counted below):
sebhtml@titan-ext1:~/lsc005/projects/human-1-hour> grep -v ^# HiSeq-2500-NA12878-demo-2x150-3/NetworkTest.txt | awk '{print $2}'|sort|uniq -c|wc -l
313
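The same count can be done per node rather than only counting distinct nodes. This is a minimal sketch that assumes the NetworkTest.txt layout shown earlier (rank, node name, mode latency, average latency, exchanges); `ranks_per_node` is an illustrative helper, not part of Ray.

```python
from collections import Counter


def ranks_per_node(lines):
    """Count MPI ranks per node name, skipping comment and blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        node_name = line.split()[1]  # second column is the node name
        counts[node_name] += 1
    return counts


# The six data lines shown by `head` in this log.
sample = [
    "# MessagePassingInterfaceRank Name ModeLatencyInMicroseconds",
    "0 nid12147 30 116 1000",
    "1 nid12147 34 313 1000",
    "2 nid12147 148 279 1000",
    "3 nid12147 184 298 1000",
    "4 nid12147 28 287 1000",
    "5 nid12147 30 301 1000",
]
counts = ranks_per_node(sample)
print(len(counts), "node(s):", dict(counts))
```

On the full file, `len(counts)` should match the 313 from the shell pipeline, and every value should be 16 if the placement is uniform.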
error messages:
MPICH2 ERROR [Rank 1227] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c19-4c0s2n1] [nid12091] - MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.27853.kvs_3577704, err Cannot allocate memory
MPICH2 ERROR [Rank 1227] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c19-4c0s2n1] [nid12091] - MPIU_nem_gni_get_hugepages(): large page stats: free 0 nr 158 nr_overcommit 16154 resv 0 surplus 158
MPICH2 ERROR [Rank 1230] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c19-4c0s2n1] [nid12091] - MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.27856.kvs_3577704, err Cannot allocate memory
MPICH2 ERROR [Rank 1230] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c19-4c0s2n1] [nid12091] - MPIU_nem_gni_get_hugepages(): large page stats: free 0 nr 165 nr_overcommit 16154 resv 0 surplus 165
MPICH2 ERROR [Rank 4378] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c0-2c1s6n0] [nid00114] - MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.24160.kvs_3577704, err Cannot allocate memory
MPICH2 ERROR [Rank 4378] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c0-2c1s6n0] [nid00114] - MPIU_nem_gni_get_hugepages(): large page stats: free 0 nr 173 nr_overcommit 16154 resv 0 surplus 173
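The numbers in these messages can be pulled apart with a small parser. This is a sketch only: the regular expressions are written against the lines above, and reading `free`, `nr` and `surplus` as the usual Linux hugetlb counters is an assumption, since the log does not define them.

```python
import re

MMAP_RE = re.compile(r"Unable to mmap (\d+) bytes for file \S*pagesize-(\d+)/")
STATS_RE = re.compile(
    r"large page stats: free (\d+) nr (\d+) nr_overcommit (\d+) resv (\d+) surplus (\d+)"
)


def pages_requested(line):
    """Number of huge pages behind one failed mmap, or None if the line does not match."""
    match = MMAP_RE.search(line)
    if match is None:
        return None
    size_bytes, page_bytes = int(match.group(1)), int(match.group(2))
    return size_bytes // page_bytes


def page_stats(line):
    """Dictionary of the counters on a 'large page stats' line, or None."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    names = ("free", "nr", "nr_overcommit", "resv", "surplus")
    return dict(zip(names, map(int, match.groups())))


mmap_line = ("MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file "
             "/var/lib/hugetlbfs/global/pagesize-2097152/"
             "hugepagefile.MPICH.2.27853.kvs_3577704, err Cannot allocate memory")
stats_line = ("MPIU_nem_gni_get_hugepages(): large page stats: "
              "free 0 nr 158 nr_overcommit 16154 resv 0 surplus 158")

pages = pages_requested(mmap_line)
stats = page_stats(stats_line)
print(pages, stats)
```

Each failed request is for only 6 pages of 2 MiB, and `free` is 0 in every stats line, so the failures happen when no huge page is available at all rather than on an unusually large request.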
To do: report memory info at tick = 0, and add VmRSS to the -debug output.
Add all of these on Linux:
VmPeak: 108964 kB
VmSize: 108960 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 872 kB
VmRSS: 872 kB
VmData: 196 kB
VmStk: 140 kB
VmExe: 132 kB
VmLib: 1992 kB
VmPTE: 60 kB
VmSwap: 0 kB
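For reference, these fields come from the per-process status file on Linux. The sketch below shows one way to collect them in Python; it only illustrates the parsing and is not how Ray (a C++ program) implements its -debug report. `parse_vm_fields` is a hypothetical helper.

```python
def parse_vm_fields(status_text):
    """Collect the Vm* entries (in kB) from the text of a /proc/<pid>/status file."""
    fields = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name.startswith("Vm") and rest.strip():
            fields[name] = int(rest.split()[0])  # value is followed by the unit "kB"
    return fields


# Sample text using values listed in this log.
sample_status = """Name:\tRay
VmPeak:\t  108964 kB
VmSize:\t  108960 kB
VmHWM:\t     872 kB
VmRSS:\t     872 kB
VmSwap:\t       0 kB
Threads:\t1
"""
fields = parse_vm_fields(sample_status)
print(fields)
```

On Linux, passing the contents of /proc/self/status to the same function gives the values for the current process.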
To build it:
module purge
module load PrgEnv-intel/4.1.40
module load cray-mpich2/5.6.3
make MPICXX=CC CXXFLAGS="-xHOST -O3 -static" -j 4 HAVE_LIBZ=y clean
make MPICXX=CC CXXFLAGS="-xHOST -O3 -static" -j 4 HAVE_LIBZ=y
iteration 4:
sebhtml@titan-ext3:/tmp/proj/lsc005/projects/human-1-hour> cat HiSeq-2500-NA12878-demo-2x150-4.sh
#PBS -N HiSeq-2500-NA12878-demo-2x150-4
#PBS -l walltime=3:00:00
#PBS -l nodes=313
#PBS -A LSC005
#PBS -l gres=widow1
cd $PBS_O_WORKDIR
# 313 * 8 * 2 = 5008
aprun -n 5008 -S 8 \
./software/lsc005/Ray/c610ae8670e1627bc41a64bbde18ac8f658b131f-1/Ray \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-4 \
sebhtml@titan-ext3:/tmp/proj/lsc005/projects/human-1-hour> qsub HiSeq-2500-NA12878-demo-2x150-4.sh
1742436
sebhtml@titan-ext3:/tmp/proj/lsc005/projects/human-1-hour> showq | grep 1742436
1742436 sebhtml Idle 5008 3:00:00 Thu Sep 26 15:53:42
Needs -debug:
sebhtml@titan-ext3:/tmp/proj/lsc005/projects/human-1-hour> cat HiSeq-2500-NA12878-demo-2x150-4.sh
#PBS -N HiSeq-2500-NA12878-demo-2x150-4
#PBS -l walltime=3:00:00
#PBS -l nodes=313
#PBS -A LSC005
#PBS -l gres=widow1
cd $PBS_O_WORKDIR
# 313 * 8 * 2 = 5008
aprun -n 5008 -S 8 \
./software/lsc005/Ray/c610ae8670e1627bc41a64bbde18ac8f658b131f-1/Ray \
-debug \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-4 \
sebhtml@titan-ext3:/tmp/proj/lsc005/projects/human-1-hour> qsub HiSeq-2500-NA12878-demo-2x150-4.sh
1742711
iteration 5:
Carlos P. Sosa told me to use aprun -n 5008 -N 16
titan> cat HiSeq-2500-NA12878-demo-2x150-5.sh
#PBS -N HiSeq-2500-NA12878-demo-2x150-5
#PBS -l walltime=12:00:00
#PBS -l nodes=313
#PBS -A LSC005
#PBS -l gres=widow1
cd $PBS_O_WORKDIR
# 313 * 8 * 2 = 5008
aprun -n 5008 -N 16 \
./software/lsc005/Ray/c610ae8670e1627bc41a64bbde18ac8f658b131f-1/Ray \
-debug \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-5 \
titan> qsub HiSeq-2500-NA12878-demo-2x150-5.sh
1747928
With -debug:
titan> cat HiSeq-2500-NA12878-demo-2x150-7.sh
#PBS -N HiSeq-2500-NA12878-demo-2x150-7
#PBS -l walltime=12:00:00
#PBS -l nodes=313
#PBS -A LSC005
#PBS -l gres=widow1
cd $PBS_O_WORKDIR
# 313 * 8 * 2 = 5008
#-debug \
aprun -n 5008 -N 16 \
./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray \
-debug \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-7 \
titan> qsub HiSeq-2500-NA12878-demo-2x150-7.sh
1763643
In 4 hours, Ray loads data, builds the graph, computes libraries and traverses the graph.
titan> cat HiSeq-2500-NA12878-demo-2x150-8/ElapsedTime.txt
#Step Date Elapsed time Since Beginning
Network testing 2013-10-26T22:03:57 12 seconds 12 seconds
Counting sequences to assemble 2013-10-26T22:20:12 16 minutes, 15 seconds 16 minutes, 27 seconds
Sequence loading 2013-10-26T23:28:44 1 hours, 8 minutes, 32 seconds 1 hours, 24 minutes, 59 seconds
K-mer counting 2013-10-26T23:34:33 5 minutes, 49 seconds 1 hours, 30 minutes, 48 seconds
Coverage distribution analysis 2013-10-26T23:34:40 7 seconds 1 hours, 30 minutes, 55 seconds
Graph construction 2013-10-26T23:44:11 9 minutes, 31 seconds 1 hours, 40 minutes, 26 seconds
Null edge purging 2013-10-26T23:46:01 1 minutes, 50 seconds 1 hours, 42 minutes, 16 seconds
Selection of optimal read markers 2013-10-27T00:03:12 17 minutes, 11 seconds 1 hours, 59 minutes, 27 seconds
Detection of assembly seeds 2013-10-27T00:09:52 6 minutes, 40 seconds 2 hours, 6 minutes, 7 seconds
Estimation of outer distances for paired reads 2013-10-27T00:11:58 2 minutes, 6 seconds 2 hours, 8 minutes, 13 seconds
Bidirectional extension of seeds 2013-10-27T02:07:27 1 hours, 55 minutes, 29 seconds 4 hours, 3 minutes, 42 seconds
As expected, the merging must be improved.
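To see where the 4 hours go, the per-step durations from ElapsedTime.txt can be converted to seconds and summed. This is a minimal sketch with the values copied from the table above; `to_seconds` is an illustrative helper.

```python
import re

# Step name -> "Elapsed time" column, copied from ElapsedTime.txt.
STEPS = {
    "Network testing": "12 seconds",
    "Counting sequences to assemble": "16 minutes, 15 seconds",
    "Sequence loading": "1 hours, 8 minutes, 32 seconds",
    "K-mer counting": "5 minutes, 49 seconds",
    "Coverage distribution analysis": "7 seconds",
    "Graph construction": "9 minutes, 31 seconds",
    "Null edge purging": "1 minutes, 50 seconds",
    "Selection of optimal read markers": "17 minutes, 11 seconds",
    "Detection of assembly seeds": "6 minutes, 40 seconds",
    "Estimation of outer distances for paired reads": "2 minutes, 6 seconds",
    "Bidirectional extension of seeds": "1 hours, 55 minutes, 29 seconds",
}

UNIT_SECONDS = {"hours": 3600, "minutes": 60, "seconds": 1}


def to_seconds(text):
    """Convert a string such as '1 hours, 8 minutes, 32 seconds' to seconds."""
    pairs = re.findall(r"(\d+) (hours|minutes|seconds)", text)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in pairs)


seconds = {step: to_seconds(text) for step, text in STEPS.items()}
total = sum(seconds.values())
slowest = max(seconds, key=seconds.get)

for step, value in sorted(seconds.items(), key=lambda item: -item[1])[:3]:
    print(f"{step}: {100 * value / total:.1f}% of {total} s")
```

The sum matches the last "Since Beginning" value (4 hours, 3 minutes, 42 seconds), and the seed extension and sequence loading steps together account for roughly three quarters of the run.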
titan> tail HiSeq-2500-NA12878-demo-2x150-8/NumberOfSequences.txt
FilePath: HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R2_002.fastq.gz
NumberOfSequences: 147591231
FirstSequence: 1023766069
LastSequence: 1171357299
Summary
NumberOfSequences: 1171357300
FirstSequence: 0
LastSequence: 1171357299
Let's do a bigger job now !!!
Script for the incomplete 4-hour run:
titan> cat HiSeq-2500-NA12878-demo-2x150-8.sh
#PBS -N HiSeq-2500-NA12878-demo-2x150-8
#PBS -l walltime=12:00:00
#PBS -l nodes=313
#PBS -A LSC005
#PBS -l gres=widow1
cd $PBS_O_WORKDIR
# 313 * 8 * 2 = 5008
# 313 * 8 * 1 = 2504
#-debug \
aprun -n 2504 \
./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-8 \
with 3750 nodes, I can run 24 hours !
https://www.olcf.ornl.gov/kb_articles/titan-scheduling-policy/
accounting: 3750 * 30 * 24 = 2700000 (we don't have enough for this)
we have 250000 for this fall
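The accounting can be written as a tiny helper to compare job sizes against the allocation. This sketch assumes that the factor 30 in the formula is the charge per node-hour and that the 250000 figure is in the same unit; the log does not spell out either point.

```python
CHARGE_PER_NODE_HOUR = 30  # assumption: the "30" in the accounting formula
ALLOCATION = 250000        # what the project has for the fall


def charge(nodes, hours, rate=CHARGE_PER_NODE_HOUR):
    """Charge for a job that holds `nodes` nodes for `hours` hours of walltime."""
    return nodes * rate * hours


for nodes, hours in [(3750, 24), (626, 12), (313, 12)]:
    cost = charge(nodes, hours)
    verdict = "fits" if cost <= ALLOCATION else "too expensive"
    print(f"{nodes} nodes x {hours} h = {cost} ({verdict})")
```

Under these assumptions, a 626-node job with a 12-hour walltime stays under the allocation, while the 3750-node, 24-hour job exceeds it by roughly a factor of ten.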
Let's try with 3750 nodes, 8 ranks per node, for 30000 ranks. (The script below actually requests 626 nodes: 626 * 8 = 5008 ranks.)
titan> cat HiSeq-2500-NA12878-demo-2x150-9.sh
#PBS -N HiSeq-2500-NA12878-demo-2x150-9
#PBS -l walltime=00:12:00:00
#PBS -l nodes=626
#PBS -A LSC005
#PBS -l gres=widow1
cd $PBS_O_WORKDIR
# 626 * 8 = 5008
aprun -n 5008 \
./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-9 \
titan> qsub HiSeq-2500-NA12878-demo-2x150-9.sh
1769459
Job -9 seems to have vanished, that's strange (note that the grep uses the user name boisv, not sebhtml):
titan> showq | grep boisv
titan> ls|grep HiSeq-2500-NA12878-demo-2x150-9
HiSeq-2500-NA12878-demo-2x150-9.sh
Let's resubmit as -10:
titan> cat HiSeq-2500-NA12878-demo-2x150-10.sh
cd $PBS_O_WORKDIR
aprun -n 5008 \
./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-10 \
titan> qsub HiSeq-2500-NA12878-demo-2x150-10.sh
1778289
Waiting time:
Hi Jacques,
My jobs have been waiting in the queue for 18 and 8 days, respectively.
titan> showq | grep sebht
1769459 sebhtml Idle 10016 12:00:00 Mon Oct 28 14:19:19
1778289 sebhtml Idle 10016 12:00:00 Thu Nov 7 11:17:21
titan> checkjob 1769459|head
job 1769459
AName: HiSeq-2500-NA12878-demo-2x150-9
State: Idle
Creds: user:sebhtml group:sebhtml account:LSC005 class:batch qos:bin0
WallTime: 00:00:00 of 12:00:00
BecameEligible: Fri Nov 15 12:27:27
SubmitTime: Mon Oct 28 14:19:19
(Time Queued Total: 18:00:01:21 Eligible: 17:21:30:02)
titan> checkjob 1778289|head
job 1778289
AName: HiSeq-2500-NA12878-demo-2x150-10
State: Idle
Creds: user:sebhtml group:sebhtml account:LSC005 class:batch qos:bin0
WallTime: 00:00:00 of 12:00:00
BecameEligible: Fri Nov 15 12:27:27
SubmitTime: Thu Nov 7 11:17:21
(Time Queued Total: 8:02:03:34 Eligible: 7:23:39:24)
Trillian (UNH Cray XE6, http://trillian-use.sr.unh.edu/index.php/Main_Page) does not like this make command:
make MPICXX=CC CXXFLAGS="-xHOST -O3 -static" -j 4 HAVE_LIBZ=y
It complains that -xHOST is an invalid command line flag.
Did you ever solve the latency issue? I see high latency here, too.
Which compiler are you using? -xHOST is an Intel compiler flag, I think.
Update for jobs -9 and -10:
Hi Jacques,
Regarding titan:
My best shot so far:
"In 4 hours, Ray loads data, builds the graph, computes libraries and traverses the graph." (2013-10-28)
Last 2 jobs
However, my last 2 jobs both failed (I increased the number of cores and this highlighted the same problem in the caching subsystem of nodes).
Job: HiSeq-2500-NA12878-demo-2x150-9 # 1769459
titan> cat HiSeq-2500-NA12878-demo-2x150-9.sh
cd $PBS_O_WORKDIR
aprun -n 5008 \
./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-9 \
Like last time, this is a problem with cached content in the VFS layer of Lustre.
MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.16799.kvs_3928360, err Cannot allocate memory
Job: HiSeq-2500-NA12878-demo-2x150-10 # 1778289
titan> cat HiSeq-2500-NA12878-demo-2x150-10.sh
cd $PBS_O_WORKDIR
aprun -n 5008 \
./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-10 \
Same here:
MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.23067.kvs_3999472, err Cannot allocate memory
What support says about this
The ticket with ORNL people is "Re: [CCS #177295] MPICH on titan uses a lot of memory (?)".
The last response I got was from 2013-10-21:
Thanks Sebastien,
My gut tells me you're running out of memory per core. Hugepage is busting and the max size is 2GB.
MPIU_nem_gni_get_hugepages(): large page stats: free 0 nr 211 nr_overcommit 16154 resv 0 surplus 211
The network is just the one to complain about it, but not necessarily the cause.
Have you tried lowering the number of MPI processes to 8/node?
FF
(I am already at 8; I think the problem is buggy caching, not memory usage by Ray.)
The issue is that cached pages in the VFS waste memory.
See below the /proc/meminfo:
[Rank 3758] Cat of /proc/meminfo
[Rank 3755]: MemTotal: 33084652 kB
[Rank 3755]: MemFree: 3984520 kB
[Rank 3755]: Buffers: 0 kB
[Rank 3755]: Cached: 22332700 kB **
[Rank 3755]: SwapCached: 0 kB
[Rank 3755]: Active: 12556068 kB
[Rank 3755]: Inactive: 12527116 kB
[Rank 3758]: MemTotal: 33084652 kB
[Rank 3755]: Active(anon): 2637848 kB
[Rank 3758]: MemFree: 3984892 kB
[Rank 3755]: Inactive(anon): 168920 kB
[Rank 3758]: Buffers: 0 kB
[Rank 3755]: Active(file): 9918220 kB
[Rank 3758]: Cached: 22332700 kB
[Rank 3755]: Inactive(file): 12358196 kB
That's about 22 gigabytes (Cached: 22332700 kB, out of MemTotal: 33084652 kB) held as cache by the operating system.
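The share of node memory sitting in the page cache can be computed directly from the /proc/meminfo dump. This is a minimal sketch using the values printed by rank 3755; the `[Rank N]:` prefix handling matches the log format above, and `parse_meminfo` is an illustrative helper rather than part of Ray.

```python
import re

RANK_PREFIX = re.compile(r"^\[Rank \d+\]:\s*")


def parse_meminfo(text):
    """Map /proc/meminfo field names to values in kB, ignoring a leading rank tag."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = RANK_PREFIX.sub("", line).partition(":")
        tokens = rest.split()
        if sep and tokens and tokens[0].isdigit():
            values[name.strip()] = int(tokens[0])
    return values


dump = """[Rank 3755]: MemTotal: 33084652 kB
[Rank 3755]: MemFree: 3984520 kB
[Rank 3755]: Cached: 22332700 kB
[Rank 3755]: Active(anon): 2637848 kB
[Rank 3755]: Inactive(anon): 168920 kB
[Rank 3755]: Active(file): 9918220 kB
[Rank 3755]: Inactive(file): 12358196 kB
"""
info = parse_meminfo(dump)
cached_fraction = info["Cached"] / info["MemTotal"]
cached_gib = info["Cached"] / (1024 * 1024)
file_pages_kb = info["Active(file)"] + info["Inactive(file)"]

print(f"Cached: {cached_gib:.1f} GiB ({100 * cached_fraction:.1f}% of MemTotal)")
print(f"Active(file) + Inactive(file): {file_pages_kb} kB")
```

About two thirds of the node's memory is file cache in this dump, while anonymous memory (Active(anon) plus Inactive(anon)) is under 3 GiB.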
Ticket: https://github.com/sebhtml/ray/issues/197
Séb
Support said to try out the new storage:
moving files to atlas.
titan> mv /tmp/proj/lsc005/* /lustre/atlas/proj-shared/lsc005/
job with atlas on titan:
titan> pwd
/lustre/atlas/proj-shared/lsc005/projects/human-1-hour
titan> cat HiSeq-2500-NA12878-demo-2x150-11.sh
cd $PBS_O_WORKDIR
aprun -n 5008 \
./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-11 \
titan> qsub HiSeq-2500-NA12878-demo-2x150-11.sh
1833464
titan> showq | grep 1833464
1833464 sebhtml Idle 10016 12:00:00 Mon Jan 6 11:43:18
I think this will start in like 1 month.
on titan: #228
-11 failed because of a faulty symlink...
Hi Jacques,
For my Titan job, it seems that it started after the decommissioning of Spider, which was on 27 Jan 2014 I think.
There was a faulty symbolic link, although my data was on Atlas.
titan> pwd
/ccs/home/sebhtml/lsc005-atlas/projects/human-1-hour
titan> cat HiSeq-2500-NA12878-demo-2x150-11.e1833464
aprun: file ./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray not found
aprun: Exiting due to errors. Application aborted
titan> readlink software
lsc005/software/
titan> readlink lsc005
/tmp/proj/lsc005
titan> file /tmp/proj/lsc005
/tmp/proj/lsc005: cannot open `/tmp/proj/lsc005' (No such file or directory)
titan> file ./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray
./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), for GNU/Linux 2.6.4, statically linked, not stripped
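A check like the readlink/file sequence above can be scripted so that a stale link is caught before a job waits a month in the queue. This is a sketch under the assumption that a pre-submission check is wanted; `first_dangling_link` and the scratch directory names are hypothetical.

```python
import os
import tempfile


def first_dangling_link(path):
    """Return (link, target) for the first symlink along path that does not resolve, else None."""
    prefix = os.sep if os.path.isabs(path) else ""
    for part in path.split(os.sep):
        if not part:
            continue
        prefix = os.path.join(prefix, part)
        # islink() looks at the link itself; exists() follows it to its target.
        if os.path.islink(prefix) and not os.path.exists(prefix):
            return prefix, os.readlink(prefix)
    return None


# Self-contained demo in a scratch directory: a link to a directory that is gone.
with tempfile.TemporaryDirectory() as scratch:
    link = os.path.join(scratch, "lsc005")
    missing = os.path.join(scratch, "gone")
    os.symlink(missing, link)
    demo = first_dangling_link(os.path.join(link, "software", "Ray"))
    expected = (link, missing)

print(demo)
```

The function reports the first component that fails to resolve; when that link points through another link, as `software` does here, the reported target still has to be followed by hand to find the root cause.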
-12
titan> vim HiSeq-2500-NA12878-demo-2x150-12.sh
titan> qsub HiSeq-2500-NA12878-demo-2x150-12.sh
1863329
titan> pwd
/ccs/home/sebhtml/lsc005/projects/human-1-hour
Corrupted files on Titan (Atlas FS):
7 out of 8 fastq files vanished (strange). This is what is left of it:
Files on Titan (there was a problem with the FS):
titan> ls -lh HiSeq-2500-NA12878-demo-2x150/*gz
-rw------- 1 sebhtml lsc005 684M 2013-12-23 12:30 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R1_001.fastq.gz
What it is supposed to be:
[boisver1@ip03-mp2 data]$ ls -lh HiSeq-2500-NA12878-demo-2x150/*gz
-rw-rwxr-- 1 boisver1 corbeil 18G Nov 21 2012 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R1_001.fastq.gz
-rw-rwxr-- 1 boisver1 corbeil 19G Nov 21 2012 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R1_002.fastq.gz
-rw-rwxr-- 1 boisver1 corbeil 19G Nov 21 2012 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R2_001.fastq.gz
-rw-rwxr-- 1 boisver1 corbeil 19G Nov 22 2012 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L001_R2_002.fastq.gz
-rw-rwxr-- 1 boisver1 corbeil 18G Nov 21 2012 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R1_001.fastq.gz
-rw-rwxr-- 1 boisver1 corbeil 18G Nov 21 2012 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R1_002.fastq.gz
-rw-rwxr-- 1 boisver1 corbeil 19G Nov 21 2012 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R2_001.fastq.gz
-rw-rwxr-- 1 boisver1 corbeil 19G Nov 21 2012 HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R2_002.fastq.gz
oh well...
Pulling from Sherbrooke to get the data again.
rsync -avzPL
/mnt/scratch_mp2/corbeil/corbeil_group/nne-790-ab/data/HiSeq-2500-NA12878-demo-2x150
Data on Titan:
titan> ls /lustre/atlas/proj-shared/lsc005/projects/human-1-hour/HiSeq-2500-NA12878-demo-2x150/ -lh
total 145G
-rw-rwxr-- 1 sebhtml sebhtml 946 2012-11-21 15:16 11
-rw-rwxr-- 1 sebhtml sebhtml 1009 2012-11-21 15:16 12
-rw-rwxr-- 1 sebhtml sebhtml 328 2012-11-22 18:44 Counts
-rw-rwxr-- 1 sebhtml sebhtml 291 2012-11-22 13:52 Get.sh
-rw-rwxr-- 1 sebhtml sebhtml 889 2012-11-20 00:38 RawFiles.txt
-rw-r--r-- 1 sebhtml sebhtml 14 2012-11-21 13:10 README
-rw-rwxr-- 1 sebhtml sebhtml 584 2012-11-22 14:06 sha1sum.txt
-rw-rwxr-- 1 sebhtml sebhtml 18G 2012-11-21 15:17 sorted_S1_L001_R1_001.fastq.gz
-rw-rwxr-- 1 sebhtml sebhtml 523 2012-11-22 13:52 sorted_S1_L001_R1_001.fastq.gz.log
-rw-rwxr-- 1 sebhtml sebhtml 19G 2012-11-21 16:44 sorted_S1_L001_R1_002.fastq.gz
-rw-rwxr-- 1 sebhtml sebhtml 523 2012-11-22 13:52 sorted_S1_L001_R1_002.fastq.gz.log
-rw-rwxr-- 1 sebhtml sebhtml 19G 2012-11-21 15:17 sorted_S1_L001_R2_001.fastq.gz
-rw-rwxr-- 1 sebhtml sebhtml 602 2012-11-22 13:52 sorted_S1_L001_R2_001.fastq.gz.log
-rw-rwxr-- 1 sebhtml sebhtml 19G 2012-11-22 11:22 sorted_S1_L001_R2_002.fastq.gz
-rw-rwxr-- 1 sebhtml sebhtml 523 2012-11-22 13:52 sorted_S1_L001_R2_002.fastq.gz.log
-rw-rwxr-- 1 sebhtml sebhtml 18G 2012-11-21 16:38 sorted_S1_L002_R1_001.fastq.gz
-rw-rwxr-- 1 sebhtml sebhtml 602 2012-11-22 13:52 sorted_S1_L002_R1_001.fastq.gz.log
-rw-rwxr-- 1 sebhtml sebhtml 18G 2012-11-21 16:07 sorted_S1_L002_R1_002.fastq.gz
-rw-rwxr-- 1 sebhtml sebhtml 523 2012-11-22 13:52 sorted_S1_L002_R1_002.fastq.gz.log
-rw-rwxr-- 1 sebhtml sebhtml 19G 2012-11-21 19:16 sorted_S1_L002_R2_001.fastq.gz
-rw-rwxr-- 1 sebhtml sebhtml 523 2012-11-22 13:52 sorted_S1_L002_R2_001.fastq.gz.log
-rw-rwxr-- 1 sebhtml sebhtml 19G 2012-11-21 15:29 sorted_S1_L002_R2_002.fastq.gz
-rw-rwxr-- 1 sebhtml sebhtml 523 2012-11-22 13:52 sorted_S1_L002_R2_002.fastq.gz.log
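Since files vanished once already, it may be worth checking the copy against the sha1sum.txt that sits in the same directory. The sketch below assumes that file uses the standard `sha1sum` layout (hex digest, whitespace, file name), which is not shown in this log; `sha1_of` and `verify_manifest` are illustrative helpers, and the demo runs on throwaway files.

```python
import hashlib
import os
import tempfile


def sha1_of(path, chunk_size=1 << 20):
    """SHA-1 hex digest of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Return the names listed in the manifest that are missing or have a different digest."""
    base = os.path.dirname(os.path.abspath(manifest_path))
    failures = []
    with open(manifest_path) as manifest:
        for line in manifest:
            if not line.strip():
                continue
            expected, name = line.split(maxsplit=1)
            name = name.strip().lstrip("*")  # "*" marks binary mode in sha1sum output
            path = os.path.join(base, name)
            if not os.path.isfile(path) or sha1_of(path) != expected:
                failures.append(name)
    return failures


# Demo on scratch files: one intact file and one that has vanished.
with tempfile.TemporaryDirectory() as scratch:
    data_path = os.path.join(scratch, "reads.fastq.gz")
    with open(data_path, "wb") as handle:
        handle.write(b"not really gzip data")
    manifest_path = os.path.join(scratch, "sha1sum.txt")
    with open(manifest_path, "w") as manifest:
        manifest.write(f"{sha1_of(data_path)}  reads.fastq.gz\n")
        manifest.write(f"{'0' * 40}  vanished.fastq.gz\n")
    demo_failures = verify_manifest(manifest_path)

print(demo_failures)
```

An empty list would mean every file named in the manifest is present and intact; a truncated file like the 684M one seen earlier would show up here through a digest mismatch.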
new executable /lustre/atlas/proj-shared/lsc005/software/lsc005/Ray/53a80be6905565c7f791d069f9a1bf2e82ea8132-1/Ray
-13
titan> pwd
/ccs/home/sebhtml/lsc005/projects/human-1-hour
titan> cat HiSeq-2500-NA12878-demo-2x150-13.sh
cd $PBS_O_WORKDIR
aprun -n 5008 \
./software/lsc005/Ray/53a80be6905565c7f791d069f9a1bf2e82ea8132-1/Ray \
-k 31 \
-detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
-o HiSeq-2500-NA12878-demo-2x150-13 \
titan> qsub HiSeq-2500-NA12878-demo-2x150-13.sh
1867708
titan> showq | grep sebhtml
1867708 sebhtml Idle 10016 12:00:00 Wed Feb 12 16:39:47
For job HiSeq-2500-NA12878-demo-2x150-13 (Atlas)
MPICH2 ERROR [Rank 4] [job id 4468454] [Wed Feb 12 19:16:49 2014] [c6-4c2s3n3] [nid02823] - MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.2794.kvs_4468454, err Cannot allocate memory
titan> grep Cached HiSeq-2500-NA12878-demo-2x150-13.e1867708|head -n1
[Rank 4]: Cached: 14777104 kB
This project is finished.