sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

high VmData when starting processes on colosse #209

Closed sebhtml closed 10 years ago

sebhtml commented 10 years ago

Stack:

module use /rap/nne-790-ab/modulefiles
module load nne-790-ab/ray/857b773c98e1aa4e9aa86333ab786e52f443af4c-1

module load tools/make/3.82
module load compilers/gcc/4.8.0
module load apps/blcr/0.8.4
module load mpi/openmpi/1.6.4_gcc

when starting the job:

$ head meminfo
MemTotal: 24735700 kB
MemFree: 6231944 kB
Buffers: 0 kB
Cached: 10921432 kB
SwapCached: 0 kB
Active: 13089672 kB
Inactive: 1939548 kB
Active(anon): 4117216 kB
Inactive(anon): 27180 kB
Active(file): 8972456 kB

$ head top
top - 11:32:49 up 9 days, 20:34, 1 user, load average: 7.72, 6.43, 7.09
Tasks: 229 total, 10 running, 219 sleeping, 0 stopped, 0 zombie
Cpu(s): 82.9%us, 6.3%sy, 0.0%ni, 10.4%id, 0.0%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 24735700k total, 18697900k used, 6037800k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 10921476k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18526 sboisver 20 0 2681m 508m 6588 R 101.0 2.1 3:06.02 Ray
18527 sboisver 20 0 2681m 508m 6576 R 101.0 2.1 3:06.19 Ray
18528 sboisver 20 0 2681m 508m 6584 R 101.0 2.1 3:06.19 Ray

sebhtml commented 10 years ago

$ ls /home/sboisver12/memory-issue/
meminfo status top vmdata
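For anyone who wants to collect the same number on another node, here is a minimal sketch that pulls the VmData field out of a /proc/<pid>/status-style file such as the `status` file listed above. The helper names `vmdata_kb` and `report_ray_vmdata` are made up for illustration, and the sketch assumes a Linux /proc layout and that `pgrep` is available.

```shell
# Print the VmData value (in kB) found in a /proc/<pid>/status-style file.
# vmdata_kb is a hypothetical helper name, not part of Ray or Open MPI.
vmdata_kb() {
    awk '$1 == "VmData:" { print $2 }' "$1"
}

# Print "<pid> <VmData> kB" for every running process named exactly "Ray".
# Assumes Linux (/proc) and pgrep; prints nothing if no Ray process exists.
report_ray_vmdata() {
    for pid in $(pgrep -x Ray); do
        printf '%s %s kB\n' "$pid" "$(vmdata_kb "/proc/$pid/status")"
    done
}
```

Running `report_ray_vmdata` on a compute node while the job is starting should show whether the data segment is already large before any assembly work has been done.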

sebhtml commented 10 years ago

With "--mca btl ^openib", the reported memory usage drops from about 2 GiB to 32 MiB.

/rap/nne-790-ab/projects/seb/github/208

Example:

mpiexec -n 64 \
 --mca btl ^openib \
 -output-filename Sample_P3J7-17 \
 Ray \
 -o Sample_P3J7-17 \
 -k 31 \
 -enable-neighbourhoods \
 -detect-sequence-files Sample_P3J7

sebhtml commented 10 years ago

Last year (2012-04-18), it worked:

/rap/nne-790-ab/projects/Meta-simulation/Sample-1000.dir

$ grep "memory usage" job-1000-bacteria-1024.3/job-1000-bacteria-1024.3.1.0001|head -n1
Rank 1: assembler memory usage: 128936 KiB

$ ls -l job-1000-bacteria-1024.3/job-1000-bacteria-1024.3.1.0001
-rw-r----- 1 sboisver12 nne-790-01 1136912 Apr 18 2012 job-1000-bacteria-1024.3/job-1000-bacteria-1024.3.1.0001

sebhtml commented 10 years ago

Also, in September 2012 (2012-09-05):

$ ls -l ./x1000-2012-08-21-QOS/logs/x1000-2012-08-21-QOS.1.000
-rw-r----- 1 sboisver12 nne-790-01 14527622 Sep 5 2012 ./x1000-2012-08-21-QOS/logs/x1000-2012-08-21-QOS.1.000
$ grep "memory usage" ./x1000-2012-08-21-QOS/logs/x1000-2012-08-21-QOS.1.000 | head -n1
Rank 0: assembler memory usage: 92304 KiB

Versions:

$ module load compilers/gcc/4.7.2
$ module load mpi/openmpi/1.6.3_gcc
$ export PATH=/rap/nne-790-ab/software/RayAppBuilds/last-build/:$PATH
$
$ Ray -version
Ray version 2.2.0-rc0

sebhtml commented 10 years ago

Now it is buggy with the same old versions:

$ grep "memory usage" Sample_P3J7-old-1.1.00|head -n1
Rank 0: assembler memory usage: 2084424 KiB

module load compilers/gcc/4.7.2
module load mpi/openmpi/1.6.3_gcc
export PATH=/rap/nne-790-ab/software/RayAppBuilds/last-build/:$PATH
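To compare the old and new runs without reading KiB figures by eye, a small sketch like the following could convert the first "assembler memory usage" line of a Ray log to MiB. The function name `first_usage_mib` is hypothetical, and the sketch assumes the log line format shown in the grep output above (value followed by "KiB" at the end of the line).

```shell
# Print the first "assembler memory usage" value in a Ray log, converted to MiB.
# first_usage_mib is a hypothetical helper name; it assumes lines of the form
#   Rank 0: assembler memory usage: 2084424 KiB
first_usage_mib() {
    LC_ALL=C awk '/assembler memory usage/ {
        printf "%.1f\n", $(NF-1) / 1024   # second-to-last field is the KiB value
        exit                              # only the first match, like head -n1
    }' "$1"
}
```

For example, `first_usage_mib Sample_P3J7-old-1.1.00` would report the 2084424 KiB figure above in MiB, which makes the jump from the 2012 logs easier to see.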

sebhtml commented 10 years ago

An email should be sent to colosse@calculquebec.ca to tell this story.

I will just be using --mca btl ^openib to work around that stuff.

sebhtml commented 10 years ago

This is not a problem with Ray. Closing it.

sebhtml commented 10 years ago

Userland fix:

$ cat /home/sboisver12/.openmpi/mca-params.conf
btl = ^openib
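A minimal sketch of how the per-user file above could be created without adding the line twice. The helper name `disable_openib` is made up for illustration, and it assumes that any existing `btl = ...` line in the file should be left alone rather than overwritten.

```shell
# Ensure an Open MPI per-user parameter file contains "btl = ^openib".
# disable_openib is a hypothetical helper; the default path is the per-user
# file shown above. An existing "btl = ..." line is left untouched.
disable_openib() {
    conf="${1:-$HOME/.openmpi/mca-params.conf}"
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    if ! grep -q '^[[:space:]]*btl[[:space:]]*=' "$conf"; then
        printf 'btl = ^openib\n' >> "$conf"
    fi
}
```

Calling `disable_openib` with no argument writes to `$HOME/.openmpi/mca-params.conf`. For a single job, exporting `OMPI_MCA_btl='^openib'` in the job script should have the same effect as the `--mca btl ^openib` option without touching the file.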

$ cat Sample_P3J7-19.s
cat: Sample_P3J7-19.s: No such file or directory
$ cat Sample_P3J7-19.sh

#PBS -S /bin/bash
#PBS -N Sample_P3J7-19
#PBS -o Sample_P3J7-19.stdout
#PBS -e Sample_P3J7-19.stderr
#PBS -A nne-790-ac
#PBS -l walltime=02:00:00:00
#PBS -l nodes=32:ppn=8

cd $PBS_O_WORKDIR

module use /rap/nne-790-ab/modulefiles
module load nne-790-ab/ray/857b773c98e1aa4e9aa86333ab786e52f443af4c-1

mpiexec -n 256 \
 Ray \
 -route-messages \
 -o Sample_P3J7-19 \
 -k 31 \
 -enable-neighbourhoods \
 -detect-sequence-files Sample_P3J7

sebhtml commented 10 years ago

Tests are here: /rap/nne-790-ab/projects/seb/github/208