sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Ray stops during neighborhoods finding #208

Closed by fredericraymond 10 years ago

fredericraymond commented 10 years ago

Latest run:

/rap/nne-790-ab/projects/Project_CQDM2/Participant3-indepth/Sample_P3J7-SilverRayFR-2013-10-02d

Reads are in:

/rap/nne-790-ab/projects/Project_CQDM2/Participant3-indepth/Sample_P3J7

Ray command:

mpiexec -n 128 Ray \
-o Assembly \
-k 31 \
-enable-neighbourhoods \
-amos \
-write-kmers \
-p Sample/P3J7_Lane3_R1_10.fastq Sample/P3J7_Lane3_R2_10.fastq \
-p Sample/P3J7_Lane3_R1_11.fastq Sample/P3J7_Lane3_R2_11.fastq \
-p Sample/P3J7_Lane3_R1_12.fastq Sample/P3J7_Lane3_R2_12.fastq \
-p Sample/P3J7_Lane3_R1_13.fastq Sample/P3J7_Lane3_R2_13.fastq \
-p Sample/P3J7_Lane3_R1_14.fastq Sample/P3J7_Lane3_R2_14.fastq \
-p Sample/P3J7_Lane3_R1_15.fastq Sample/P3J7_Lane3_R2_15.fastq \
-p Sample/P3J7_Lane3_R1_16.fastq Sample/P3J7_Lane3_R2_16.fastq \
-p Sample/P3J7_Lane3_R1_17.fastq Sample/P3J7_Lane3_R2_17.fastq \
-p Sample/P3J7_Lane3_R1_18.fastq Sample/P3J7_Lane3_R2_18.fastq \
-p Sample/P3J7_Lane3_R1_19.fastq Sample/P3J7_Lane3_R2_19.fastq \
-p Sample/P3J7_Lane3_R1_1.fastq Sample/P3J7_Lane3_R2_1.fastq \
-p Sample/P3J7_Lane3_R1_20.fastq Sample/P3J7_Lane3_R2_20.fastq \
-p Sample/P3J7_Lane3_R1_2.fastq Sample/P3J7_Lane3_R2_2.fastq \
-p Sample/P3J7_Lane3_R1_3.fastq Sample/P3J7_Lane3_R2_3.fastq \
-p Sample/P3J7_Lane3_R1_4.fastq Sample/P3J7_Lane3_R2_4.fastq \
-p Sample/P3J7_Lane3_R1_5.fastq Sample/P3J7_Lane3_R2_5.fastq \
-p Sample/P3J7_Lane3_R1_6.fastq Sample/P3J7_Lane3_R2_6.fastq \
-p Sample/P3J7_Lane3_R1_7.fastq Sample/P3J7_Lane3_R2_7.fastq \
-p Sample/P3J7_Lane3_R1_8.fastq Sample/P3J7_Lane3_R2_8.fastq \
-p Sample/P3J7_Lane3_R1_9.fastq Sample/P3J7_Lane3_R2_9.fastq \
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/ARDB \
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/Bacteria-Genomes \
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/HumanChromosomes \
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/NCBI-Bacteria_DRAFT \
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/Viruses-Genomes \
-search /scratch/nne-790-ac/CQDM-Production/Search-Datasets/MERGEM_2013-04-29/RG-genes \
-search /scratch/nne-790-ac/CQDM-Production/Search-Datasets/MERGEM_2013-04-29/IS-genes \
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/VFDB_13-01-24 \
-with-taxonomy \
/rap/nne-790-ab/genomes/taxonomy/last-build/Genome-to-Taxon.tsv \
/rap/nne-790-ab/genomes/taxonomy/last-build/TreeOfLife-Edges.tsv \
/rap/nne-790-ab/genomes/taxonomy/last-build/Taxon-Names.tsv

sebhtml commented 10 years ago

$ module use /rap/nne-790-ab/modulefiles/
$ module load nne-790-ab/ray/462f5cbc37b0a27a2e4a902ed5bfbe6fc945554a-1
$ which Ray
/rap/nne-790-ab/software/seb/apps/ray/462f5cbc37b0a27a2e4a902ed5bfbe6fc945554a-1/bin/Ray

sebhtml commented 10 years ago

$ cat start.sh
#PBS -S /bin/bash
#PBS -N Sample_P3J7-1
#PBS -o Sample_P3J7-1.stdout
#PBS -e Sample_P3J7-1.stderr
#PBS -A nne-790-ac
#PBS -l walltime=02:00:00:00
#PBS -l nodes=8:ppn=8
#PBS -M sebastien.boisvert@calculquebec.ca
#PBS -m bea
cd $PBS_O_WORKDIR

module use /rap/nne-790-ab/modulefiles/
module load nne-790-ab/ray/462f5cbc37b0a27a2e4a902ed5bfbe6fc945554a-2

#-write-kmers \
#-amos \

mpiexec -n 128 Ray \
-o \
Sample_P3J7-1 \
-k \
31 \
-enable-neighbourhoods \
-detect-sequence-files Sample_P3J7 \
-search \
/rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/ARDB \
-search \
/rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/Bacteria-Genomes \
-search \
/rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/HumanChromosomes \
-search \
/rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/NCBI-Bacteria_DRAFT \
-search \
/rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/Viruses-Genomes \
-search \
/scratch/nne-790-ac/CQDM-Production/Search-Datasets/MERGEM_2013-04-29/RG-genes \
-search \
/scratch/nne-790-ac/CQDM-Production/Search-Datasets/MERGEM_2013-04-29/IS-genes \
-search \
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/VFDB_13-01-24 \
-with-taxonomy \
/rap/nne-790-ab/genomes/taxonomy/last-build/Genome-to-Taxon.tsv \
/rap/nne-790-ab/genomes/taxonomy/last-build/TreeOfLife-Edges.tsv \
/rap/nne-790-ab/genomes/taxonomy/last-build/Taxon-Names.tsv

sebhtml commented 10 years ago

/rap/nne-790-ab/projects/seb/github/208

sebhtml commented 10 years ago

This works for me, but I had to add btl = ^openib to my Open MPI MCA parameters because openib is buggy on colosse (see #209; this is likely due to a driver upgrade or something similar, not to Open MPI or Ray).

Config:

$ cat ~/.openmpi/mca-params.conf
btl = ^openib

Job script:

$ cat Sample_P3J7-20.sh
#PBS -S /bin/bash
#PBS -N Sample_P3J7-20
#PBS -o Sample_P3J7-20.stdout
#PBS -e Sample_P3J7-20.stderr
#PBS -A nne-790-ac
#PBS -l walltime=02:00:00:00
#PBS -l nodes=32:ppn=8
cd $PBS_O_WORKDIR

module use /rap/nne-790-ab/modulefiles
module load nne-790-ab/ray/857b773c98e1aa4e9aa86333ab786e52f443af4c-1

mpiexec -n 256 \
Ray \
-route-messages \
-o Sample_P3J7-20 \
-k 31 \
-enable-neighbourhoods \
-detect-sequence-files Sample_P3J7 \

Neighbourhoods:

$ ls -lh Sample_P3J7-20/NeighbourhoodRelations.txt 
-rw-r--r-- 1 sboisver12 nne-790-01 783M Oct 14 05:46 Sample_P3J7-20/NeighbourhoodRelations.txt

Running time:

$ head -n1 Sample_P3J7-20/ElapsedTime.txt
#Step   Date    Elapsed time    Since Beginning
$ tail Sample_P3J7-20/ElapsedTime.txt  -n1
Computing neighbourhoods        2013-10-14T05:46:46     16 hours, 53 minutes, 52 seconds        19 hours, 28 minutes, 30 seconds
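
To put the last row in perspective, here is a minimal sketch that converts the two durations from ElapsedTime.txt into seconds and computes the share of the run spent on neighbourhoods. The helper name duration_to_seconds is hypothetical, and the parser only assumes the "N hours, N minutes, N seconds" wording shown above; other units that Ray may print are not handled.

```python
import re

# Seconds per unit, limited to the units that appear in the output above.
UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}


def duration_to_seconds(text):
    """Convert e.g. '16 hours, 53 minutes, 52 seconds' to a number of seconds."""
    total = 0
    for amount, unit in re.findall(r"(\d+) (hour|minute|second)s?", text):
        total += int(amount) * UNIT_SECONDS[unit]
    return total


# Values copied from the last row of ElapsedTime.txt.
step = duration_to_seconds("16 hours, 53 minutes, 52 seconds")
since_beginning = duration_to_seconds("19 hours, 28 minutes, 30 seconds")
share = step / since_beginning

print(f"{step} s of {since_beginning} s ({share:.1%})")
```

On these numbers the neighbourhoods step accounts for roughly 87% of the elapsed time.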

Inputs: 152 M sequences:

$ tail -n 3 Sample_P3J7-20/NumberOfSequences.txt 
        NumberOfSequences: 152011768
        FirstSequence: 0
        LastSequence: 152011767

Obviously, nearly 17 hours is a lot for neighbourhoods, but then again, this code is disabled by default because it gets stuck in DNA repeats. I am not even sure that it could be implemented more efficiently. Maybe by using a maximum search depth AND a maximum number of visited vertices.
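
The idea of combining a maximum search depth with a maximum number of visited vertices could look roughly like the sketch below. This is not Ray's code: the function name bounded_neighbourhood, the dict-of-lists graph, and the truncated flag are all assumptions made for illustration. It is a plain breadth-first search that gives up as soon as either limit is reached, which is what would keep it from getting lost in a repeat.

```python
from collections import deque


def bounded_neighbourhood(graph, start, max_depth, max_visited):
    """Breadth-first search from `start`, capped by depth and by vertex count.

    `graph` is a hypothetical adjacency mapping: vertex -> iterable of neighbours.
    Returns (visited, truncated); truncated is True when the vertex cap stopped
    the search early, which is what a highly connected repeat would trigger.
    """
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            # Depth cap: do not expand vertices that are already max_depth away.
            continue
        for neighbour in graph.get(vertex, ()):
            if neighbour in visited:
                continue
            if len(visited) >= max_visited:
                # Vertex cap: abandon the search instead of exploring the repeat.
                return visited, True
            visited.add(neighbour)
            queue.append((neighbour, depth + 1))
    return visited, False


# A simple path stays within the depth cap.
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
chain_result = bounded_neighbourhood(chain, "a", max_depth=2, max_visited=10)

# A repeat-like hub with many neighbours hits the vertex cap.
hub = {"hub": [f"n{i}" for i in range(100)]}
hub_result = bounded_neighbourhood(hub, "hub", max_depth=5, max_visited=5)
```

With both caps, the work per starting vertex is bounded by max_visited regardless of how tangled the graph is around it, at the cost of possibly missing neighbours that lie beyond a repeat.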

I am not closing this ticket as there is room for improvement.

But it works; it is just combinatorial and lengthy.