sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

test on assemblathon 2 bird with -k 61 #174

Closed sebhtml closed 11 years ago

sebhtml commented 11 years ago

job 10261804

/home/sboisver12/git-clones/Ray-TestSuite/system-tests/Large-Datasets

sebhtml commented 11 years ago

test also with -route-messages inside the configuration file.

sebhtml commented 11 years ago

test first with -k 31

sebhtml commented 11 years ago

for -k 31

Beginning of computation: 3 seconds
Network testing: 2 minutes, 6 seconds
File partitioning: 4 minutes, 32 seconds
Sequence loading: 2 hours, 47 minutes, 40 seconds
K-mer counting: 56 minutes, 44 seconds
Coverage distribution analysis: 9 seconds
Graph construction: 2 hours, 31 minutes, 29 seconds
Edge purge: 50 minutes, 33 seconds
Selection of optimal read markers: 1 hours, 42 minutes, 36 seconds
Detection of assembly seeds: 29 minutes, 44 seconds
Estimation of outer distances for paired reads: 19 minutes, 40 seconds
Bidirectional extension of seeds: 13 hours, 54 minutes, 2 seconds
Merging of redundant contigs: 8 hours, 19 minutes, 2 seconds
Generation of contigs: 1 minutes, 3 seconds
Scaffolding of contigs: 3 hours, 22 minutes, 51 seconds
Total: 1 days, 11 hours, 22 minutes, 18 seconds

BGI_illumina_data + illumina_uk_qseq
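To compare steps across runs, the timing lines can be converted to seconds. The following is a minimal sketch, not part of Ray: the helper names `parse_duration` and `parse_steps` are hypothetical, and it assumes each line has the form `Step name: N hours, N minutes, N seconds` as in the output above. Only three steps and the total from the -k 31 run are used as sample input.

```python
import re

# Seconds per unit; the regex below accepts singular or plural forms ("1 hours").
UNIT_SECONDS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def parse_duration(text):
    """Convert e.g. '2 hours, 47 minutes, 40 seconds' to a number of seconds."""
    total = 0
    for amount, unit in re.findall(r"(\d+)\s+(day|hour|minute|second)s?", text):
        total += int(amount) * UNIT_SECONDS[unit]
    return total


def parse_steps(lines):
    """Map each step name to its duration in seconds (assumed 'name: duration' lines)."""
    steps = {}
    for line in lines:
        name, sep, duration = line.partition(":")
        if sep:
            steps[name.strip()] = parse_duration(duration)
    return steps


# Sample lines copied from the -k 31 timings above.
sample = [
    "Bidirectional extension of seeds: 13 hours, 54 minutes, 2 seconds",
    "Merging of redundant contigs: 8 hours, 19 minutes, 2 seconds",
    "Scaffolding of contigs: 3 hours, 22 minutes, 51 seconds",
    "Total: 1 days, 11 hours, 22 minutes, 18 seconds",
]

steps = parse_steps(sample)
total = steps.pop("Total")

# Print steps from slowest to fastest with their share of the reported total.
for name, seconds in sorted(steps.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds} s ({100 * seconds / total:.1f}% of total)")
```

Sorting by duration makes it easy to see which steps dominate the wall time when comparing -k values or versions.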

sebhtml commented 11 years ago

With Illumina data at k=31, it got stuck at:

Rank 329 requires 4213508 bytes for storage.
Rank 81 requires 4296342 bytes for storage.
Rank 301 requires 3819954 bytes for storage.
Rank 397 requires 3935799 bytes for storage.
Rank 206 requires 4269500 bytes for storage.
Rank 73 requires 4391855 bytes for storage.
Rank 365 requires 3992411 bytes for storage.
Rank 429 requires 4180933 bytes for storage.
Rank 109 requires 4229076 bytes for storage.
Rank 213 requires 3975706 bytes for storage.
Rank 334 requires 4650353 bytes for storage.
Rank 205 requires 4503802 bytes for storage.
Rank 237 requires 4540413 bytes for storage.
Rank 85 requires 4332618 bytes for storage.
Rank 201 requires 5113335 bytes for storage.
Rank 333 requires 5052987 bytes for storage.

Bird-11 is -k 57

sebhtml commented 11 years ago

once the seed filtering works, rebuild with k64-no-mpiio.sh

sebhtml commented 11 years ago

With the old version (1.7-+++) and -k 31:

/rap/nne-790-ab/projects/Ray-Bird-Assemblathon-2/./k31-Ray-Bird-2011-11-28-debruijn-512-8-3/ElapsedTime.txt

Beginning of computation: 3 seconds
Network testing: 2 minutes, 6 seconds
File partitioning: 4 minutes, 32 seconds
Sequence loading: 2 hours, 47 minutes, 40 seconds
K-mer counting: 56 minutes, 44 seconds
Coverage distribution analysis: 9 seconds
Graph construction: 2 hours, 31 minutes, 29 seconds
Edge purge: 50 minutes, 33 seconds
Selection of optimal read markers: 1 hours, 42 minutes, 36 seconds
Detection of assembly seeds: 29 minutes, 44 seconds
Estimation of outer distances for paired reads: 19 minutes, 40 seconds
Bidirectional extension of seeds: 13 hours, 54 minutes, 2 seconds
Merging of redundant contigs: 8 hours, 19 minutes, 2 seconds
Generation of contigs: 1 minutes, 3 seconds
Scaffolding of contigs: 3 hours, 22 minutes, 51 seconds
Total: 1 days, 11 hours, 22 minutes, 18 seconds

With v2.2.0-rcx and -k 31:


Step: Detection of assembly seeds
Date: Fri Apr 12 23:27:00 2013
Elapsed time: 53 minutes, 5 seconds
Since beginning: 6 hours, 14 minutes, 9 seconds


With v2.2.0-rc0 with -k 57:

It takes a very long time.

sebhtml commented 11 years ago

job 10268744 disables the new code that checks for bad children and parents.

/home/sboisver12/git-clones/Ray-TestSuite/system-tests/Large-Datasets

sebhtml commented 11 years ago

Running time with -minimum-contig-length 114 -k 57:

WallTime: 1:16:48 of 5:00:00
Rank 461 is creating seeds [3500001/17081546]
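The bracketed counter in the seed-creation line can be turned into a completion fraction. This is a minimal sketch with a hypothetical helper `seed_progress`; it assumes the bracket means `[processed/total]` for that rank, which the log line suggests but the thread does not confirm.

```python
import re


def seed_progress(line):
    """Return (done, total, fraction) from a '[done/total]' counter, or None if absent."""
    match = re.search(r"\[(\d+)/(\d+)\]", line)
    if match is None:
        return None
    done, total = int(match.group(1)), int(match.group(2))
    return done, total, done / total


# Log line copied from the job output above.
line = "Rank 461 is creating seeds [3500001/17081546]"
done, total, fraction = seed_progress(line)
print(f"{done} of {total} ({fraction:.1%})")
```

The fraction describes one rank only, and the walltime also covers the earlier steps, so it should not be extrapolated directly into a finish time for the whole job.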


eel at k 61

Network testing: 2 seconds
Counting sequences to assemble: 9 minutes, 13 seconds
Sequence loading: 1 hours, 11 minutes, 48 seconds
K-mer counting: 23 minutes, 4 seconds
Coverage distribution analysis: 16 seconds
Graph construction: 32 minutes, 34 seconds
Null edge purging: 7 minutes, 5 seconds
Selection of optimal read markers: 43 minutes, 38 seconds
Detection of assembly seeds: 36 minutes, 20 seconds
Estimation of outer distances for paired reads: 17 minutes, 38 seconds
Bidirectional extension of seeds: 12 hours, 41 minutes, 16 seconds
Merging of redundant paths: 3 hours, 27 minutes, 49 seconds
Generation of contigs: 1 hours, 13 minutes, 16 seconds
Scaffolding of contigs: 13 hours, 19 minutes, 41 seconds
Counting sequences to search: 0 seconds
Graph coloring: 16 seconds
Counting contig biological abundances: 6 minutes, 48 seconds
Counting sequence biological abundances: 2 seconds
Loading taxons: 14 seconds
Loading tree: 17 seconds
Processing gene ontologies: 30 seconds
Computing neighbourhoods: 1 seconds
Total: 1 days, 10 hours, 52 minutes, 4 seconds


eel-Ray-polytope-512-k61-2013-04-16-1.stdout with

ray b7951aaf94ef5b05a1ed2ee156a175052ea21612
RayPlatform cda5eb537966098ec1ce31b6942a70a13a613d45

Running time for seed computation: Elapsed time: 41 minutes, 3 seconds

sebhtml commented 11 years ago

Work