sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net
Other

The program seems stuck #248

Open lfaller opened 6 years ago

lfaller commented 6 years ago

Sometimes, Ray Meta seems stuck. The log output looks as follows:

...
Rank 7 computing contig abundances [63043/130531] [118/118]
Rank 77 computing contig abundances [62815/130236] [1/118]
Rank 77 computing contig abundances [62815/130236] [118/118]
Rank 39 computing contig abundances [62018/130647] [1/118]
Rank 39 computing contig abundances [62018/130647] [118/118]
Rank 76 computing contig abundances [63690/130359] [1/118]
Rank 76 computing contig abundances [63690/130359] [118/118]
Rank 116 computing contig abundances [62513/130973] [1/118]
Rank 116 computing contig abundances [62513/130973] [118/118]
Rank 15 computing contig abundances [63127/130269]
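To see which MPI ranks have not finished this step, you can keep the last "computing contig abundances" line for each rank and list the ranks whose second counter is not complete. In the excerpt above, that is Rank 15: its last line has no second counter. The Python sketch below assumes the line format shown in this log. The regex and function names are hypothetical and not part of Ray. The code only relies on the format of the two counters, not on what they measure.

```python
import re

# Pattern for the progress lines shown above (an assumption based on this excerpt):
#   "Rank 77 computing contig abundances [62815/130236] [118/118]"
# The second bracketed counter is optional: Rank 15's last line has only one.
PROGRESS = re.compile(
    r"Rank (\d+) computing contig abundances \[(\d+)/(\d+)\](?: \[(\d+)/(\d+)\])?"
)


def last_progress_per_rank(lines):
    """Map each rank to (done, total, inner_done, inner_total) from its last progress line."""
    latest = {}
    for line in lines:
        m = PROGRESS.search(line)
        if m is None:
            continue
        rank, done, total = int(m.group(1)), int(m.group(2)), int(m.group(3))
        if m.group(4) is not None:
            inner = (int(m.group(4)), int(m.group(5)))
        else:
            inner = (None, None)  # the line ends after the first counter
        latest[rank] = (done, total) + inner
    return latest


def lagging_ranks(lines):
    """Ranks whose last line does not show a finished second counter such as [118/118]."""
    return sorted(
        rank
        for rank, (_, _, inner_done, inner_total) in last_progress_per_rank(lines).items()
        if inner_done is None or inner_done != inner_total
    )


if __name__ == "__main__":
    excerpt = [
        "Rank 77 computing contig abundances [62815/130236] [1/118]",
        "Rank 77 computing contig abundances [62815/130236] [118/118]",
        "Rank 116 computing contig abundances [62513/130973] [1/118]",
        "Rank 116 computing contig abundances [62513/130973] [118/118]",
        "Rank 15 computing contig abundances [63127/130269]",
    ]
    print(lagging_ranks(excerpt))  # [15]
```

On a real run, pass in an open file holding the saved console output, for example `lagging_ranks(open(path))`. Here `path` is wherever that output was redirected.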

However, the ElapsedTime.txt file shows that the step "Computing neighbourhoods" finished days ago:

cat ElapsedTime.txt
#Step                                           Date                 Elapsed time                     Since Beginning
Network testing                                 2017-06-28T23:08:19  3 seconds                        3 seconds
Counting sequences to assemble                  2017-06-28T23:09:01  42 seconds                       45 seconds
Sequence loading                                2017-06-28T23:14:28  5 minutes, 27 seconds            6 minutes, 12 seconds
K-mer counting                                  2017-06-28T23:52:04  37 minutes, 36 seconds           43 minutes, 48 seconds
Coverage distribution analysis                  2017-06-28T23:52:21  17 seconds                       44 minutes, 5 seconds
Graph construction                              2017-06-29T00:57:55  1 hours, 5 minutes, 34 seconds   1 hours, 49 minutes, 39 seconds
Null edge purging                               2017-06-29T02:52:27  1 hours, 54 minutes, 32 seconds  3 hours, 44 minutes, 11 seconds
Selection of optimal read markers               2017-06-29T03:20:41  28 minutes, 14 seconds           4 hours, 12 minutes, 25 seconds
Detection of assembly seeds                     2017-06-29T06:37:02  3 hours, 16 minutes, 21 seconds  7 hours, 28 minutes, 46 seconds
Estimation of outer distances for paired reads  2017-06-29T06:37:03  1 seconds                        7 hours, 28 minutes, 47 seconds
Bidirectional extension of seeds                2017-06-29T13:55:06  7 hours, 18 minutes, 3 seconds   14 hours, 46 minutes, 50 seconds
Merging of redundant paths                      2017-06-29T20:50:51  6 hours, 55 minutes, 45 seconds  21 hours, 42 minutes, 35 seconds
Generation of contigs                           2017-06-29T21:12:32  21 minutes, 41 seconds           22 hours, 4 minutes, 16 seconds
Scaffolding of contigs                          2017-06-30T00:02:04  2 hours, 49 minutes, 32 seconds  1 days, 53 minutes, 48 seconds
Counting sequences to search                    2017-06-30T00:02:04  0 seconds                        1 days, 53 minutes, 48 seconds
Graph coloring                                  2017-06-30T00:02:27  23 seconds                       1 days, 54 minutes, 11 seconds
Counting contig biological abundances           2017-06-30T02:21:31  2 hours, 19 minutes, 4 seconds   1 days, 3 hours, 13 minutes, 15 seconds
Counting sequence biological abundances         2017-06-30T02:21:31  0 seconds                        1 days, 3 hours, 13 minutes, 15 seconds
Loading taxons                                  2017-06-30T02:21:49  18 seconds                       1 days, 3 hours, 13 minutes, 33 seconds
Loading tree                                    2017-06-30T02:22:13  24 seconds                       1 days, 3 hours, 13 minutes, 57 seconds
Processing gene ontologies                      2017-06-30T02:22:46  33 seconds                       1 days, 3 hours, 14 minutes, 30 seconds
Computing neighbourhoods                        2017-06-30T02:22:54  8 seconds                        1 days, 3 hours, 14 minutes, 38 seconds
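The complaint is that the last logged step finished long ago. A quick check is to read the final row of ElapsedTime.txt and compare its timestamp with the current time. The Python sketch below assumes each row looks like the ones above: a step name, then an ISO-8601 timestamp, separated by tabs or spaces. It also assumes the timestamps are in the machine's local time. The function names are hypothetical and not part of Ray.

```python
import re
from datetime import datetime

# Each data row: "<step name> <ISO timestamp> <elapsed> <since beginning>".
# Only the ISO timestamp is used to split the row, so tabs or runs of spaces both work.
ROW = re.compile(r"^(?P<step>.+?)\s+(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\b")


def last_finished_step(lines):
    """Return (step_name, finish_time) for the last data row, or None if there is none."""
    last = None
    for line in lines:
        if line.startswith("#"):
            continue  # header row
        m = ROW.match(line)
        if m:
            last = (m.group("step"), datetime.fromisoformat(m.group("stamp")))
    return last


def time_since_last_step(lines, now=None):
    """Return (step_name, timedelta since that step finished), or None for an empty file."""
    last = last_finished_step(lines)
    if last is None:
        return None
    step, finished = last
    if now is None:
        now = datetime.now()  # assumes the file uses local, timezone-naive timestamps
    return step, now - finished
```

For example, `time_since_last_step(open("ElapsedTime.txt"))` returns the step name and a `timedelta`. A gap of days combined with an idle CPU is the symptom described in this issue.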

On top of that, the CPU is not busy.

Thanks for any suggestions!

majedoms commented 6 years ago

I'm running into the same problem, but with the default Ray, not Ray Meta!

The program is stuck!

And the CPU is not busy either.

Have you figured out what the issue was?

Here's the end of the report:


Step: K-mer counting
Date: Sat Aug 5 16:49:23 2017
Elapsed time: 1 minutes, 12 seconds
Since beginning: 1 minutes, 21 seconds


Rank 0 number of set bits in the Bloom filter: [ 17169986 / 68113920 ] (Rank 6 number of set bits in the Bloom filter: [ 17175856 / Rank 25.2077Rank 5Rank number of set bits in the Bloom filter: [ 17163771 / 68113920 ] (%)2Rank 68113920 ] ( number of set bits in the Bloom filter: [ 425.1986 number of set bits in the Bloom filter: 171765591 / number of set bits in the Bloom filter: 25.2164[ [ %)1716497917176003 %) / 68113920 ] (25.2166%) 68113920 ] (25.2174%) Rank 3 number of set bits in the Bloom filter: [ 17168303 / 68113920 ] (25.2053%) Rank 5 destroyed its Bloom filter Rank 5 has 1734618 k-mers (completed) [BloomFilter] Rank 5: k-mers sampled -> 4943682, k-mers dropped -> 3209064 (64.9124%), k-mers accepted -> 1734618 (35.0876%) / 68113920 ] (25.2004%) Rank 6 destroyed its Bloom filter Rank 6 has 1736930 k-mers (completed) [BloomFilter] Rank 6: k-mers sampled -> 4948090, k-mers dropped -> 3211160 (64.897%), k-mers accepted -> 1736930 (35.103%) Rank 2 destroyed its Bloom filter Rank 2 has 1737168 k-mers (completed) [BloomFilter] Rank 2: k-mers sampled -> 4947500, k-mers dropped -> 3210332 (64.888%), k-mers accepted -> 1737168 (35.112%)

Rank 5: assembler memory usage: 1213212 KiB Rank 3 destroyed its Bloom filter Rank 3 has 1734064 k-mers (completed) [BloomFilter] Rank 3: k-mers sampled -> 4946468, k-mers dropped -> 3212404 (Rank 64.9434%), k-mers accepted -> 1734064 (35.0566%) Rank 0 destroyed its Bloom filter Rank 0 has 1735134 k-mers (completed) [BloomFilter] Rank 0: k-mers sampled -> 4945364, k-mers dropped -> 3210230 (64.9139%), k-mers accepted -> 1735134 (35.0861%) Rank 6: assembler memory usage: 1204896 KiB Rank 3: assembler memory usage: 1196600 KiB 2: assembler memory usage: 1204896 KiB Rank 0: assembler memory usage: 1196612 KiB Rank 1 destroyed its Bloom filter Rank 1 has 1737320 k-mers (completed) [BloomFilter] Rank 1: k-mers sampled -> 4944760, k-mers dropped -> 3207440 (64.8654%), k-mers accepted -> 1737320 (35.1346%) Rank 1: assembler memory usage: 1196752 KiB Rank 4 destroyed its Bloom filter Rank 4 has 1735240 k-mers (completed) [BloomFilter] Rank 4: k-mers sampled -> 4947720, k-mers dropped -> 3212480 (64.9285%), k-mers accepted -> 1735240 (35.0715%) Rank 4: assembler memory usage: 1196804 KiB

Rank 0: the minimum coverage is 62
Rank 0: the peak coverage is 64


Step: Coverage distribution analysis
Date: Sat Aug 5 16:49:30 2017
Elapsed time: 7 seconds
Since beginning: 1 minutes, 28 seconds
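Several MPI ranks write to the same console, so their lines are interleaved in the report above. The per-rank [BloomFilter] summaries that came through intact can still be extracted and checked for consistency (sampled = dropped + accepted). The Python sketch below assumes the summary format shown in this report. The regex and function names are hypothetical. A summary garbled by another rank's output, such as Rank 3's, is simply skipped.

```python
import re

# Per-rank summary as it appears in the report above, e.g.
# "[BloomFilter] Rank 5: k-mers sampled -> 4943682, k-mers dropped -> 3209064 (64.9124%),
#  k-mers accepted -> 1734618 (35.0876%)"
# Summaries broken by another rank's interleaved text simply do not match.
SUMMARY = re.compile(
    r"\[BloomFilter\] Rank (\d+): k-mers sampled -> (\d+), "
    r"k-mers dropped -> (\d+) \([\d.]+%\), k-mers accepted -> (\d+) \([\d.]+%\)"
)


def bloom_summaries(text):
    """Return {rank: (sampled, dropped, accepted)} for every intact summary in the text."""
    found = {}
    for m in SUMMARY.finditer(text):
        rank, sampled, dropped, accepted = map(int, m.groups())
        found[rank] = (sampled, dropped, accepted)
    return found


def accepted_percent(found):
    """Accepted k-mers as a percentage of sampled k-mers, per rank.

    Raises ValueError if sampled != dropped + accepted for some rank.
    """
    result = {}
    for rank, (sampled, dropped, accepted) in found.items():
        if sampled != dropped + accepted:
            raise ValueError(f"rank {rank}: sampled != dropped + accepted")
        result[rank] = 100 * accepted / sampled
    return result


if __name__ == "__main__":
    report = (
        "[BloomFilter] Rank 6: k-mers sampled -> 4948090, k-mers dropped -> 3211160 "
        "(64.897%), k-mers accepted -> 1736930 (35.103%) "
        "[BloomFilter] Rank 3: k-mers sampled -> 4946468, k-mers dropped -> 3212404 "
        "(Rank 64.9434%), k-mers accepted -> 1734064 (35.0566%)"
    )
    # Only Rank 6 is reported; Rank 3's summary is garbled and skipped.
    for rank, pct in sorted(accepted_percent(bloom_summaries(report)).items()):
        print(f"rank {rank}: {pct:.4f}% accepted")
```

Every summary in this report adds up, including Rank 3's once the stray text is ignored. For Rank 5, for instance, 3209064 + 1734618 = 4943682.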


lfaller commented 6 years ago

I ended up killing it, but unfortunately I don't know what caused the problem :-(

Some of my samples were assembled quickly and others were not, even though the number of FASTA input sequences was comparable. I assume some property of the slow samples' sequences made it hard for the assembler to make an assembly call. Then again, I am dealing with metagenomic data, which contains short sequence fragments from different microbial species, so this is not a trivial task.