rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes
GNU General Public License v3.0

Want to do a hybrid assembly but output is only best_spades.gfa and overlaps_removed.gfa #182

Open erthrall opened 5 years ago

erthrall commented 5 years ago

Thank you so much for creating this amazing tool! Hybrid assembly worked beautifully for all of my isolates except one. I supplied the short reads and long reads correctly:

unicycler -1 isolate4.fwd_val_1.fq.gz -2 isolate4.rev_val_2.fq.gz -l isolate4.minion.fq.gz -o uni_ass_4 --threads 2

Unicycler acknowledges that it needs to carry out a hybrid assembly but only performs a simple SPAdes assembly. Please take a look at the log file below.


Starting Unicycler (2019-04-26 12:23:02)

Welcome to Unicycler, an assembly pipeline for bacterial genomes. Since you provided both short and long reads, Unicycler will perform a hybrid assembly. It will first use SPAdes to make a short-read assembly graph, and then it will use various methods to scaffold that graph with the long reads.
For more information, please see https://github.com/rrwick/Unicycler

Command: /home/ubuntu/miniconda3/envs/bio3092/bin/unicycler -1 isolate4.fwd_val_1.fq.gz -2 isolate4.rev_val_2.fq.gz -l isolate4.minion.fq.gz -o uni_ass_4 --threads 2

Unicycler version: v0.4.7
Using 2 threads

Making output directory: /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4

Dependencies:
  Program         Version           Status
  spades.py       3.13.0            good
  racon           1.3.2             good
  makeblastdb     2.5.0+            good
  tblastn         2.5.0+            good
  bowtie2-build   2.3.4.3           good
  bowtie2         2.3.4.3           good
  samtools        1.9               good
  java            1.8.0_152-release good
  pilon           1.23              good
  bcftools                          not used

SPAdes read error correction (2019-04-26 12:23:45)

Unicycler uses the SPAdes read error correction module to reduce the number of errors in the short read before SPAdes assembly. This can make the assembly faster and simplify the assembly graph structure.

Command: /home/ubuntu/miniconda3/envs/bio3092/bin/spades.py -1 /home/ubuntu/resources/coursework/CW2/data/isolate4/isolate4.fwd_val_1.fq.gz -2 /home/ubuntu/resources/coursework/CW2/data/isolate4/isolate4.rev_val_2.fq.gz -o /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/spades_assembly/read_correction --threads 2 --only-error-correction

Corrected reads: /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/spades_assembly/corrected_1.fastq.gz /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/spades_assembly/corrected_2.fastq.gz

Choosing k-mer range for assembly (2019-04-26 12:37:29)

Unicycler chooses a k-mer range for SPAdes based on the length of the input reads. It uses a wide range of many k-mer sizes to maximise the chance of finding an ideal assembly.

SPAdes maximum k-mer: 127
Median read length: 194
K-mer range: 27, 47, 63, 77, 89, 99, 107, 115, 121, 127

SPAdes assemblies (2019-04-26 12:37:39)

Unicycler now uses SPAdes to assemble the short reads. It scores the assembly graph for each k-mer using the number of contigs (fewer is better) and the number of dead ends (fewer is better). The score function is 1/(c*(d+2)), where c is the contig count and d is the dead end count.

K-mer   Contigs   Dead ends   Score
   27     1,541           0   3.24e-04
   47       700           0   7.14e-04
   63       478           0   1.05e-03
   77       410           0   1.22e-03
   89       317           0   1.58e-03
   99       248           0   2.02e-03
  107       225           0   2.22e-03
  115       212           0   2.36e-03
  121       212           0   2.36e-03
  127       198           0   2.53e-03 <-best
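
As a quick check on the table, the scoring rule quoted in the log, 1/(c*(d+2)), can be recomputed from the logged contig and dead-end counts. This is only a minimal sketch: the function and variable names are illustrative and are not taken from Unicycler's source.

```python
# Sketch of the score stated in the log: 1 / (c * (d + 2)).
# Names here are illustrative assumptions, not Unicycler's own identifiers.

def graph_score(contigs: int, dead_ends: int) -> float:
    """Score an assembly graph; fewer contigs and fewer dead ends score higher."""
    return 1.0 / (contigs * (dead_ends + 2))

# (k-mer, contig count, dead-end count) copied from the table in the log.
results = [
    (27, 1541, 0), (47, 700, 0), (63, 478, 0), (77, 410, 0), (89, 317, 0),
    (99, 248, 0), (107, 225, 0), (115, 212, 0), (121, 212, 0), (127, 198, 0),
]

scores = {k: graph_score(c, d) for k, c, d in results}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    marker = " <-best" if k == best_k else ""
    print(f"{k:>5} {score:.2e}{marker}")
```

With every dead-end count at zero, the score reduces to 1/(2c), so the k-mer with the fewest contigs (127) wins, which matches the log.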

Deleting /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/spades_assembly/

Determining graph multiplicity (2019-04-26 12:47:33)

Multiplicity is the number of times a sequence occurs in the underlying sequence. Single-copy contigs (those with a multiplicity of one, occurring only once in the underlying sequence) are particularly useful.

Saving /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/001_best_spades_graph.gfa

Cleaning graph (2019-04-26 12:47:33)

Unicycler now performs various cleaning procedures on the graph to remove overlaps and simplify the graph structure. The end result is a graph ready for bridging.

Graph overlaps removed

Removed zero-length segments: 114, 115, 116, 118, 120, 122, 125, 130, 133, 146, 147, 149, 150, 152, 153, 155, 158, 160, 161, 162, 164, 166, 178, 182, 185, 194

Removed zero-length segments: 117, 174, 176, 190

Removed zero-length segments: 187

Merged small segments: 188, 189, 191, 192, 193, 195, 196

Saving /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/002_overlaps_removed.gfa

Unicycler now selects a set of anchor contigs from the single-copy contigs. These are the contigs which will be connected via bridges to form the final assembly.

41 anchor segments (5,190,363 bp) out of 160 total segments (5,251,008 bp)

Creating SPAdes contig bridges (2019-04-26 12:47:33)

SPAdes uses paired-end information to perform repeat resolution (RR) and produce contigs from the assembly graph. SPAdes saves the graph paths corresponding to these contigs in the contigs.paths file. When one of these paths contains two or more anchor contigs, Unicycler can create a bridge from the path.

                                                                      Bridge
Start   Path                                                    End   quality
  -29   -65 -> -44 -> 66 -> -50 -> -106                          42   11.8
  -14   -108                                                     23   60.0
   -8   115 -> -73 -> 90                                          9   41.6
   -4   108                                                      32   60.9
    5   80 -> -57 -> 63                                          41   22.6
    8   -83                                                      30   60.3
   11   -110                                                    -38   62.0
   18   83                                                      -30   59.4
   20   -90 -> -72 -> -115                                       28   42.7
   21   -107 -> 96 -> -107 -> 96 -> -144 -> 117 -> 96 -> -144    33    5.0
   29   -112                                                     36   61.8
   33   110                                                      38   63.2
   41   80 -> -51 -> 106                                        -42   19.8
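
To illustrate the idea described in the log (a SPAdes contig path that contains two or more anchor contigs yields a bridge), here is a rough sketch that splits one path at its anchors. The helper name `bridges_from_path` and the data layout are hypothetical. The log does not show Unicycler's real implementation, and the real one also assigns a quality score, which this sketch does not attempt.

```python
# Sketch: turn one contig path into bridges between consecutive anchor segments.
# Signed integers follow the log's convention (the sign gives the strand).
# bridges_from_path is a hypothetical helper, not a Unicycler function.

def bridges_from_path(path, anchors):
    """Return (start, middle_segments, end) for each consecutive pair of anchors."""
    anchor_positions = [i for i, seg in enumerate(path) if abs(seg) in anchors]
    bridges = []
    for a, b in zip(anchor_positions, anchor_positions[1:]):
        bridges.append((path[a], path[a + 1:b], path[b]))
    return bridges

# First row of the bridge table: start -29, end 42, five segments in between.
# Treating segments 29 and 42 as anchors is an assumption read off that row.
path = [-29, -65, -44, 66, -50, -106, 42]
anchors = {29, 42}

for start, middle, end in bridges_from_path(path, anchors):
    print(start, " -> ".join(map(str, middle)), end)
```

A path with fewer than two anchors produces no bridge, which is consistent with the log's "two or more anchor contigs" condition.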

Creating loop unrolling bridges (2019-04-26 12:47:33)

When a SPAdes contig path connects an anchor contig with the middle contig of a simple loop, Unicycler concludes that the sequences are contiguous (i.e. the loop is not a separate piece of DNA). It then uses the read depth of the middle and repeat contigs to guess the number of times to traverse the loop and makes a bridge.

                                 Loop count   Loop count    Loop   Bridge
Start   Repeat   Middle   End     by repeat    by middle   count   quality
   23      -68      148   -32          6.61         7.01       7       2.5
    8      -83       30   -18          0.74         1.12       1      39.6
  -39      102       97   -28          0.47         1.11       1      36.3
   33      110       38   -11          0.60         1.07       1      41.0
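
The log reports two depth-based estimates per loop and a final integer count, but it does not say how the two are combined. One simple rule, assumed here purely for illustration, is to average the estimates, round, and require at least one traversal. That rule reproduces the four logged counts, though Unicycler's actual rule may differ.

```python
# Sketch: combine the two loop-count estimates from the log into one integer.
# The mean-then-round rule and the minimum of 1 are assumptions for illustration,
# not taken from Unicycler's source.

def loop_count(by_repeat: float, by_middle: float) -> int:
    """Average the two depth-based estimates and round, with a floor of one."""
    mean = (by_repeat + by_middle) / 2
    return max(1, round(mean))

# (loop count by repeat, loop count by middle, logged loop count) from the table.
rows = [(6.61, 7.01, 7), (0.74, 1.12, 1), (0.47, 1.11, 1), (0.60, 1.07, 1)]

for by_repeat, by_middle, logged in rows:
    print(by_repeat, by_middle, loop_count(by_repeat, by_middle), logged)
```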

Loading reads (2019-04-26 12:47:33)

stevenjdunn commented 5 years ago

Does it hang on 100% alignment? Might be related to #140