Thank you so much for creating this amazing tool! My only question is that the hybrid assembly worked beautifully for all of my isolates except one. I supplied the short reads and long reads as follows:
unicycler -1 isolate4.fwd_val_1.fq.gz -2 isolate4.rev_val_2.fq.gz -l isolate4.minion.fq.gz -o uni_ass_4 --threads 2
Unicycler acknowledges that it needs to carry out a hybrid assembly, but it appears to perform only a simple SPAdes assembly. Please take a look at the log file below.
Starting Unicycler (2019-04-26 12:23:02)
Welcome to Unicycler, an assembly pipeline for bacterial genomes. Since you provided both short and long reads, Unicycler will perform a hybrid assembly. It will first use SPAdes to make a short-read assembly graph, and then it will use various methods to scaffold that graph with the long reads.
For more information, please see https://github.com/rrwick/Unicycler
Making output directory:
/home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4
Dependencies:
  Program         Version           Status
  spades.py       3.13.0            good
  racon           1.3.2             good
  makeblastdb     2.5.0+            good
  tblastn         2.5.0+            good
  bowtie2-build   2.3.4.3           good
  bowtie2         2.3.4.3           good
  samtools        1.9               good
  java            1.8.0_152-release good
  pilon           1.23              good
  bcftools                          not used
Unicycler uses the SPAdes read error correction module to reduce the number of errors in the short read before SPAdes assembly. This can make the assembly faster and simplify the assembly graph structure.
Choosing k-mer range for assembly (2019-04-26 12:37:29)
Unicycler chooses a k-mer range for SPAdes based on the length of the input reads. It uses a wide range of many k-mer sizes to maximise the chance of finding an ideal assembly.
SPAdes maximum k-mer: 127
Median read length: 194
K-mer range: 27, 47, 63, 77, 89, 99, 107, 115, 121, 127
SPAdes assemblies (2019-04-26 12:37:39)
Unicycler now uses SPAdes to assemble the short reads. It scores the assembly graph for each k-mer using the number of contigs (fewer is better) and the number of dead ends (fewer is better). The score function is 1/(c*(d+2)), where c is the contig count and d is the dead end count.
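The score function above can be checked against the numbers in this log. The following is a minimal sketch, not Unicycler's own code: the function and variable names are hypothetical, and the contig and dead-end counts are copied from the K-mer table reported under "SPAdes assemblies" in this log.

```python
# Sketch of the scoring rule described in the log: score = 1 / (c * (d + 2)).
# Names here (graph_score, KMER_RESULTS, best_kmer) are illustrative assumptions.

def graph_score(contigs: int, dead_ends: int) -> float:
    """Return 1 / (c * (d + 2)); fewer contigs and fewer dead ends score higher."""
    return 1.0 / (contigs * (dead_ends + 2))

# k-mer size -> (contig count, dead end count), copied from the log table.
KMER_RESULTS = {
    27: (1541, 0),
    47: (700, 0),
    63: (478, 0),
    77: (410, 0),
    89: (317, 0),
    99: (248, 0),
    107: (225, 0),
    115: (212, 0),
    121: (212, 0),
    127: (198, 0),
}

def best_kmer(results: dict) -> int:
    """Pick the k-mer size whose graph has the highest score."""
    return max(results, key=lambda k: graph_score(*results[k]))

if __name__ == "__main__":
    for k, (c, d) in KMER_RESULTS.items():
        print(f"{k:>5} {c:>7,} {d:>3} {graph_score(c, d):.2e}")
    print("best k-mer:", best_kmer(KMER_RESULTS))
```

Because every graph in this run has zero dead ends, the score reduces to 1/(2c), so the k-mer with the fewest contigs (127) wins.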
Multiplicity is the number of times a sequence occurs in the underlying sequence. Single-copy contigs (those with a multiplicity of one, occurring only once in the underlying sequence) are particularly useful.
Unicycler now performs various cleaning procedures on the graph to remove overlaps and simplify the graph structure. The end result is a graph ready for bridging.
Unicycler now selects a set of anchor contigs from the single-copy contigs. These are the contigs which will be connected via bridges to form the final assembly.
41 anchor segments (5,190,363 bp) out of 160 total segments (5,251,008 bp)
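To put the anchor summary in perspective, the short sketch below works out what share of the graph the anchors represent, by segment count and by length. It uses only the four numbers printed on the line above; the variable names are hypothetical.

```python
# Figures copied from the log line "41 anchor segments (5,190,363 bp) out of
# 160 total segments (5,251,008 bp)". Variable names are illustrative only.
anchor_segments = 41
total_segments = 160
anchor_bp = 5_190_363
total_bp = 5_251_008

# Share of the graph covered by anchors, by count and by sequence length.
segment_fraction = anchor_segments / total_segments
length_fraction = anchor_bp / total_bp

print(f"anchors by count:  {segment_fraction:.1%}")
print(f"anchors by length: {length_fraction:.1%}")
print(f"non-anchor sequence: {total_bp - anchor_bp:,} bp")
```

The anchors are a minority of the segments but account for nearly all of the sequence, which suggests the remaining segments are mostly short repeats.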
SPAdes uses paired-end information to perform repeat resolution (RR) and produce contigs from the assembly graph. SPAdes saves the graph paths corresponding to these contigs in the contigs.paths file. When one of these paths contains two or more anchor contigs, Unicycler can create a bridge from the path.
When a SPAdes contig path connects an anchor contig with the middle contig of a simple loop, Unicycler concludes that the sequences are contiguous (i.e. the loop is not a separate piece of DNA). It then uses the read depth of the middle and repeat contigs to guess the number of times to traverse the loop and makes a bridge.
Command: /home/ubuntu/miniconda3/envs/bio3092/bin/unicycler -1 isolate4.fwd_val_1.fq.gz -2 isolate4.rev_val_2.fq.gz -l isolate4.minion.fq.gz -o uni_ass_4 --threads 2
Unicycler version: v0.4.7
Using 2 threads
SPAdes read error correction (2019-04-26 12:23:45)
Command: /home/ubuntu/miniconda3/envs/bio3092/bin/spades.py -1 /home/ubuntu/resources/coursework/CW2/data/isolate4/isolate4.fwd_val_1.fq.gz -2 /home/ubuntu/resources/coursework/CW2/data/isolate4/isolate4.rev_val_2.fq.gz -o /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/spades_assembly/read_correction --threads 2 --only-error-correction
Corrected reads:
  /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/spades_assembly/corrected_1.fastq.gz
  /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/spades_assembly/corrected_2.fastq.gz
Choosing k-mer range for assembly (2019-04-26 12:37:29)
SPAdes maximum k-mer: 127
Median read length: 194
K-mer range: 27, 47, 63, 77, 89, 99, 107, 115, 121, 127
SPAdes assemblies (2019-04-26 12:37:39)
K-mer   Contigs   Dead ends   Score
   27     1,541           0   3.24e-04
   47       700           0   7.14e-04
   63       478           0   1.05e-03
   77       410           0   1.22e-03
   89       317           0   1.58e-03
   99       248           0   2.02e-03
  107       225           0   2.22e-03
  115       212           0   2.36e-03
  121       212           0   2.36e-03
  127       198           0   2.53e-03 <-best
Deleting /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/spades_assembly/
Determining graph multiplicity (2019-04-26 12:47:33)
Saving /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/001_best_spades_graph.gfa
Cleaning graph (2019-04-26 12:47:33)
Graph overlaps removed
Removed zero-length segments: 114, 115, 116, 118, 120, 122, 125, 130, 133, 146, 147, 149, 150, 152, 153, 155, 158, 160, 161, 162, 164, 166, 178, 182, 185, 194
Removed zero-length segments: 117, 174, 176, 190
Removed zero-length segments: 187
Merged small segments: 188, 189, 191, 192, 193, 195, 196
Saving /home/ubuntu/resources/coursework/CW2/data/isolate4/uni_ass_4/002_overlaps_removed.gfa
41 anchor segments (5,190,363 bp) out of 160 total segments (5,251,008 bp)
Creating SPAdes contig bridges (2019-04-26 12:47:33)
Start   Path                                                       End   Bridge quality
  -29   -65 -> -44 -> 66 -> -50 -> -106                             42   11.8
  -14   -108                                                        23   60.0
   -8   115 -> -73 -> 90                                             9   41.6
   -4   108                                                         32   60.9
    5   80 -> -57 -> 63                                             41   22.6
    8   -83                                                         30   60.3
   11   -110                                                       -38   62.0
   18   83                                                         -30   59.4
   20   -90 -> -72 -> -115                                          28   42.7
   21   -107 -> 96 -> -107 -> 96 -> -144 -> 117 -> 96 -> -144       33    5.0
   29   -112                                                        36   61.8
   33   110                                                         38   63.2
   41   80 -> -51 -> 106                                           -42   19.8
Creating loop unrolling bridges (2019-04-26 12:47:33)
Start   Repeat   Middle   End   Loop count by repeat   Loop count by middle   Loop count   Bridge quality
   23      -68      148   -32                   6.61                   7.01            7              2.5
    8      -83       30   -18                   0.74                   1.12            1             39.6
  -39      102       97   -28                   0.47                   1.11            1             36.3
   33      110       38   -11                   0.60                   1.07            1             41.0
Loading reads (2019-04-26 12:47:33)