Closed hwalinga closed 5 years ago
Can you share this data? It assembled to one contig but no unitigs....which is a little bit odd.
I think that if you modify ./5-consensus/utgcns.files to include the full path to an empty file, it will be happy. Something like:
cd 5-consensus
touch empty-file
echo $PWD/empty-file > utgcns.files
You can test this easily without restarting canu by running the tgStoreLoad command by hand. Once it runs, check the S1.utgStore directory for the presence of *002* files.
If Canu has other issues finishing, you can manually extract the contig with
tgStoreDump \
-S S1.seqStore \
-T unitigging/S1.ctgStore 2 \
-consensus -contigs -fasta > S1.contigs.fasta
(and, actually, if you only care about the contig sequence, I think that last command will work right now)
If I add that empty-file to utgcns.files, I get the following error:
ABORT: unknown consensus job name '/linuxhome/tmp/hielke/canu/S1/unitigging/5-consensus/empty-file'
If I run tgStoreDump manually, I just get an empty file. It could be that canu cannot make any assembly out of the data, but in that case I would at least expect it to tell me so.
I will ask my supervisor if we can share the data. I am afraid the answer is no, but we shall see.
What files exist in unitigging/S1.ctgStore?
What do
tgStoreDump -S S1.seqStore -T unitigging/S1.ctgStore 1 -tigs
tgStoreDump -S S1.seqStore -T unitigging/S1.ctgStore 2 -tigs
report? (they should both report a few lines of tabular data)
tgStoreDump -S S1.seqStore -T unitigging/S1.ctgStore 1 -tigs
Reports 8305 lines like (https://termbin.com/d6t7):
8324 24118 layout 1.00 1.00 unassm no no 1
tgStoreDump -S S1.seqStore -T unitigging/S1.ctgStore 2 -tigs
Reports 8305 lines like (https://termbin.com/4uqy):
28 14763 ungapped 1.00 1.00 unassm no no 1
Ah! That's also a big clue why there are no unitigs.
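One quick way to confirm this is to tally the class column of the -tigs output. The following is a minimal sketch, assuming the class label (unassm in the lines above) is the sixth whitespace-separated field, as it is in both example lines. The function name tally_tig_classes is made up for illustration, and lines starting with '#' are skipped in case the tool prints a header.

```shell
# Hypothetical helper: count how often each value appears in field 6
# of tabular text read from stdin.
# ASSUMPTION: field 6 holds the tig class, as in the example lines above.
tally_tig_classes() {
    awk '!/^#/ && NF >= 6 { count[$6]++ }
         END { for (c in count) print c, count[c] }' | sort
}

# Usage (paths as in the thread):
# tgStoreDump -S S1.seqStore -T unitigging/S1.ctgStore 2 -tigs | tally_tig_classes
```

If every line falls into a single class such as unassm, that matches the explanation that canu treated all tigs as unassembled.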
To get the sequences, remove '-contigs' from the tgStoreDump command. Canu thinks these are all 'unassembled' crud; it's tuned for larger "genomes".
Browse through the FAQ (https://canu.readthedocs.io/en/latest/faq.html) for some advice, in particular on the 'contigFilter' option.
You might also want to experiment with downsampling your reads: https://canu.readthedocs.io/en/latest/parameter-reference.html#readsamplingcoverage
Any updates? Did you get an assembly?
Hello @skoren
When using the following (so without `-contigs`), we were indeed getting some assembled contigs.
tgStoreDump \
-S S1.seqStore \
-T unitigging/S1.ctgStore 2 \
-consensus -fasta > S1.contigs.fasta
However, we weren't that happy with the assemblies from canu after all, and tried flye instead. Flye had far fewer problems with the assembly, but I have the impression that canu produces more output than flye.
Our problem is that we are trying to assemble chimeric viruses. There are multiple different species in our samples, but they share a lot of common sequence. We hope to assemble the different species and then see what differences they have (which regions are chimeric). So far, it seems that this chimeric feature of our samples only confuses the assemblers.
If we compare canu and flye, we see that canu produces a lot of small contigs with low coverage plus some big ones, while flye only produces the large contigs.
The reason you have more output is that this dump, without the -contigs flag, also outputs single reads, which have low coverage and would not normally be considered "assembled". You can filter those out of the fasta file manually if you want.
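The manual filtering mentioned above could be sketched as follows. This is only an illustration under a stated assumption: that each FASTA header carries a reads=N attribute (for example ">tig00000001 len=... reads=12 ..."). Check your own headers first with grep '>' S1.contigs.fasta | head. The function name filter_fasta_by_reads is hypothetical, and records without a reads= attribute are dropped.

```shell
# Hypothetical helper: keep FASTA records whose header has reads=N with
# N >= the given minimum. Reads FASTA on stdin, writes FASTA to stdout.
# ASSUMPTION: headers contain a whitespace-separated "reads=N" field.
filter_fasta_by_reads() {
    min_reads="$1"
    awk -v min="$min_reads" '
        /^>/ {
            keep = 0                       # default: drop this record
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^reads=[0-9]+$/) {
                    split($i, kv, "=")
                    keep = (kv[2] + 0 >= min + 0)
                }
            }
        }
        keep { print }                     # header and sequence lines of kept records
    '
}

# Usage: drop single-read tigs
# filter_fasta_by_reads 2 < S1.contigs.fasta > S1.contigs.multiread.fasta
```

Filtering on read count rather than sequence length targets the single-read records directly; a length cutoff would be an alternative if the headers turn out not to carry this attribute.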
Given the information about your sample, this is really a metagenome, and in fact a worst-case one, since you have lots of very closely related strains with large SVs in a few places. First, the canu command you're using is not ideal for that: use the metagenomic parameters from the FAQ rather than the fast option you're using now, to get more accurate overlaps, which may help separate some of the strains. You will likely also have to look at the assembly graphs (the unitigs.gfa in canu) to try to resolve some of these species.
Hello, I have an error and am not sure how to prevent it, or whether this is an issue with the software itself.
Using canu version 1.8 on Ubuntu 16.04.6 LTS (no virtual machine).
Running command:
I get the following error:
ERROR: no input tig files supplied on command line or via -L option.
Full error output: https://termbin.com/eg7h
I checked the error log, and the problem is that with the option -L ./5-consensus/utgcns.files the file does exist but is empty. The usage description of this command (tgStoreLoad) states that the program should succeed even if the file provided with the -L option is empty. So I am not sure whether the empty utgcns.files is itself a problem that I need to fix, or whether tgStoreLoad should just run anyway, even if ./5-consensus/utgcns.files is empty.