Narmatha99 closed this issue 3 months ago.
It looks like Markdown hid the <OS> text in your issue and in the documentation. <OS> is either Darwin or Linux, depending on which machine you're running on; both builds are for x86. So if you're on Linux you'd want:
curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.Linux-amd64.tar.xz --output canu-2.2.Linux.tar.xz
tar -xJf canu-2.2.*.tar.xz
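If you'd rather not pick the tarball by hand, here is a minimal sketch that derives <OS> from uname -s, which prints Darwin on macOS and Linux on Linux. It assumes a bash-compatible shell and that the v2.2 release assets follow the canu-2.2.<OS>-amd64.tar.xz naming used above; the function name canu_url is hypothetical.

```shell
# Hypothetical helper: print the Canu v2.2 download URL for this machine.
# Assumes release assets are named canu-2.2.<OS>-amd64.tar.xz, where <OS>
# is the output of `uname -s` (Darwin on macOS, Linux on Linux).
canu_url() {
  local os="${1:-$(uname -s)}"   # an explicit OS can be passed for testing
  case "$os" in
    Darwin|Linux)
      printf 'https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.%s-amd64.tar.xz\n' "$os"
      ;;
    *)
      echo "canu_url: unsupported OS '$os' (expected Darwin or Linux)" >&2
      return 1
      ;;
  esac
}

# Example usage (commented out so defining the function downloads nothing):
# curl -L "$(canu_url)" --output canu-2.2.tar.xz
# tar -xJf canu-2.2.tar.xz
```

The case statement rejects anything other than the two names mentioned in this thread, so a typo such as <OX> fails loudly instead of producing a broken URL.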
Thank you for the explanation. However, I am working on a Mac Pro; does that mean Canu is not compatible with macOS?
macOS (OS X) is Darwin, so you should download the Darwin tarball. Since it's compiled for x86, if you have Apple silicon you'll need Rosetta (https://support.apple.com/en-us/102527), which macOS should offer to install automatically when you first run canu. You'll also need the other requirements for running canu listed on the release notes page:
Perl 5.12.0+, or File::Path 2.08
Java SE 8+
I confirmed that I was able to run a small assembly on my MacBook Air with an Apple M2.
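A minimal pre-flight sketch for the requirements listed above, assuming a bash-compatible shell. The function names needs_rosetta and check_canu_requirements are hypothetical, the Java check only confirms that java is on PATH (not its version), and using arch -x86_64 as a Rosetta test is an assumption rather than an official check.

```shell
# Hypothetical pre-flight check for the Canu v2.2 requirements listed above.
# It only reports problems; it does not install anything.

# Succeeds (returns 0) when the x86 build would need Rosetta 2:
# macOS (Darwin) running natively on Apple silicon, where `uname -m` is arm64.
needs_rosetta() {
  local os="${1:-$(uname -s)}" machine="${2:-$(uname -m)}"
  [ "$os" = "Darwin" ] && [ "$machine" = "arm64" ]
}

check_canu_requirements() {
  local status=0
  # Perl 5.12.0+, or an older Perl that has File::Path 2.08.
  if ! perl -e 'use 5.012' 2>/dev/null &&
     ! perl -e 'use File::Path 2.08' 2>/dev/null; then
    echo "missing: Perl 5.12.0+ (or File::Path 2.08)" >&2
    status=1
  fi
  # Java SE 8+: this only checks that java is on PATH, not its version.
  if ! command -v java >/dev/null 2>&1; then
    echo "missing: java (Java SE 8+)" >&2
    status=1
  fi
  # Assumption: `arch -x86_64` fails when Rosetta 2 is not installed.
  if needs_rosetta && ! arch -x86_64 /usr/bin/true 2>/dev/null; then
    echo "missing: Rosetta 2 (softwareupdate --install-rosetta)" >&2
    status=1
  fi
  return "$status"
}
```

Running check_canu_requirements && echo ok prints ok when nothing is flagged; otherwise each missing item is listed on stderr.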
I was able to download Canu, thank you very much! But I have run into the next problem (which I think is a mistake in my genome size estimate). I used the following command:
~/Downloads/canu-2.2/bin/canu -d barcode1 -p pseaer genomeSize=5.5m -nanopore ~/Downloads/QIA_test_fastq/Long_read_barcode_1.fastq.gz
and got this error:
So if I understand correctly, the problem is the genome size I gave? How should I estimate the genome size? I already know which bacterium it is (Pseudomonas aeruginosa), and I looked up its haploid genome size (5.5-7 Mbp); I tried 5.5 Mbp.
The issue is that you don't have enough input to assemble the genome. The genome size should be approximately the size you expect the assembly to be. What input set are you using, and how many reads/bases does it contain?
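As a rough rule of thumb, coverage is the total number of read bases divided by the genome size, so "not enough input" means that ratio is too low. A minimal sketch of the arithmetic, assuming both numbers are given in plain base pairs; the function name estimate_coverage is hypothetical.

```shell
# Hypothetical helper: expected coverage = total read bases / genome size.
# Both arguments are plain base-pair counts (e.g. 5.5 Mbp -> 5500000).
estimate_coverage() {
  local bases="$1" genome_size="$2"
  awk -v b="$bases" -v g="$genome_size" 'BEGIN {
    if (g <= 0) { print "genome size must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.1fx\n", b / g
  }'
}

# Example: 27.5 Mbp of reads against a 5.5 Mbp genome -> 5.0x
estimate_coverage 27500000 5500000
```

The genome size only needs to be approximate: changing it from 5.5 to 7 Mbp changes the estimate by about 25%, whereas a data set with too few bases stays too small either way.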
Ah, okay. My input is a library of 8 samples pooled together, each with its own barcode (ONT Rapid Barcoding Kit). I did the basecalling in the terminal with Dorado, so now I have BAM and FASTQ files. I'm not sure how to check the number of reads/bases, because I did the basecalling outside the usual Nanopore software.
Canu puts that information in the report file; you can also find it in pseaer.seqStore/info.txt. Please post that file here.
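To get read and base counts before running Canu at all, a minimal sketch is below. It assumes plain four-line FASTQ records (header, sequence, +, qualities), relies on gzip -dcf passing uncompressed input through unchanged so it works on both .fastq and .fastq.gz files, and uses 1000 bp as the default cutoff to mirror Canu's default minReadLength. The function name fastq_stats is hypothetical.

```shell
# Hypothetical helper: count reads and bases in a FASTQ file.
# Assumes four-line FASTQ records (header, sequence, '+', qualities).
# `gzip -dcf` decompresses gzipped input and passes plain FASTQ through
# unchanged, so the same command works for .fastq and .fastq.gz files.
# MIN_LEN defaults to 1000, mirroring Canu's default minReadLength.
fastq_stats() {
  local file="$1" min_len="${2:-1000}"
  gzip -dcf "$file" | awk -v min="$min_len" '
    NR % 4 == 2 {                     # 2nd line of each record is the sequence
      len = length($0)
      reads++; bases += len
      if (len >= min) { long_reads++; long_bases += len }
    }
    END {
      printf "reads=%.0f bases=%.0f reads_ge_min=%.0f bases_ge_min=%.0f\n",
             reads, bases, long_reads, long_bases
    }'
}
```

For example, fastq_stats ~/Downloads/QIA_test_fastq/Long_read_barcode_1.fastq.gz prints the totals plus how many reads (and bases) reach 1000 bp; if reads_ge_min is 0, every read is below that length.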
I tried again with a Citrobacter freundii sample, but I get the same error message. I have added the info.txt below.
Reads  Bases  Read Type
    0      -  total-reads
    0      0  raw
    0      0  raw-trimmed
    0      0  raw-compressed
    0      0  raw-compressed-trimmed
    0      0  corrected
    0      0  corrected-trimmed
    0      0  corrected-compressed
    0      0  corrected-compressed-trimmed
And the canu_asm.seqStore.sh:
According to the log, there are no reads in the input files. Are all the reads shorter than 1 kb? Can you post the canu_asm.seqStore.err file as well as one of the fastq.gz files (see https://canu.readthedocs.io/en/latest/faq.html#how-can-i-send-data-to-you if it's too big for GitHub)?
I sent them through FTP. Can you let me know whether you were able to download both files?
Unfortunately, I don't see the files on the FTP site. Are you able to send them another way?
Idle; the initial issue is resolved. The secondary issue is due to invalid input: a file that is not gzip-compressed was provided with a .gz extension. The seqStore error log reports:
gzip: Long_reads_barcode_20.fastq.gz: not in gzip format
Loaded 0 0.0% 0 0.0% Long_reads_barcode_20.fastq.gz
All reads processed.
Looking at the upload, the file is not gzipped; it's just a regular FASTQ, so removing the .gz extension would fix the problem. That said, the data set is very small for a genome size of 4-5 Mb: it's only about 1x coverage, which isn't enough.
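A minimal sketch for catching the "not in gzip format" problem before Canu does, assuming GNU or BSD gzip is available. gzip -t fails on anything that is not a valid gzip stream; the helper names is_gzip and fix_fake_gz are hypothetical, and the rename simply applies the fix suggested above.

```shell
# Hypothetical helpers for the "not in gzip format" problem above.

# Succeeds only if FILE is a valid gzip stream (gzip -t tests integrity).
is_gzip() {
  gzip -t "$1" 2>/dev/null
}

# If FILE ends in .gz but is really plain text, drop the .gz extension.
# Refuses to overwrite an existing file with the new name.
fix_fake_gz() {
  local f="$1" plain
  case "$f" in
    *.gz) ;;
    *) return 0 ;;                     # nothing to do for other names
  esac
  is_gzip "$f" && return 0             # genuinely gzipped: leave it alone
  plain="${f%.gz}"
  if [ -e "$plain" ]; then
    echo "not renaming $f: $plain already exists" >&2
    return 1
  fi
  mv "$f" "$plain" && echo "renamed $f -> $plain"
}
```

For a whole run directory this could be applied as for f in *.fastq.gz; do fix_fake_gz "$f"; done. Compressing the plain file with gzip instead of renaming it would work equally well.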
I am new to this method of genome assembly, so I am struggling a little with installing Canu. I think I should use the following command (but I am not sure):
curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.<OX>-amd64.tar.xz --output canu-2.2.<OS>.tar.xz
If this is correct, what should I define <OX> and <OS> as? Thank you.