marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

How to install canu #2324

Closed Narmatha99 closed 3 months ago

Narmatha99 commented 5 months ago

I am new to this method of genome assembly, so I am struggling a little with installing Canu. I think I should use the following command (but I am not sure):

curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.<OX>-amd64.tar.xz --output canu-2.2.<OS>.tar.xz

If this is correct, what should I define <OX> and <OS> as?

Thank you.

skoren commented 5 months ago

It looks like markdown hid the <OS> text in your issue and in the documentation. <OS> is either Darwin or Linux, depending on the machine you're running on; both builds are x86. So on Linux you'd want:

curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.Linux-amd64.tar.xz --output canu-2.2.Linux.tar.xz 
tar -xJf canu-2.2.*.tar.xz
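
The <OS> placeholder can also be filled in from `uname -s`, which prints `Linux` or `Darwin` on the two supported platforms. The following is only a sketch: the helper name `canu_tarball` is hypothetical, and the script prints the download commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: map an OS name (as printed by `uname -s`)
# to the matching Canu 2.2 release tarball name.
canu_tarball() {
    case "$1" in
        Linux|Darwin) echo "canu-2.2.$1-amd64.tar.xz" ;;
        *) echo "no prebuilt Canu 2.2 binary for: $1" >&2; return 1 ;;
    esac
}

os=$(uname -s)
if tarball=$(canu_tarball "$os"); then
    # Print the commands rather than executing them.
    echo "curl -L https://github.com/marbl/canu/releases/download/v2.2/$tarball --output canu-2.2.$os.tar.xz"
    echo "tar -xJf canu-2.2.$os.tar.xz"
fi
```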

Narmatha99 commented 5 months ago

Thank you for the explanation. However, I am working on a Mac Pro. Does that mean that Canu is not compatible with macOS?

skoren commented 5 months ago

OSX is Darwin, so you should download the Darwin tarball. It's compiled for x86, so if you have Apple silicon you'll need Rosetta (https://support.apple.com/en-us/102527), which should be installed automatically when you run canu. You'll also need the other requirements for running canu listed on the release notes page:

Perl 5.12.0+, or File::Path 2.08
Java SE 8+

I confirmed I was able to run a small assembly on my MacBook Air w/apple M2.
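
A quick way to check these prerequisites before running canu is sketched below. The helper names `have` and `needs_rosetta` are hypothetical, and the sketch assumes `uname -m` reports `arm64` on Apple silicon. It only reports what it finds and installs nothing.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a command is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Hypothetical helper: succeed only for macOS (Darwin) on an arm64 CPU,
# where the x86 binaries have to run under Rosetta.
needs_rosetta() { [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]; }

# Perl: `use 5.012` aborts on interpreters older than 5.12.0.
if have perl && perl -e 'use 5.012;' 2>/dev/null; then
    echo "perl: OK"
else
    echo "perl: missing or older than 5.12.0 (older Perl needs File::Path 2.08)"
fi

# Java: print the version banner so it can be compared against Java SE 8+.
if have java; then
    java -version 2>&1 | head -n 1
else
    echo "java: not found"
fi

if needs_rosetta "$(uname -s)" "$(uname -m)"; then
    echo "Apple silicon detected: the x86 build needs Rosetta"
fi
```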

Narmatha99 commented 5 months ago

I was able to download Canu, thank you very much! But I have run into the next problem (which I think is a mistake in my estimate). I used the following command:

~/Downloads/canu-2.2/bin/canu -d barcode1 -p pseaer genomeSize=5.5m -nanopore ~/Downloads/QIA_test_fastq/Long_read_barcode_1.fastq.gz

And got this error: [Screenshot 2024-06-26 09:58:29]

So if I understand correctly, the problem is my genome size? How should I estimate the genome size? I already know which bacterium it is (Pseudomonas aeruginosa), and I googled the haploid genome size of what I probably assembled (5.5 - 7 Mbp; I tried it with 5.5 Mbp).

skoren commented 5 months ago

The issue is you don't have enough input to assemble the genome. The genome size should be approximately what you expect the assembly to be. What input set are you using, and how many reads/bases does it contain?

Narmatha99 commented 4 months ago

Ah okay, so my input is a library of 8 samples pooled together but each one with a barcode (ONT + rapid barcoding kit). I did basecalling through the terminal with Dorado so now I have bam and fastq files. I am not sure how I would check the number of reads/bases because I did basecalling outside the usual nanopore software.

skoren commented 4 months ago

canu would put that info in the report file. You can also check the info in pseaer.seqStore/info.txt. Post that file here.
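
Independently of canu, reads and bases can be counted directly from a FASTQ file. This is a minimal sketch, assuming standard four-line FASTQ records. The function names `fastq_stats` and `coverage` are hypothetical, and the 1000 bp default is an assumption chosen to match Canu's default minimum read length.

```shell
#!/bin/sh
# Hypothetical helper: print "reads bases reads_at_least_min_len" for a FASTQ file.
# Assumes 4-line FASTQ records (sequence on every 2nd line of each record).
fastq_stats() {
    file=$1
    min_len=${2:-1000}   # assumption: 1000 bp, Canu's default minimum read length
    # gzip -cdf decompresses gzip input and passes plain text through unchanged.
    gzip -cdf "$file" | awk -v min="$min_len" '
        NR % 4 == 2 { reads++; bases += length($0); if (length($0) >= min) kept++ }
        END { printf "%.0f %.0f %.0f\n", reads, bases, kept }'
}

# Hypothetical helper: approximate coverage = total bases / genome size (both in bp).
coverage() {
    awk -v b="$1" -v g="$2" 'BEGIN { printf "%.1f\n", b / g }'
}

# Example usage (path taken from the command earlier in this thread):
#   fastq_stats ~/Downloads/QIA_test_fastq/Long_read_barcode_1.fastq.gz
#   coverage <total_bases> 5500000
```

A coverage value near 1 means each base of the genome is sequenced about once on average.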

Narmatha99 commented 4 months ago

[Screenshot 2024-07-08 14:55:42]

So I tried it again with Citrobacter freundii, but I get the same error message. I have added the info.txt below.

 Reads        Bases Read Type

     0            - total-reads
     0            0 raw
     0            0 raw-trimmed
     0            0 raw-compressed
     0            0 raw-compressed-trimmed
     0            0 corrected
     0            0 corrected-trimmed
     0            0 corrected-compressed
     0            0 corrected-compressed-trimmed

Narmatha99 commented 4 months ago

And the canu_asm.seqStore.sh: [Screenshot 2024-07-08 15:00:41]

skoren commented 4 months ago

According to the log, there are no reads in the input files. Are all the reads shorter than 1 kb? Can you post the canu_asm.seqStore.err file as well as one of the fastq.gz sets (see https://canu.readthedocs.io/en/latest/faq.html#how-can-i-send-data-to-you if it's too big for GitHub)?

Narmatha99 commented 4 months ago

I sent them through FTP. Can you let me know if you are able to download both files?

skoren commented 4 months ago

Unfortunately, I don't see the files on the FTP site. Are you able to send them another way?

skoren commented 3 months ago

Idle; the initial issue is resolved. The secondary issue is due to invalid input: a non-gzipped file was provided with a .gz extension. The seqStore error log reports:

gzip: Long_reads_barcode_20.fastq.gz: not in gzip format
Loaded             0   0.0%            0   0.0%  Long_reads_barcode_20.fastq.gz

All reads processed.

Looking at the upload, the file is not gzipped; it's just a regular fastq, so removing the .gz extension would fix the problem. That said, the dataset is very small given the genome size of 4-5 Mb: it's only about 1x coverage, which isn't enough.
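
A mislabeled file like this can be detected before starting an assembly. The sketch below relies on `gzip -t`, which exits non-zero for input that is not in gzip format. The function names `is_gzip` and `real_name` are hypothetical, and the script only suggests a name rather than renaming anything.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is valid gzip data.
is_gzip() { gzip -t "$1" 2>/dev/null; }

# Hypothetical helper: print the name the file should carry.
# The name is unchanged if the file really is gzip; otherwise a trailing ".gz" is dropped.
real_name() {
    if is_gzip "$1"; then
        echo "$1"
    else
        echo "${1%.gz}"
    fi
}

# Example usage (file name from the seqStore error in this thread):
#   f=Long_reads_barcode_20.fastq.gz
#   is_gzip "$f" || echo "not gzip, rename to: $(real_name "$f")"
```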