marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

How to install canu #2324

Open Narmatha99 opened 1 week ago

Narmatha99 commented 1 week ago

I am new to this method of genome assembly, so I am struggling a little with installing Canu. I think I should use the following command (but I am not sure):

curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.<OX>-amd64.tar.xz --output canu-2.2.<OS>.tar.xz

If this is correct, what should I define <OX> and <OS> as?

Thank you.

skoren commented 1 week ago

It looks like markdown hid the <OS> text in your issue and in the documentation. <OS> is either Darwin or Linux, depending on which machine you're running on; both builds are x86. So if you have Linux you'd want:

curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.Linux-amd64.tar.xz --output canu-2.2.Linux.tar.xz 
tar -xJf canu-2.2.*.tar.xz
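
As a rough sketch of the substitution described above: `uname -s` prints `Darwin` on macOS and `Linux` on Linux, which matches the two <OS> values, so the URL can be built from it. The helper name `canu_tarball_url` is made up for illustration and is not part of Canu, and the file name pattern is only confirmed in this thread for v2.2. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Canu): build the release tarball URL.
# Assumes the v2.2 naming pattern quoted in this thread.
canu_tarball_url() {
    version="$1"   # e.g. 2.2
    os="$2"        # Darwin or Linux, as printed by `uname -s`
    echo "https://github.com/marbl/canu/releases/download/v${version}/canu-${version}.${os}-amd64.tar.xz"
}

os="$(uname -s)"
url="$(canu_tarball_url 2.2 "$os")"

# Print the commands instead of executing them, so nothing is downloaded here.
echo "Detected OS: $os"
echo "Download with: curl -L $url --output canu-2.2.$os.tar.xz"
echo "Unpack with:   tar -xJf canu-2.2.$os.tar.xz"
```

On any other operating system `uname -s` prints a different name and the resulting URL will not exist, so check the printed URL against the release page before using it.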

Narmatha99 commented 1 week ago

Thank you for the explanation. However, I am working on a Mac Pro. Does that mean that Canu is not compatible with macOS?

skoren commented 1 week ago

OSX is Darwin, so you should download the Darwin tarball. Since it's compiled for x86 architectures, if you have Apple silicon you'll need Rosetta (https://support.apple.com/en-us/102527), which should be installed automatically when you run canu. You'll also need the other requirements for running canu listed on the release notes page:

Perl 5.12.0+, or File::Path 2.08
Java SE 8+

I confirmed I was able to run a small assembly on my MacBook Air with an Apple M2.
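
A quick pre-flight check along these lines might help before the first run. This is only a sketch: `needs_rosetta` is a made-up helper name, the check assumes `uname -m` reports `arm64` on Apple silicon when the shell runs natively, and it only tests whether `perl` and `java` are on the PATH, not whether their versions meet the requirements above.

```shell
#!/bin/sh
# Hypothetical helper: true when the OS is Darwin and the CPU is arm64,
# i.e. the x86 build of canu would have to run through Rosetta.
needs_rosetta() {
    [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]
}

if needs_rosetta "$(uname -s)" "$(uname -m)"; then
    echo "Apple silicon detected: the x86 canu build will need Rosetta"
else
    echo "No Rosetta needed on this machine"
fi

# Only checks that the executables exist on PATH; versions are not verified.
for tool in perl java; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool found at $(command -v "$tool")"
    else
        echo "$tool not found on PATH"
    fi
done
```

The actual versions can then be compared against the list above with `perl -v` and `java -version`.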

Narmatha99 commented 1 week ago

I was able to download Canu, thank you very much! But I have run into the next problem (which I think is a mistake in my estimate). I used the following command:

~/Downloads/canu-2.2/bin/canu -d barcode1 -p pseaer genomeSize=5.5m -nanopore ~/Downloads/QIA_test_fastq/Long_read_barcode_1.fastq.gz

And got this error: (screenshot, 2024-06-26 at 09:58:29)

So if I understand correctly, the problem is my genome size? How should I estimate the genome size? I already know which bacterium it is (Pseudomonas aeruginosa), and I googled the haploid genome size of what I probably assembled (5.5 to 7 Mbp; I tried it with 5.5 Mbp).

skoren commented 1 week ago

The issue is that you don't have enough input to assemble the genome. The genome size should be approximately what you expect the assembly to be. What input set are you using, and how many reads/bases does it contain?

Narmatha99 commented 5 hours ago

Ah okay, so my input is a library of 8 samples pooled together, each one with its own barcode (ONT + rapid barcoding kit). I did basecalling in the terminal with Dorado, so now I have BAM and FASTQ files. I am not sure how to check the number of reads/bases because I did basecalling outside the usual nanopore software.

skoren commented 5 hours ago

canu would put that info in the report file; you can also check it in pseaer.seqStore/info.txt. Post that file here.
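
For an independent count outside canu, something like the following sketch could be used on the gzipped FASTQ. It assumes the file has exactly four lines per record (no wrapped sequence lines), and `fastq_stats` and `coverage` are made-up helper names. Coverage here is simply total bases divided by the expected genome size, which gives a rough idea of whether there is enough input.

```shell
#!/bin/sh
# Hypothetical helper: print "<reads> <bases>" for a gzipped FASTQ file.
# Assumes 4 lines per record, so every line with NR % 4 == 2 is a sequence.
fastq_stats() {
    gzip -dc "$1" | awk 'NR % 4 == 2 { reads++; bases += length($0) }
                         END { printf "%.0f %.0f\n", reads, bases }'
}

# Hypothetical helper: coverage = total bases / genome size (both in bases).
coverage() {
    awk -v b="$1" -v g="$2" 'BEGIN { printf "%.1f\n", b / g }'
}

# Tiny self-contained demo with two reads of 4 and 6 bases.
tmp="$(mktemp -d)"
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' | gzip -c > "$tmp/demo.fastq.gz"
fastq_stats "$tmp/demo.fastq.gz"    # prints: 2 10
coverage 55000000 5500000           # prints: 10.0
rm -r "$tmp"
```

On the real data this would be run as `fastq_stats ~/Downloads/QIA_test_fastq/Long_read_barcode_1.fastq.gz`, and the base count then passed to `coverage` together with 5500000 for a 5.5 Mbp genome.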