marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Issue with Genome Assembly: Contigs.fasta File Size is Double the Expected #2342

Closed: DryadDataDiver closed this issue 1 week ago

DryadDataDiver commented 1 week ago

Hello @skoren! I am a beginner in genome assembly and would appreciate your guidance on a problem I have run into. I am assembling PacBio HiFi data, but the resulting contigs.fasta file is approximately twice the expected genome size for my species. In the command below, s3hifi.fa is additional sequencing data, not a paired-end file. I am on a Linux system and submit jobs through LSF. Here is my command:

canu -d $output_dir -p contigs genomeSize=414.4m useGrid=false -pacbio-hifi $hifi/s2hifi.fa $hifi/s3hifi.fa \
> canudefult.log 2>&1

The log is attached as well: canudefult.log

skoren commented 1 week ago

https://canu.readthedocs.io/en/latest/faq.html#my-genome-size-and-assembly-size-are-different-help

DryadDataDiver commented 1 week ago

Thank you for your response! I had seen this solution before, but I couldn’t connect the dots due to my lack of theoretical background. I will try to resolve the issue using the methods mentioned in the FAQ.
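
For anyone hitting the same symptom, it may help to measure the assembly in bases rather than by file size before working through the FAQ. The sketch below sums the sequence lengths in a FASTA file and divides by the expected genome size. The function names assembly_size and size_ratio are made up for illustration, and the output path in the usage comment is an assumption based on the -d and -p options in the command above.

```shell
#!/usr/bin/env bash

# Sum the lengths of all sequence lines in a FASTA file.
# Header lines (starting with ">") are skipped.
assembly_size() {
    awk '!/^>/ { total += length($0) } END { printf "%.0f\n", total }' "$1"
}

# Ratio of assembled bases to expected genome size, to two decimals.
size_ratio() {
    awk -v a="$1" -v g="$2" 'BEGIN { printf "%.2f\n", a / g }'
}

# Hypothetical usage (assumed path; adjust to the actual output file):
# expected=414400000    # genomeSize=414.4m from the canu command
# total=$(assembly_size "$output_dir/contigs.contigs.fasta")
# echo "assembled: $total bp, ratio: $(size_ratio "$total" "$expected")"
```

A ratio close to 2 is consistent with, though not proof of, both haplotypes of a diploid genome having been assembled separately, which is one common reason an assembly comes out larger than the haploid genome size.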