paoloshasta / shasta

De novo assembly from Oxford Nanopore reads.
https://paoloshasta.github.io/shasta/

Assertion failed LowHash0::adjustMinMaxBucketSizes #25

Closed by asylvz 3 months ago

asylvz commented 3 months ago

Hello, I'm trying the newer version and getting the following error with a FASTA file of 5 ONT reads:

Shasta Release 0.12.0
2024-May-13 11:33:45.266965 Assembly begins.
Command line:
shasta --input res/NA12878_ga2/in/H1-s549615_4.fasta --assemblyDirectory dnm --config Nanopore-ncm23-May2024 --suppressStdoutLog --Assembly.mode2.suppressDetailedOutput --Assembly.mode2.suppressGfaOutput --threads 16 
For options in use for this assembly, see shasta.conf in the assembly directory.
This run uses options "--memoryBacking 4K --memoryMode anonymous".
This could result in longer run time.
For faster assembly, use "--memoryBacking 2M --memoryMode filesystem"
(root privilege via sudo required).
Therefore the results of this run should not be used
for the purpose of benchmarking assembly time.
However the memory options don't affect assembly results in any way.
This assembly will use 16 threads.
Discarded read statistics for file /gpfs/project/projects/medbioinf/users/asoylev/svarp_github/SVarp2/res/NA12878_ga2/in/H1-s549615_4.fasta:
    Discarded 0 reads containing invalid bases for a total 0 valid bases.
    Discarded 0 reads shorter than 10000 bases for a total 0 bases.
    Discarded 0 reads containing repeat counts 256 or more for a total 0 bases.
Discarded read statistics for all input files:
    Discarded 0 reads containing invalid bases for a total 0 valid bases.
    Discarded 0 short reads for a total 0 bases.
    Discarded 0 reads containing repeat counts 256 or more for a total 0 bases.
Read statistics for reads that will be used in this assembly:
    Total number of reads is 5.
    Total number of raw bases is 95671.
    Average read length is 19134.2 bases.
    N50 for read length is 19055 bases.
Found 0 reads with duplicate names.
Discarded from the assembly 0 reads with duplicate names.
Flagged 0 reads as palindromic out of 5 total.
Palindromic fraction is 0
LowHash0 algorithm will use 2^12 = 4096 buckets. 
2024-May-13 11:33:45.315985 Assertion failed: done at void shasta::LowHash0::adjustMinMaxBucketSizes(const std::vector<long unsigned int>&) in /home/runner/work/shasta/shasta/src/LowHash0.cpp line 587

paoloshasta commented 3 months ago

Your assembly has only 5 input reads, and Shasta is definitely not designed for such a small assembly.

If you want to try anyway, I suggest adding --MinHash.allPairs to your command line to get past the MinHash phase. This will cause all possible pairs of oriented reads to be used as alignment candidates. For a typical assembly this is prohibitively expensive, but if you only have a very small number of reads it will be fine.
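
To give a rough sense of why using all pairs is fine for 5 reads but prohibitive for a typical assembly, here is a minimal back-of-the-envelope sketch. It assumes each pair of distinct reads is tried once per relative orientation (same strand and opposite strand); that counting convention and the function name all_pairs_candidates are illustrative assumptions, not taken from the Shasta source.

```python
def all_pairs_candidates(read_count: int) -> int:
    """Count alignment candidates under the ASSUMPTION that every pair of
    distinct reads is tried in both relative orientations
    (same strand and opposite strand)."""
    unordered_pairs = read_count * (read_count - 1) // 2
    return 2 * unordered_pairs


# Growth is quadratic in the number of reads.
for n in (5, 1_000, 1_000_000):
    print(f"{n:>9,} reads -> {all_pairs_candidates(n):,} candidates")
```

Under this assumption, 5 reads give only 20 candidates, while a million reads would give roughly 10^12.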

As announced in the release notes for Shasta 0.12.0, the new assembly configuration Nanopore-ncm23-May2024 is only designed to be used with the new experimental high accuracy reads from the Oxford Nanopore 2023.12 data release. It will not work with reads of lower accuracy. In particular, it will not work with ONT R10 or R9 reads.

As a separate comment, the option --Assembly.mode2.suppressGfaOutput you are using on your command line only applies to Mode 2 assembly and has no effect for Mode 3 assembly. Assembly configuration Nanopore-ncm23-May2024 uses Mode 3 assembly. There is currently no option to suppress GFA output in Mode 3 assembly.

asylvz commented 3 months ago

I see, thank you so much.