kensung-lab / hypo

HyPo: Super Fast & Accurate Polisher for Long Read Genome Assemblies
GNU General Public License v3.0

149720 which exceeds the limit of 4294967295 : overflow? #8

Open bwlang opened 4 years ago

bwlang commented 4 years ago

I ran hypo like this (now with correct input):

hypo -d $HOME/deer_assemblies/releases/v02.1/ovi_v02.1.fasta \
  -r @illumina_reads.fofn \
  -b $HOME/deer_assemblies/releases/v02.1/alignments/ovi_v02.1.combined.bam \
  -B $HOME/deer_assemblies/releases/v02.1/alignments/ont/deer_ont_v02.1.bam \
  -c 80 -s 2.4g -p 30 -t 16 -i

*****************************************************************************************************************************************************
Stage 1: 100%
Stage 2: 100%
1st stage: 2267.27s
2nd stage: 2153.25s
Total    : 4420.52s
Tmp size : 209385MB

Stats:
   No. of k-mers below min. threshold :   1391580984
   No. of k-mers above max. threshold :     25064298
   No. of unique k-mers               :   4879558594
   No. of unique counted k-mers       :   3462913312
   Total no. of k-mers                : 145066797137
   Total no. of reads                 :   1130840130
   Total no. of super-k-mers          :  31206344409
[Hypo::Utils] Info: Value of K chosen for the given genome size (2.4g): 17
[Hypo::Utils] Info: File size expected for the given genome size (2.4g) and cov (80): 384G
Given Command: hypo  -d /mnt/home/langhorst/deer_assemblies/releases/v02.1/ovi_v02.1.fasta -r @illumina_reads.fofn -b /mnt/home/langhorst/deer_assemblies/releases/v02.1/alignments/ovi_v02.1.combined.bam -B /mnt/home/langhorst/deer_assemblies/releases/v02.1/alignments/ont/deer_ont_v02.1.bam -c 80 -s 2.4g -p 30 -t 16 -i .
[Hypo::Utils] Info: Intermediate Files will be stored.
[Hypo::Utils] Info: Beginning from stage: 0
RESOURCES ([SUK:KMC]: Running KMC done. ): TIME= 4421.39 sec; PEAK RSS (so far)= 2074MB; CURRENT RSS (so far)= 2074MB.
RESOURCES ([SUK:KMC]: Kmers Histogram done. ): TIME= 73.1288 sec; PEAK RSS (so far)= 3131MB; CURRENT RSS (so far)= 3131MB.
[SolidKmers] Info: Error-threshold freq: 17, Lower-threshold freq: 17, Upper-threshold freq: 0, Mean-coverage: 46
RESOURCES ([SUK]: Finding cutoffs. ): TIME= 5.432e-06 sec; PEAK RSS (so far)= 3131MB; CURRENT RSS (so far)= 3131MB.
RESOURCES ([SUK]: Filling bitvectors. ): TIME= 86.5815 sec; PEAK RSS (so far)= 3131MB; CURRENT RSS (so far)= 3131MB.
RESOURCES ([SUK]: Clearing files. ): TIME= 0.0006305 sec; PEAK RSS (so far)= 3131MB; CURRENT RSS (so far)= 3131MB.
[SolidKmers] Info: Number of solid kmers found: 0
RESOURCES ([SolidKmers]: Overall. ): TIME= 4586.98 sec; PEAK RSS (so far)= 3643MB; CURRENT RSS (so far)= 3131MB.
RESOURCES ([Hypo:Hypo]: Computed Solid kmers. ): TIME= 4589.83 sec; PEAK RSS (so far)= 3643MB; CURRENT RSS (so far)= 2074MB.
[Hypo::Hypo] Info: Number of (canonical) solid kmers (nonhp) : 0
RESOURCES ([Hypo:Hypo]: Loaded Contigs. ): TIME= 79.9818 sec; PEAK RSS (so far)= 3939MB; CURRENT RSS (so far)= 3939MB.
RESOURCES ([Hypo:Hypo]: Found Solid pos in contigs. ): TIME= 29.9156 sec; PEAK RSS (so far)= 4015MB; CURRENT RSS (so far)= 4015MB.
********** [Hypo::Hypo] Info: BATCH-ID: 0
Aln processed (in 100 M): 0
[Hypo::Hypo] Info: Number of alignments (Batch 0): loaded (11443271) invalid (14095)
RESOURCES ([Hypo:Hypo]: Loaded alignments. ): TIME= 57.6795 sec; PEAK RSS (so far)= 7823MB; CURRENT RSS (so far)= 7823MB.
Kmer support update: 0
Kmer support update: 10
Kmer support update: 20
Kmer support update: 29
RESOURCES ([Hypo:Hypo]: Solid kmers support update. ): TIME= 0.209694 sec; PEAK RSS (so far)= 7824MB; CURRENT RSS (so far)= 7824MB.
[Hypo::Contig] Error: Length exceed limit: The distance between consecutive minimisers within a window is 149720 which exceeds the limit of 4294967295 !

Ritu-Kundu commented 4 years ago

Hi Brad,

This happened because we had assumed that the distance between consecutive "valid" minimisers (required for one of the steps) is less than 65535, the maximum value of a 16-bit unsigned integer. When that assumption failed, the error message mistakenly printed a different limit: 4294967295, the maximum value of a 32-bit unsigned integer. We have fixed that assumption in Release v1.0.2, so the distance in your dataset can now be handled. I hope the rest of the run goes smoothly on your data.
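
To make the mismatch concrete, here is a minimal sketch of the kind of range check described above. It is not HyPo's actual code; the function name `check_minimiser_distance` and the message wording are hypothetical, chosen only for illustration. It shows that the reported distance of 149720 overflows a 16-bit field but fits easily in a 32-bit one, which is why the printed limit looked wrong.

```python
# Illustrative sketch only: names below are hypothetical, not HyPo's API.
UINT16_MAX = 2**16 - 1  # 65535, the limit that was actually enforced
UINT32_MAX = 2**32 - 1  # 4294967295, the limit the message printed by mistake


def check_minimiser_distance(distance, limit=UINT16_MAX):
    """Return distance if it fits within limit, else raise OverflowError.

    The message reports the same limit that is tested, so the text
    cannot disagree with the check.
    """
    if distance > limit:
        raise OverflowError(
            f"distance between consecutive minimisers is {distance}, "
            f"which exceeds the limit of {limit}"
        )
    return distance


reported = 149720  # value from the error line in this issue

# The value is too large for 16 bits but well within 32 bits.
fits_16 = reported <= UINT16_MAX
fits_32 = reported <= UINT32_MAX

try:
    check_minimiser_distance(reported)
    message = None
except OverflowError as err:
    message = str(err)
```

Building the message from the same `limit` variable that the comparison uses is one simple way to keep the two from drifting apart.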