schneebergerlab / findGSE

findGSE is a tool for estimating size of (heterozygous diploid or homozygous) genomes by fitting k-mer frequencies iteratively with a skew normal distribution model.

Iterative fitting stops at iter 1, expected: 10 #3

Closed AllisonStander closed 4 years ago

AllisonStander commented 4 years ago

Good day,

I have run findGSE on a histogram file generated at k=23. I noticed that it reports a high heterozygosity rate (~1% higher than with other programs, and than when run at different k-values).

I get the following message while running findGSE at k=23:

findGSE initialized...
   Info: histo file provided as /home/allison/Documents/Project2_2019/K-mer_Analysis/K-mer_Counting/00_Raw/K23/bbnormK23_raw.hist
   Info: size k set as 23
   Info: output folder set as /home/allison/Documents/Project2_2019/K-mer_Analysis/K-mer_Counting/00_Raw/K23/FindGSE/bbnormK23raw_193_max900k_2
   Info: expected coverage of homozygous k-mers set as 193
   Info: het observed set as true ==> heterozygous fitting asked. 

Iterative fitting process for sample bbnormK23_raw.hist started... 
    Size 23 fitting for het k-mers 
    Warning: two peaks found in k-mer freq with given -exp_hom,
                  peak with lower height as hom-peak!
    Info: het_peak_pos for het fitting:  95 
    Info: hom_peak_pos for hom fitting:  191 
    Info: het_xfit_left  for het fitting:  40 
    Info: het_xfit_right for het fitting:  143 
    Size 23 at itr 1
    Info: min_valid_pos:  143 
    Info: signal error border:  75 
    Info: hom_xfit_left  for hom fitting at itr  1 :  177 
    Info: hom_xfit_right for hom fitting at itr  1 :  217 
       Fitting has to be repeated: sizek 23 at itr 1
    Info: hom_xfit_left  for hom fitting at itr  1 :  177 
    Info: hom_xfit_right for hom fitting at itr  1 :  217 
    Note on hom fitting: fitting stopped at iter 1, expected: 10
Iterative fitting done.

Genome size estimate for bbnormK23_raw.hist: 1012737646 bp.

Time consumed:  27.00648 secs

The genome size and repeat estimates are in line with those from other programs and from runs at different k-values. Could the smaller number of iterations be causing the higher heterozygosity rate? Do you have any suggestions on how to fix it?

Kind regards, Allison
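
For readers who want to sanity-check figures like the ones in the log above, here is a minimal sketch in Python. It is not findGSE's skew normal fitting. It is only the common back-of-the-envelope estimate (total k-mer occurrences divided by the homozygous peak position), plus a check that the het peak sits near half the hom peak. The two-column "frequency count" histogram layout, the function names, and the `min_freq` error cutoff are all assumptions made for illustration.

```python
def parse_histo(text):
    """Parse 'frequency count' lines into (freq, count) tuples.

    Assumes a two-column, whitespace-separated histogram;
    lines that do not match that layout are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1])))
    return rows


def naive_genome_size(hist, hom_peak, min_freq=1):
    """Rough estimate: total k-mer occurrences / homozygous peak position.

    min_freq is a hypothetical cutoff for dropping low-frequency
    (likely erroneous) k-mers; it is not a findGSE parameter.
    """
    total = sum(freq * count for freq, count in hist if freq >= min_freq)
    return total / hom_peak


def peak_ratio(het_peak, hom_peak):
    """Ratio of het to hom peak positions; about 0.5 is expected for a diploid."""
    return het_peak / hom_peak


if __name__ == "__main__":
    # Peak positions taken from the log above (het 95, hom 191).
    print(round(peak_ratio(95, 191), 3))

    # Toy histogram, invented purely to show the call.
    toy = parse_histo("1 1000\n10 50\n20 100\n")
    print(naive_genome_size(toy, hom_peak=20, min_freq=5))
```

With the peak positions from the log, the ratio comes out at roughly 0.497, so the two detected peaks are consistent with a het peak at half the homozygous coverage. The naive size estimate is only a cross-check on the order of magnitude and says nothing about the heterozygosity rate itself.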

HeQSun commented 4 years ago

fixed.