findGSE is a tool for estimating size of (heterozygous diploid or homozygous) genomes by fitting k-mer frequencies iteratively with a skew normal distribution model.
Iterative fitting stops at iter 1, expected: 10 #3
Good day,
I have run findGSE on a histogram file generated at k=23. The reported heterozygosity rate is high: about 1% higher than what other programs report, and higher than what I get at different k-values.
I get the following output while running findGSE at k=23:
findGSE initialized...
Info: histo file provided as /home/allison/Documents/Project2_2019/K-mer_Analysis/K-mer_Counting/00_Raw/K23/bbnormK23_raw.hist
Info: size k set as 23
Info: output folder set as /home/allison/Documents/Project2_2019/K-mer_Analysis/K-mer_Counting/00_Raw/K23/FindGSE/bbnormK23raw_193_max900k_2
Info: expected coverage of homozygous k-mers set as 193
Info: het observed set as true ==> heterozygous fitting asked.
Iterative fitting process for sample bbnormK23_raw.hist started...
Size 23 fitting for het k-mers
Warning: two peaks found in k-mer freq with given -exp_hom,
peak with lower height as hom-peak!
Info: het_peak_pos for het fitting: 95
Info: hom_peak_pos for hom fitting: 191
Info: het_xfit_left for het fitting: 40
Info: het_xfit_right for het fitting: 143
Size 23 at itr 1
Info: min_valid_pos: 143
Info: signal error border: 75
Info: hom_xfit_left for hom fitting at itr 1 : 177
Info: hom_xfit_right for hom fitting at itr 1 : 217
Fitting has to be repeated: sizek 23 at itr 1
Info: hom_xfit_left for hom fitting at itr 1 : 177
Info: hom_xfit_right for hom fitting at itr 1 : 217
Note on hom fitting: fitting stopped at iter 1, expected: 10
Iterative fitting done.
Genome size estimate for bbnormK23_raw.hist: 1012737646 bp.
Time consumed: 27.00648 secs
The genome size and repeat estimates are in line with those from other programs and from runs at different k-values. Could the early stop of the iterative fitting (iteration 1 instead of the expected 10) be causing the higher heterozygosity rate? Do you have any suggestions on how to fix it?
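For anyone wanting to sanity-check the peak positions in the log independently of findGSE, here is a minimal Python sketch. It is not findGSE's method: it only looks for local maxima in the histogram and checks whether the hom peak sits at roughly twice the het peak (191 vs. 95 in the log above). It assumes the histogram is a whitespace-separated table with the k-mer frequency in the first column and the number of distinct k-mers in the second; the function names, the `min_freq` cutoff, the `window` size, and the synthetic data are all hypothetical choices made for illustration.

```python
import math


def read_histo(lines):
    """Parse histogram lines into {frequency: count}.

    Assumption: column 1 is the k-mer frequency, column 2 is the number of
    distinct k-mers at that frequency. Comment and malformed lines are skipped.
    """
    histo = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or line.lstrip().startswith("#"):
            continue
        try:
            histo[int(parts[0])] = int(parts[1])
        except ValueError:
            continue
    return histo


def find_peaks(histo, min_freq=5, window=5):
    """Return frequencies whose count is strictly above all bins within +/- window.

    Frequencies below min_freq are never reported, which skips the
    low-frequency error k-mers.
    """
    peaks = []
    for freq in sorted(histo):
        if freq < min_freq:
            continue
        neighbours = [histo.get(f, 0)
                      for f in range(freq - window, freq + window + 1)
                      if f != freq]
        if histo[freq] > max(neighbours):
            peaks.append(freq)
    return peaks


def hom_is_twice_het(het_peak, hom_peak, tol=0.1):
    """True if hom_peak / het_peak is within tol of 2."""
    return abs(hom_peak / het_peak - 2.0) <= tol


def naive_genome_size(histo, hom_peak, min_freq=5):
    """Crude estimate: total k-mer occurrences divided by the hom peak position."""
    total = sum(f * c for f, c in histo.items() if f >= min_freq)
    return total / hom_peak


def synthetic_histo():
    """Made-up histogram with bumps at 95 and 191, for demonstration only."""
    lines = ["# freq count"]
    for f in range(1, 401):
        errors = 100000 // (f * f) if f < 5 else 0
        het = round(1000 * math.exp(-((f - 95) / 15) ** 2 / 2))
        hom = round(3000 * math.exp(-((f - 191) / 20) ** 2 / 2))
        lines.append(f"{f} {errors + het + hom}")
    return lines


histo = read_histo(synthetic_histo())
peaks = find_peaks(histo)
print("peaks:", peaks)
print("hom ~ 2 x het:", hom_is_twice_het(95, 191))
```

To try it on real data, pass the open histogram file to `read_histo` instead of `synthetic_histo()`. If the peaks found this way agree with the `het_peak_pos` and `hom_peak_pos` values in the log, the peak detection itself is probably not what drives the heterozygosity difference.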
Kind regards, Allison