schneebergerlab / findGSE

findGSE is a tool for estimating size of (heterozygous diploid or homozygous) genomes by fitting k-mer frequencies iteratively with a skew normal distribution model.

several issues: Warning: data does not follow assumed distribution anymore at itr 1 + Error in singlestart:singleend : NA/NaN argument #5

Closed CornilleAmandine closed 7 months ago

CornilleAmandine commented 3 years ago

Hello!

I have generated .histo files for hundreds of individuals and run findGSE on all of them:

lapply(files, function(x){ findGSE(histo=x, sizek=17, outdir=paste("out_", x, sep=""))})
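A batch run like the one above stops at the first sample that throws an error. A minimal sketch of a wrapper that keeps going and records the outcome per sample is shown here. The names `run_all` and `fit_fun` are made up for illustration and are not part of findGSE. It only captures R conditions raised with `stop()` or `warning()`; any message that findGSE prints directly to the console would still appear there and not in the table.

```r
# Run a fitting function on every histogram file and record failures
# instead of stopping at the first error.
# `fit_fun` is a hypothetical placeholder for whatever is called per file.
run_all <- function(files, fit_fun) {
  results <- lapply(files, function(x) {
    warnings_seen <- character(0)
    status <- tryCatch(
      withCallingHandlers(
        {
          fit_fun(x)
          "ok"
        },
        # Collect warning messages, then let the computation continue.
        warning = function(w) {
          warnings_seen <<- c(warnings_seen, conditionMessage(w))
          invokeRestart("muffleWarning")
        }
      ),
      # Turn an error into a status string so the loop moves on.
      error = function(e) paste("error:", conditionMessage(e))
    )
    data.frame(file = x, status = status,
               warnings = paste(warnings_seen, collapse = " | "),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}
```

With the call from this thread, `fit_fun` would be something like `function(x) findGSE(histo = x, sizek = 17, outdir = paste("out_", x, sep = ""))`, and the returned data frame shows at a glance which samples failed and with which message.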

There seem to be different issues depending on the sample, so I then ran the single-sample command to see what the problem was.

For some samples I got:

Warning: data does not follow assumed distribution anymore at itr 1, fitting stopped.

Error in dnorm(xfit2, mean = meanfit, sd = sdfit) : object 'sdfit' not found

For others I got:

Warning: data does not follow assumed distribution anymore at itr 1, fitting stopped. Iterative fitting done.

Error in singlestart:singleend : NA/NaN argument

And for some it works!

Does it depend on the quality of my data? Why do I get different error messages? Could you help me, please?

FYI, I used 17-mers when running jellyfish.

Thanks a lot for your help! Cheers Amandine

HeQSun commented 3 years ago

Hi Amandine,

Sorry for the late reply; somehow I missed your message.

--

What kind of sequencing data do you have and what is the coverage?

This is usually due to low sequencing depth, in which case the "bell-shaped" peak in the k-mer frequency distribution is missing. Can you share some PDFs of the k-mer histograms for the samples that failed?
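As a rough pre-check for the missing peak described above, one can look for a valley followed by a maximum in the histogram counts. This is a minimal sketch and not what findGSE does internally; the function name `histo_peak` is made up for illustration, and the heuristic (first rise in counts marks the valley) is an assumption that can be fooled by noisy data.

```r
# Summarise a k-mer histogram given as two vectors:
# `freq` (k-mer multiplicity) and `count` (number of distinct k-mers).
histo_peak <- function(freq, count) {
  n <- length(count)
  # First position where counts start rising again: the valley between
  # the low-frequency (mostly error) k-mers and the main peak.
  valley <- which(diff(count) > 0)[1]
  if (is.na(valley)) {
    # Counts never rise: no bell-shaped peak at all.
    return(list(has_peak = FALSE, valley = NA, peak = NA))
  }
  # Highest point at or to the right of the valley.
  right <- valley:n
  peak_idx <- right[which.max(count[right])]
  list(has_peak = TRUE, valley = freq[valley], peak = freq[peak_idx])
}
```

For a two-column jellyfish histogram this could be called as `h <- read.table("sample.histo"); histo_peak(h$V1, h$V2)`, where `sample.histo` is a placeholder file name. If the last bin of the histogram accumulates all high-frequency k-mers, it may need to be dropped first so that it is not mistaken for the peak. A result with `has_peak = FALSE`, or a peak at a very low multiplicity, points to the low-depth situation.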

Best, Hequan