lmrodriguezr / nonpareil

Estimate metagenomic coverage and sequence diversity
http://enve-omics.ce.gatech.edu/nonpareil/

Failure to generate nonpareil curves #9

Closed carden24 closed 9 years ago

carden24 commented 9 years ago

I have problems making the Nonpareil curves with R. I ran the MPI version of nonpareil as usual, except that I used the -L 25 option because I expect poor coverage. I also know from pyrotag data that I have closely related species in my samples. There were no errors when running nonpareil, and I got the "everything seems correct" result. However, when I try to get the curves I get the following warning messages:

Warning messages: ...Convergence failure: false convergence (8) ...Model didn't converge

My questions are: Is my coverage result still valid? Do I need to rerun nonpareil with different parameters, or does the modelling of the curves only affect the estimate of the sequencing effort needed?

thanks, Erick

lmrodriguezr commented 9 years ago

The model parameters are adjusted depending on the coverage, and -L 25 is the least reliable parameter set (although in some cases is the ONLY option). Can I have a copy of the .npo file? I would like to take a look at the curve.

Thanks! Miguel.

carden24 commented 9 years ago

Here is one of the files: https://www.dropbox.com/s/52dtnanhi1tipc2/A8-OM0C0-O3.npo?dl=0

lmrodriguezr commented 9 years ago

I believe the problem is insufficient sampling at lower sequencing effort. It can be solved by re-running nonpareil with "-d 0.7". This parameter turns on "logarithmic sampling", which subsamples uniformly in logarithmic space (as opposed to linear space, the default). The documentation indicates that this is experimental code, but I've tested it on a large array of datasets and I'm confident it's a stable feature. This subsampling should be preferred, and I intend to make it the default in the next release. Please let me know if this solves the convergence failure.
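
As a rough illustration of why logarithmic spacing helps at low sequencing effort, the sketch below compares how many subsample points each scheme places below 10% of the reads. This is not Nonpareil's code: the function names, the linear step of 0.1, the `min_fraction` cutoff, and the idea that the 0.7 factor is applied multiplicatively from the full dataset downward are all assumptions made for the example.

```python
def linear_fractions(step=0.1):
    """Evenly spaced subsample fractions in (0, 1] (hypothetical linear scheme)."""
    n = round(1.0 / step)
    return [round(i * step, 10) for i in range(1, n + 1)]


def log_fractions(factor=0.7, min_fraction=0.001):
    """Fractions evenly spaced in log space: 1, factor, factor**2, ...

    Assumption: the factor shrinks the subsample multiplicatively until it
    drops below min_fraction (a cutoff invented for this sketch).
    """
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1 (exclusive)")
    fractions = []
    f = 1.0
    while f >= min_fraction:
        fractions.append(f)
        f *= factor
    return sorted(fractions)


if __name__ == "__main__":
    lin = linear_fractions()
    log = log_fractions()
    # Count the points that fall below 10% of the reads in each scheme.
    print("linear points below 10%:", sum(f < 0.1 for f in lin))
    print("log points below 10%:   ", sum(f < 0.1 for f in log))
```

Under these assumptions the logarithmic scheme concentrates most of its points at small fractions of the dataset, which is the region where the linear scheme has none.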

carden24 commented 9 years ago

Running the sampling in logarithmic mode solved the convergence issue. Here is a comparison of the results. Performance (resource usage) was similar and the results are comparable (I do not expect exactly the same results, due to sampling). Thanks,

Erick

| Sampling method | Kappa | Coverage | LRstar | LR | ModelR | Diversity |
| --- | --- | --- | --- | --- | --- | --- |
| Log sampling | 0.37363 | 0.4352244 | 1.13E+11 | 8470238989 | 0.9994858 | 23.00952 |
| Linear sampling | 0.38015 | 0.4416335 | 0 | 8470238989 | 0 | 0 |

koopkaup commented 9 years ago

I have a similar problem: even with -d 0.7 I can't generate the curves. I also tried running it with a larger query size, but that only worked for some of the samples (I have 5 samples in total). For the samples with low coverage I get this message in R: "Median of the curve is zero at 20% of the reads, check parameters and re-run (e.g., decrease value of -L in nonpareil)." I tried running one of the samples in your online version of Nonpareil, and here is the result: http://enve-omics.ce.gatech.edu/nonpareil/results?jid=546c987b1372f Any ideas? I could also send you my .npo files.

Thank you, Kristjan

lmrodriguezr commented 9 years ago

@koopkaup your dataset seems to be too small for Nonpareil to accurately project the coverage. The coverage is still estimated (14.65% in the link above) and you can still visualize the subsampled curve (in the link above select "Plot curve only", or in the R interface set "plotModel=F"). However, diversity and projected sequencing effort are both unavailable without an accurate model.

lmrodriguezr commented 9 years ago

Solved in v2.400