Closed ottentim closed 8 years ago
Hello @ottentim. Your interpretation of the plot is correct, but note that the values in the .npo file are not measuring the same thing. The values in the .npo file correspond to redundancy, which is a value specific to the parameters used, in particular to -L. On the other hand, the plot's values are the estimated coverage depth, which is (on average) independent of the -L parameter. Both measurements correlate very well in log-log space, but the exact values differ (as you saw in your examples).
Using the notation of the original publication, the .npo values are κ (redundancy), while the values in the plot are C-hat (estimated abundance-weighted average coverage). Assuming that you used -L 0.5 (default), a value of redundancy of 72.8% would correspond to an average coverage of about 78.6% (0.728^0.757, from Suppl. Table 1), which looks about right in your plot.
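The arithmetic in the paragraph above can be checked with a short sketch. It assumes the relation is a simple power law, coverage ≈ redundancy raised to 0.757, which is how the quoted figures are read here (0.728 raised to 0.757 is about 0.786, matching the stated 78.6%). The function name and the constant name are hypothetical, and the exponent applies only to -L 0.5; no exponent for other -L values is given in this thread.

```python
# Exponent quoted in the reply for -L 0.5 (attributed there to Suppl. Table 1).
# Assumption: other -L values need a different exponent, not given in this thread.
EXPONENT_L_0_5 = 0.757


def coverage_from_redundancy(kappa: float, exponent: float = EXPONENT_L_0_5) -> float:
    """Approximate average coverage (C-hat) from redundancy (kappa).

    Hypothetical helper: it implements only the power-law reading of the
    reply, coverage ~= kappa ** exponent, with both values as fractions.
    """
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("redundancy must be a fraction between 0 and 1")
    return kappa ** exponent


if __name__ == "__main__":
    kappa = 0.728  # last redundancy value reported in the question
    print(f"redundancy {kappa:.3f} -> coverage {coverage_from_redundancy(kappa):.3f}")
```

This is only a rough cross-check of the plot; the reply points to the C value reported by Nonpareil.curve as the exact estimate.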
To get the exact value of coverage, see the output of Nonpareil.curve in the R console, which should be a list (or a data.frame, if you used Nonpareil.curve.batch). The value labeled C corresponds to C-hat.
Thank you Luis,
Your response was very helpful. This is a really great tool! Happy New Year.
Cheers,
Tim Otten
On Wed, Dec 30, 2015 at 11:28 AM, Luis M Rodriguez-R < notifications@github.com> wrote:
Hello @ottentim https://github.com/ottentim. Your interpretation of the plot is correct, but note that the values in the .npo file are not measuring the same thing. The values in the .npo file correspond to redundancy http://nonpareil.readthedocs.org/en/latest/redundancy.html#output, which is a value specific to the parameters used, in particular to -L. On the other hand, the plot's values are the estimated coverage depth, which is (on average) independent of the -L parameter. Both measurements correlate very well in log-log space, but the exact values differ (as you saw in your examples).
Using the notation of the original publication http://bioinformatics.oxfordjournals.org/content/30/5/629.long, the .npo values are κ (redundancy), while the values in the plot are C-hat (estimated abundance-weighted average coverage). Assuming that you used -L 0.5 (default), a value of redundancy of 72.8% would correspond to an average coverage of about 78.6% (0.728^0.757, from Suppl. Table 1 http://bioinformatics.oxfordjournals.org/content/30/5/629/suppl/DC1), which looks about right in your plot.
To get the exact value of coverage, see the output of Nonpareil.curve in the R console, which should be a list (or a data.frame, if you used Nonpareil.curve.batch). The value labeled C corresponds to C-hat.
Timothy G. Otten, PhD, MPH Department of Microbiology Oregon State University 226 Nash Hall Corvallis, OR 97331 T: 541-737-1796
When running nonpareil from the command line and viewing the .npo output file in R, where does the "empty circle" value for the estimated coverage depth vs. sequencing effort come from? Presumably it is from the end of the .npo file (see below for output), which in the case of the attached output suggests that the coverage was about 72.8% for my sequencing effort:

    ...
    1732486 0.35225 0.05063 0.31818 0.35354 0.38462
    2474980 0.40542 0.04240 0.37692 0.40458 0.43333
    3535685 0.46346 0.03631 0.43787 0.46512 0.48837
    5050979 0.51771 0.02825 0.49793 0.51894 0.53689
    7215684 0.57511 0.02255 0.56012 0.57550 0.59053
    10308120 0.63122 0.01838 0.61943 0.63125 0.64300
    14725885 0.68184 0.01314 0.67288 0.68182 0.69086
    21036979 0.72835 0.00807 0.72249 0.72833 0.73418
Yet when I plot the data in R, the empty circle appears about 8-10% higher than this, and this is the same for all six of the datasets I'm trying this on.
Am I interpreting this correctly? Is the last line, second column of the .npo output file the estimate of coverage depth for my sequencing effort? If not, how do I determine this? Thanks for your help, I think your method has a lot of potential!
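For reference, the "last line, second column" value asked about here can be pulled out of the .npo text with a few lines of Python. This is a minimal sketch, not part of Nonpareil: the function name is hypothetical, the sample rows are the last two lines of the excerpt above, and it assumes the file is whitespace-delimited with any lines beginning with `#` treated as comments. Per the reply in this thread, the second column is redundancy rather than the coverage shown in the plot.

```python
from typing import Tuple

# Last two rows of the .npo excerpt quoted in the question.
NPO_TAIL = """\
14725885 0.68184 0.01314 0.67288 0.68182 0.69086
21036979 0.72835 0.00807 0.72249 0.72833 0.73418
"""


def last_effort_and_redundancy(npo_text: str) -> Tuple[float, float]:
    """Return (first column, second column) of the last data row.

    Assumptions: columns are whitespace-separated, and blank lines or
    lines starting with '#' carry no data and can be skipped.
    """
    rows = []
    for line in npo_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        rows.append((float(fields[0]), float(fields[1])))
    if not rows:
        raise ValueError("no data rows found")
    return rows[-1]


if __name__ == "__main__":
    effort, redundancy = last_effort_and_redundancy(NPO_TAIL)
    print(f"effort={effort:.0f} redundancy={redundancy:.5f}")
```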