phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
In essence, the freq-object is similar in length to the number of rows in mod_melts (reflecting all samples/controls). This is also the basis for df.
However, the following two lines puzzle me, as first the negative controls are excluded from df (reducing the number of rows), after which the - original length - freq-object is used for further selection (freq > 0). These dimensions do not match and will introduce NA-rows, resulting in the error above.
I figured this does not happen when samples in the original phyloseq are ordered in such a way that all samples are followed by all controls.
To test this idea I reordered the original phyloseq:
Following up on 1., I found that setting the neg-parameter results in model behaviour that deviates from the idea of is.Contaminant. As far as I understood from the paper, the frequency-method uses all density information regardless whether a sample is a biological sample/control sample. However, when setting the neg-parameter, control samples are excluded from the model (e.g. the red line only represents a fit between DNA concentration and frequency in biological samples). This is not the case when not setting the neg-parameter.
Example (continuing from above):
p1 <- plot_frequency(ps_reorder, taxa_names(ps_reorder)[c(1, 2)], conc = "quant_reading", neg = "is.neg")
# Does not include blanks for models; does not align with is.Contaminant()-function
p2 <- plot_frequency(ps_reorder, taxa_names(ps_reorder)[c(1, 2)], conc = "quant_reading")
# Does include blanks for models
cowplot::plot_grid(p1, p2, ncol = 1, align = "v", axis = "lr")
Compare the red line between upper (neg-parameter set) and lower (neg-parameter not set) plots. It is slightly different, showing that only in the case of p2 the blanks were included in the modelling (as I believe should be the case, given the idea of is.Contaminant()).
Curious to hear your thoughts on this and happy to help adjust if needed.
Recently I have been exploring the
plot_frequency()
-function in a bit more depth and I found two issues:neg
-parameter is set (this does not occur withneg
= NULL (default)).Example (based on tutorial):
I found out this has to do with a few lines of code in the
plot_frequency()
-function, namely:In essence, the
freq
-object is similar in length to the number of rows inmod_melts
(reflecting all samples/controls). This is also the basis fordf
. However, the following two lines puzzle me, as first the negative controls are excluded fromdf
(reducing the number of rows), after which the - original length -freq
-object is used for further selection (freq > 0
). These dimensions do not match and will introduce NA-rows, resulting in the error above. I figured this does not happen when samples in the original phyloseq are ordered in such a way that all samples are followed by all controls.To test this idea I reordered the original phyloseq:
neg
-parameter results in model behaviour that deviates from the idea ofis.Contaminant
. As far as I understood from the paper, the frequency-method uses all density information regardless whether a sample is a biological sample/control sample. However, when setting theneg
-parameter, control samples are excluded from the model (e.g. the red line only represents a fit between DNA concentration and frequency in biological samples). This is not the case when not setting theneg
-parameter.Example (continuing from above):
Compare the red line between upper (
neg
-parameter set) and lower (neg
-parameter not set) plots. It is slightly different, showing that only in the case ofp2
the blanks were included in the modelling (as I believe should be the case, given the idea ofis.Contaminant()
).Curious to hear your thoughts on this and happy to help adjust if needed.