pirovc / grimer

GRIMER performs analysis of microbiome studies and generates a portable and interactive dashboard integrating annotation, taxonomy and metadata with focus on contamination detection.
https://pirovc.github.io/grimer/
MIT License

Using a positive control #7

Open RichStack opened 1 year ago

RichStack commented 1 year ago

Hi again, I have a mock community I'm running through with my other samples, but I'm a little unsure how to specify this in the config file.

I am able to point to the mock sample by specifying the metadata value under the 'sample-type' column, but I don't know how to direct GRIMER to the expected composition of the mock. I have a tsv file which contains the taxonomic levels for each mock member and its expected relative frequency. Is it possible to also point GRIMER to this data in the config file? Thanks in advance.

pirovc commented 1 year ago

Can you give me some examples of your files so I can better help?

RichStack commented 1 year ago

Thanks. This is an example of the mock community expected relative frequency tsv file I have:

Taxonomy	MC
Bacteria;Actinobacteriota;Actinobacteria;Micrococcales;Micrococcaceae;Micrococcus	0.02
Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus	0.23
Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Acinetobacter	0.23
Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Morganellaceae;Providencia	0.23
Bacteria;Firmicutes;Bacilli;Staphylococcales;Staphylococcaceae;Staphylococcus	0.02
Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia-Shigella	0.02
Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactobacillus	0.02
Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas	0.23

So it has taxonomic ranks of the organisms with a ';' separator, as does my input file, which is also in tsv format.
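For reference, a table in this shape can be loaded with a few lines of Python. This is only a sketch, assuming a two-column tab-separated file with one header row and ';'-separated ranks; the function name `parse_expected` and the inline sample text are illustrative and are not part of GRIMER.

```python
import csv
import io


def parse_expected(handle):
    """Return {genus: {"lineage": [ranks], "freq": float}} from a 2-column TSV."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row ("Taxonomy", "MC")
    expected = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        ranks = row[0].split(";")
        # Key on the last (most specific) rank, which is the genus in this file.
        expected[ranks[-1]] = {"lineage": ranks, "freq": float(row[1])}
    return expected


# Three rows copied from the table above; a real file would be read with
# open(path, newline="") instead of io.StringIO.
sample = (
    "Taxonomy\tMC\n"
    "Bacteria;Actinobacteriota;Actinobacteria;Micrococcales;Micrococcaceae;Micrococcus\t0.02\n"
    "Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus\t0.23\n"
    "Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas\t0.23\n"
)
mock = parse_expected(io.StringIO(sample))
for genus, info in mock.items():
    print(genus, info["freq"])
```

Keying on the last rank keeps lookups simple, but it assumes every lineage ends at the same rank and that genus names are unique within the file.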

My Mock Community (MC) is one of the sample types included in the input tsv file, and I have sequence counts for this sample as with all the others.

I have a sample metadata file too - and this includes a categorical column, specifying sample-type. The Mock Community I ran in this run of sequencing is identified as 'sample-type' Mock.

So I can include this line in the config file: `controls: "Positive Control": "sample-type": - "Mock"`

However, by running that, it will tell grimer which sample is the mock, but won't tell grimer what relative frequencies are expected. I can see how negative controls are easily run as part of the decontam package, but I'm unsure how Mock Controls are interpreted.

So, I guess one of my questions is, is it possible for grimer to handle this type of sample, or is it just negative/blanks?

Hope that makes sense, and thanks.

pirovc commented 1 year ago

Now I get it, but unfortunately there's no support for this kind of analysis in GRIMER. By adding `"sample-type": - "Mock"` to your controls, GRIMER will only check which organisms are in your mock samples and consider them as positive controls, without checking their relative frequencies, as you noted.

If there were such a feature, how would you expect to see it in the plots/table? I'll mark this as an enhancement.

RichStack commented 1 year ago

Hi, thanks for confirming with me.

There is a function in Qiime that carries out a linear regression for mock expected relative frequencies versus those observed, so I can get this functionality elsewhere, but your question made me think about this use of mock samples in GRIMER in the context of use as a reliable positive control.
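To make the idea of regressing observed on expected frequencies concrete, here is a minimal sketch in plain Python. It is not the QIIME implementation and not a GRIMER feature. The expected values come from the tsv earlier in this thread, while the `observed` values are invented purely for illustration, and `linear_fit` is a hypothetical helper name.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


# Expected relative frequencies, per genus, from the mock composition file.
expected = {
    "Micrococcus": 0.02, "Enterococcus": 0.23, "Acinetobacter": 0.23,
    "Providencia": 0.23, "Staphylococcus": 0.02, "Escherichia-Shigella": 0.02,
    "Lactobacillus": 0.02, "Pseudomonas": 0.23,
}
# Hypothetical observed frequencies (made-up numbers, not real results).
observed = {
    "Micrococcus": 0.01, "Enterococcus": 0.20, "Acinetobacter": 0.26,
    "Providencia": 0.21, "Staphylococcus": 0.03, "Escherichia-Shigella": 0.02,
    "Lactobacillus": 0.01, "Pseudomonas": 0.24,
}

taxa = sorted(expected)
xs = [expected[t] for t in taxa]
ys = [observed.get(t, 0.0) for t in taxa]  # a missing taxon counts as 0
slope, intercept, r_squared = linear_fit(xs, ys)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

A slope near 1, an intercept near 0 and a high R^2 would suggest the mock was recovered close to its expected composition; with only two distinct expected values (0.02 and 0.23), as in this mock, the fit is a coarse check at best.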

In my last run my mock sample also contained contaminant sequences. These sequences are usually shared with the negative controls, but I wasn't sure whether adding the mock as a positive control in the GRIMER analysis would stop them from being flagged as contaminants (hope I explained that well). So maybe a feature that at least lets the user indicate which sequences are expected in the mock sample would allow for some useful filtering.
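The kind of filtering described here could look roughly like the following sketch, done outside GRIMER. It assumes per-genus read counts for the mock sample and a list of expected genera; the function name `split_mock_taxa`, the counts, and the extra genus are all hypothetical.

```python
def split_mock_taxa(observed_counts, expected_taxa):
    """Split taxa detected in the mock into expected members and unexpected ones.

    Returns (expected_hits, unexpected, missing):
      expected_hits: expected taxa that were detected, with their counts
      unexpected:    detected taxa not in the expected list (contaminant candidates)
      missing:       expected taxa that were not detected at all
    """
    expected_set = set(expected_taxa)
    detected = {t: c for t, c in observed_counts.items() if c > 0}
    expected_hits = {t: c for t, c in detected.items() if t in expected_set}
    unexpected = {t: c for t, c in detected.items() if t not in expected_set}
    missing = sorted(expected_set - set(expected_hits))
    return expected_hits, unexpected, missing


# Genera expected in the mock, as listed in the composition file.
expected_taxa = [
    "Micrococcus", "Enterococcus", "Acinetobacter", "Providencia",
    "Staphylococcus", "Escherichia-Shigella", "Lactobacillus", "Pseudomonas",
]
# Hypothetical read counts for the mock sample (illustrative only).
mock_counts = {
    "Enterococcus": 2300, "Acinetobacter": 2100, "Providencia": 2500,
    "Pseudomonas": 2200, "Staphylococcus": 180, "Lactobacillus": 150,
    "Micrococcus": 0, "Ralstonia": 40,  # Ralstonia is a made-up unexpected hit
}

hits, unexpected, missing = split_mock_taxa(mock_counts, expected_taxa)
print("unexpected in mock:", sorted(unexpected))
print("expected but missing:", missing)
```

Only the taxa in `hits` would then be treated as positive-control organisms, while anything in `unexpected` stays eligible for contaminant flagging alongside the negative controls.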

Thanks again for getting back to me, and would just like to say that I really like the functionality of GRIMER so far. The database of contaminant organisms that you've curated has been excellent and very useful to me - so thanks very much. R