pirovc / grimer

GRIMER performs analysis of microbiome studies and generates a portable and interactive dashboard integrating annotation, taxonomy and metadata with focus on contamination detection.
https://pirovc.github.io/grimer/
MIT License

Decontam without any controls #8

Open ailtonpcf opened 9 months ago

ailtonpcf commented 9 months ago

Hey, thank you very much for this cool tool. I'm working on amplicon sequencing data derived from public datasets, with taxa at the genus level. According to the instructions, the only mandatory input is the abundance table. However, I'm interested in finding contamination in this data. My idea is to use the contaminants reference, since my dataset doesn't have any controls. If I run the following command, the decontamination step is not performed:

grimer -i fungi.genus.raw.csv --transpose -f ","

Since my table is at the genus level and the contaminants.yaml contains NCBI accessions, how can I perform decontamination?

Thanks in advance, Ailton.

pirovc commented 9 months ago

Hi!

You have to activate taxonomy for your run with --taxonomy ncbi. You also have to point to the files with the contamination references with --config config.yml. You can download the default contamination files from the GRIMER repository. Here is an example: https://pirovc.github.io/grimer/config/#using-the-configuration-file

Let me know if something is unclear.

Best, Vitor

antoine4ucsd commented 1 week ago

Hello, I am following up on that question. I have a set of human gut microbiome samples and I would like to perform some decontamination.

I created the suggested config.yml:

references:
  "Contaminants": "/Users/antoinechaillon/Dropbox/_microbiome/grimer/files/contaminants.yml"
  "Human-related": "/Users/antoinechaillon/Dropbox/_microbiome/grimer/files/human-related.yml" 

external:
  mgnify: "~/files/mgnify5989.tsv"
  decontam:
    threshold: 0.1 # [0-1] P* hyperparameter
    method: "frequency" # frequency, prevalence, combined

My amplicons.txt is exported directly from my phyloseq object (a subset is attached as amplicons.example.txt). I suspect it might not contain all the required information. I can also extract the tax_table; if that helps, how do I link the two? I also have metadata in my phyloseq object. Maybe there is a way to export the phyloseq object as BIOM and use it as input?
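On linking the count table and the tax_table: a minimal Python sketch, assuming both have been written from R to tab-separated text with a shared OTU id column. The column name `otu`, the helper names, and the toy data are hypothetical. How GRIMER expects taxonomy to be encoded in its input is not stated in this thread, so the lineage-string layout below is an assumption to check against the GRIMER documentation.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def build_lineage_table(otu_rows, tax_rows, ranks, sep=";", id_col="otu"):
    """Join counts and taxonomy on the OTU id (hypothetical helper).

    Returns (header, rows); the first column of each row is a lineage string.
    """
    # Map each OTU id to its lineage, e.g. "Firmicutes;Lactobacillus".
    lineage = {
        row[id_col]: sep.join(row[rank] for rank in ranks) for row in tax_rows
    }
    samples = [col for col in otu_rows[0] if col != id_col]
    header = ["lineage"] + samples
    # OTUs without a taxonomy entry are labelled "unassigned".
    rows = [
        [lineage.get(row[id_col], "unassigned")] + [row[s] for s in samples]
        for row in otu_rows
    ]
    return header, rows


# Toy data, made up for illustration only.
otu_text = "otu\tS1\tS2\nOTU1\t10\t0\nOTU2\t3\t7\nOTU3\t1\t1\n"
tax_text = (
    "otu\tphylum\tgenus\n"
    "OTU1\tFirmicutes\tLactobacillus\n"
    "OTU2\tBacteroidetes\tBacteroides\n"
)

header, rows = build_lineage_table(
    read_tsv(otu_text), read_tsv(tax_text), ranks=["phylum", "genus"]
)

# Write the joined table as tab-separated text.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```

Joining on the OTU id keeps the counts untouched and makes missing taxonomy visible as "unassigned" rather than silently dropping rows.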

If I run grimer:

grimer --input-file amplicons.txt \
       --config config.yml \
       --decontam --mgnify \
       --taxonomy ncbi \
       --ranks superkingdom phylum class order family genus species

I get this nice report, but I suspect no decontamination has been performed: https://www.dropbox.com/scl/fi/ag0e9mqjy06nf7rkxm2j7/output.html?rlkey=6wfke1gkbonp5t99qq33a8oe7&dl=0

I am also not sure whether I should use --taxonomy ncbi. My data were processed and classified with qiime2 using q2-feature-classifier.

All suggestions are very welcome. Thank you!

antoine4ucsd commented 1 week ago

I was able to run grimer on my dataset with the following command:

grimer --input-file ./data/amplicons.txt \
       --config ./config.yml \
       --decontam --mgnify \
       --taxonomy greengenes \
       --ranks superkingdom phylum class order family genus species

and the config:

references:
  "Contaminants": "contaminants.yml"
  "Human-related": "human-related.yml" 

external:
  mgnify: "/Users/antoinechaillon/Dropbox/_microbiome/grimer/files/mgnify5989.tsv"
  decontam:
    threshold: 0.1 # [0-1] P* hyperparameter
    method: "frequency" # frequency, prevalence, combined

How can I output the OTU table after decontamination of non-human taxa?

Thank you!
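No reply in the thread answers this question. As a minimal sketch of one manual approach outside GRIMER, assuming the taxa judged to be contaminants have been copied by hand from the report into a plain list, the table can be filtered and rewritten in Python. The function name, the taxon labels, and the counts are all hypothetical.

```python
import csv
import io


def drop_flagged_taxa(rows, flagged):
    """Split rows into kept rows and the names of removed taxa.

    rows: list of lists, taxon name in the first column, counts after it.
    flagged: iterable of taxon names judged to be contaminants.
    """
    flagged = set(flagged)
    kept = [row for row in rows if row[0] not in flagged]
    removed = [row[0] for row in rows if row[0] in flagged]
    return kept, removed


# Toy table, made up for illustration only.
header = ["taxon", "S1", "S2"]
table = [["taxon_A", 120, 98], ["taxon_B", 4, 0], ["taxon_C", 33, 41]]
flagged_taxa = ["taxon_B"]  # hypothetical: copied by hand from the report

kept, removed = drop_flagged_taxa(table, flagged_taxa)

# Write the filtered table as tab-separated text.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(kept)
print(out.getvalue())
print("removed:", removed)
```

Returning the removed names alongside the kept rows leaves a record of what was filtered, which is useful when the flagged list is revised later.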