rx32940 / Lepto-Metagenomics


MG-RAST pipeline #2

Open rx32940 opened 4 years ago

rx32940 commented 4 years ago

This is a complete pipeline for metagenomic analysis. I am trying it because my Kraken2/CLARK results disagree with the company's results. I skipped the QC and host-cleaning steps in my Kraken2/CLARK analyses (I believe the company used KneadData for this task), so I want to use an established metagenomics pipeline to check the accuracy of my results. Pipeline: MG-RAST

rx32940 commented 4 years ago

Uploaded all raw data from my local machine to the MG-RAST web interface inbox through the API (upload code).

rx32940 commented 4 years ago

I joined the forward and reverse FASTQ files of each sample (paired-end join) before feeding them into the pipeline. Project name: pair-end-joined-metagenomes
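To make the joining step concrete, here is a minimal sketch of what a paired-end join does for one read pair. It is not the tool used for this project (the thread does not name one): it only accepts exact overlaps, ignores base qualities, and the `min_overlap` default is an arbitrary assumption.

```python
# Illustrative only: exact-overlap joining of one forward/reverse read pair.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def join_pair(forward, reverse, min_overlap=10):
    """Join two mates on their longest exact overlap.

    The reverse read is reverse-complemented first, then the longest suffix of
    the forward read that equals a prefix of that mate is used as the overlap.
    Returns None when no overlap of at least min_overlap bases exists.
    """
    mate = reverse_complement(reverse)
    max_len = min(len(forward), len(mate))
    for size in range(max_len, min_overlap - 1, -1):
        if forward[-size:] == mate[:size]:
            return forward + mate[size:]
    return None
```

Real joiners also tolerate mismatches in the overlap and merge the quality strings, so the output of this sketch should not be expected to match theirs read for read.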

pipeline options:

dereplication: yes
screening (host clean): M. musculus, NCBI v37
dynamic trimming: yes
minimum quality: 10
maximum low quality basepairs: 5
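As a rough illustration of how the last three options interact, the sketch below trims a read to the longest prefix that contains at most `max_low_quality` bases scoring below `min_quality`. This is my reading of the option names, not MG-RAST's actual trimming code, and the Phred+33 encoding is an assumption about the input files.

```python
# Simplified stand-in for quality-based dynamic trimming; not the MG-RAST code.
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality_string]


def dynamic_trim(sequence, qualities, min_quality=10, max_low_quality=5):
    """Cut the read just before the base that would exceed the low-quality budget.

    A base counts as low quality when its score is below min_quality. The read
    is kept up to (not including) the base that would be low-quality base
    number max_low_quality + 1.
    """
    low_seen = 0
    for index, score in enumerate(qualities):
        if score < min_quality:
            low_seen += 1
            if low_seen > max_low_quality:
                return sequence[:index], qualities[:index]
    return sequence, qualities
```

With the settings listed above, a read is left untouched unless it contains a sixth base below Q10.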

steps performed in the pipeline:

[Screenshot, 2019-09-23: steps performed in the pipeline]

Example analysis for R22.L (login required): https://www.mg-rast.org/mgmain.html?mgpage=overview&metagenome=d65d65c7036d676d343836303339302e33

R22.L genus-level taxonomic distribution: [Screenshot, 2019-09-23]

rx32940 commented 4 years ago

project name: test-hostcleaned

This step was tested with only one sample, R27.K.

To do:

Conclusion: the unclassified portion of the Kraken2 (LCA) and CLARK results belongs to eukaryotes that are not present in the database.

rx32940 commented 4 years ago

The host is rat, but a rat reference genome is not available in the MG-RAST pipeline. I found the reference through the UCSC Genome Browser, the Jul. 2014 (RGSC 6.0/rn6) assembly of the rat genome (rn6, RGSC Rnor_6.0), and downloaded it over FTP.

bowtie2 screening against the Rattus reference genome (code)
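For readers without access to the linked script, a host screen of this kind could look roughly like the sketch below. It is not the code used in this project: the file names (`rn6.fa`, `rn6`, `R22.S.joined.fastq`) are hypothetical, and it assumes the joined reads are aligned as unpaired input. The flags themselves (`-x`, `-U`, `--un`, `-p`, `-S`) are standard bowtie2 options; `--un` writes the reads that did not align to the host, which are the ones to keep.

```python
# Sketch: build the bowtie2 commands for screening reads against a host genome.
import shlex


def bowtie2_build_cmd(reference_fasta, index_prefix):
    """Command that builds a bowtie2 index from the host reference FASTA."""
    return ["bowtie2-build", reference_fasta, index_prefix]


def bowtie2_host_screen_cmd(index_prefix, reads_fastq, cleaned_fastq, threads=4):
    """Command that aligns unpaired reads and saves the non-host reads."""
    return [
        "bowtie2",
        "-x", index_prefix,      # host index prefix from bowtie2-build
        "-U", reads_fastq,       # unpaired (already joined) reads
        "--un", cleaned_fastq,   # reads that failed to align to the host
        "-p", str(threads),      # alignment threads
        "-S", "/dev/null",       # discard the SAM output; only --un is needed
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(shlex.join(bowtie2_build_cmd("rn6.fa", "rn6")))
    print(shlex.join(bowtie2_host_screen_cmd(
        "rn6", "R22.S.joined.fastq", "R22.S.hostclean.fastq", threads=8)))
```

The commands are returned as argument lists so they can be passed directly to `subprocess.run(cmd, check=True)` once the paths are real.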

Because this task takes a very long time, I decided to test first with the two samples that have already passed screening:

rx32940 commented 4 years ago

R22.S host-cleaned data from the company, for comparison with the data host-cleaned with bowtie2 in the pipeline.
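One simple way to run this comparison is to compare the sets of read IDs that survive each host-cleaning route. The sketch below assumes strict four-line FASTQ records and that both tools preserve the original read IDs (the part of the header before the first space); neither assumption has been checked against the company's files.

```python
# Sketch: compare which reads survive two different host-cleaning runs.
def read_ids(fastq_lines):
    """Collect read IDs from an iterable of FASTQ lines (4 lines per record)."""
    ids = set()
    for index, line in enumerate(fastq_lines):
        if index % 4 == 0 and line.strip():
            # Header line: drop the leading '@' and any description after a space.
            ids.add(line.strip().split()[0].lstrip("@"))
    return ids


def compare_id_sets(company_ids, pipeline_ids):
    """Count reads kept by both runs and reads kept by only one of them."""
    return {
        "shared": len(company_ids & pipeline_ids),
        "company_only": len(company_ids - pipeline_ids),
        "pipeline_only": len(pipeline_ids - company_ids),
    }
```

An open file handle can be passed straight to `read_ids`, for example `read_ids(open(path))`, since it iterates line by line.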