Open rx32940 opened 5 years ago
Upload all raw data from the local machine to the web interface inbox using the API upload code.
I joined the paired-end forward and reverse FASTQ files for each sample before feeding them into the pipeline. Project name: pair-end-joined-metagenomes
pipeline options:
option | value |
---|---|
dereplication | yes |
screening (host clean) | M. musculus, NCBI v37 |
dynamic trimming | yes |
minimum quality | 10 |
maximum low quality basepairs | 5 |
steps performed in the pipeline:
An example analysis for R22.L can be found at this link (login required): https://www.mg-rast.org/mgmain.html?mgpage=overview&metagenome=d65d65c7036d676d343836303339302e33
R22.L genus level taxonomic distribution
KRONA result
The unclassified portion of the CLARK and KRAKEN2 results consists of Eukaryote sequences (unclassified because Eukaryote sequences were not in their databases).
Because the Mus genome was screened in one of the pipeline steps, host sequences should already have been removed. Why do we still get Mus hits in the taxonomic profiling?
next step:
with representative hit, the absolute abundance of each sample
with representative hit, the relative abundance of each sample
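The relative abundance above is presumably each taxon's hit count divided by the total hits in that sample. A minimal sketch of that conversion follows; the function name and the counts are hypothetical and for illustration only, not values from these samples or from the MG-RAST output.

```python
def relative_abundance(counts):
    """Convert absolute hit counts per taxon into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for a sample with no hits.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical counts for one sample (illustration only, not real data).
example_counts = {"Bacteria": 300, "Eukaryota": 650, "Viruses": 50}
example_relative = relative_abundance(example_counts)

for taxon, fraction in example_relative.items():
    print(f"{taxon}\t{fraction:.3f}")
```

Normalizing per sample makes samples with different sequencing depths comparable, which matters when comparing domain-level abundance between, for example, R22.L and R26.L.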
In the liver samples from subjects R22 and R27, bacterial abundance is significantly higher, which agrees with the results provided by the company. However, R26.L does not show high abundance in the Bacteria domain, which is inconsistent with the company's result.
project name: test-hostcleaned
This step was tested with only one sample, R27.K.
To do:
Conclusion: the unclassified portion of the KRAKEN2 (LCA) and CLARK results belongs to Eukaryotes that were not present in their databases.
The host is rat, but a rat reference genome is not available in the MG-RAST pipeline. I found the reference through the UCSC Genome Browser: the Jul. 2014 (RGSC 6.0/rn6) assembly of the rat genome (rn6, RGSC Rnor_6.0), and downloaded it through FTP.
bowtie2 screening with Rattus reference genome code
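The screening code itself is not shown here, so the following is only a rough sketch of how such a step could be assembled, not the code actually used. It assumes the joined reads are treated as single-end input, and every file name and index prefix (`rn6.fa`, `rn6_index`, `R27.K.joined.fastq`, `R27.K.hostclean.fastq`) is a hypothetical placeholder. The commands are only built and printed; running them requires bowtie2 to be installed.

```python
import shlex


def bowtie2_build_cmd(reference_fasta, index_prefix):
    """Command that builds a bowtie2 index from the host reference FASTA."""
    return ["bowtie2-build", reference_fasta, index_prefix]


def bowtie2_host_screen_cmd(index_prefix, reads_fastq, clean_fastq, threads=4):
    """Command that aligns reads to the host index and keeps the unaligned ones.

    Reads that fail to align to the host genome are written to clean_fastq
    via --un; the alignments themselves are discarded.
    """
    return [
        "bowtie2",
        "-x", index_prefix,   # host genome index prefix
        "-U", reads_fastq,    # unpaired (joined) reads
        "--un", clean_fastq,  # reads that did NOT align to the host
        "-p", str(threads),   # number of alignment threads
        "-S", "/dev/null",    # discard the SAM output
    ]


# Hypothetical file names, for illustration only.
build_cmd = bowtie2_build_cmd("rn6.fa", "rn6_index")
screen_cmd = bowtie2_host_screen_cmd(
    "rn6_index", "R27.K.joined.fastq", "R27.K.hostclean.fastq"
)

print(shlex.join(build_cmd))
print(shlex.join(screen_cmd))
```

To actually run a command, it can be passed to `subprocess.run(cmd, check=True)`. Building the index for a mammalian genome is the slow part and only needs to be done once per reference.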
Because this task takes a very long time, I decided to test first with the two sequence sets that have already passed screening:
R22.S: host-cleaned data from the company, for comparison with data host-cleaned with bowtie2 in the pipeline.
This is a complete pipeline for metagenomic analysis. I am trying it because of discrepancies between my Kraken2/CLARK results and the company's results. Because I skipped the QC and host-cleaning steps for the Kraken2/CLARK analyses (I believe the company used KneadData for this task), I want to use an established metagenomics pipeline to confirm the accuracy of my results: MG-RAST.