yanxiting / sarc_pbmc

This project analyzes the PBMC bulk RNA-seq data from patients with sarcoidosis recruited by the GRADS study.

data preprocessing and cleaning questions #1

Open yanxiting opened 4 years ago

yanxiting commented 4 years ago

I started with two files to understand your approach: "Preprocessing of the GRADS SARC PBMC data" and "PCA of the GRADS PBMC baseline expression data". I have not yet seen a file that describes your approach to the FastQC quality control checks. One specific question I had is whether you dropped any samples based on the FastQC reports. My second question is related to the PCA file: were those data generated using FPKM or raw read counts?

This was asked by @lkoth.

yanxiting commented 4 years ago

Please see my answers below. @lkoth

I started with two files to understand your approach: "Preprocessing of the GRADS SARC PBMC data" and "PCA of the GRADS PBMC baseline expression data". I have not yet seen a file that describes your approach to the FastQC quality control checks. One specific question I had is whether you dropped any samples based on the FastQC reports.

XY: The TorrentSuite software from the company performed quality assessment on the reads, including read trimming and per-sample quality assessment, before exporting the FASTQ files, so we did not run FastQC on the data again. The PCA included all samples; no sample was removed based on FastQC reports.

My second question is related to the PCA file: were those data generated using FPKM or raw read counts?

XY: The PCA was conducted on FPKM values. It should not be done on raw read counts, because raw counts are affected by gene length and by each sample's sequencing depth, both of which are technical factors irrelevant to the actual biology. So we use only FPKM for every type of analysis, including PCA.
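To illustrate the point about gene length and sequencing depth, here is a minimal Python sketch of the standard FPKM formula (fragments per kilobase of transcript per million mapped fragments). It is not the pipeline used in this project: the function name `fpkm`, the toy genes, and the use of each sample's column sum as its library size are all assumptions made for the example.

```python
def fpkm(counts, gene_lengths_bp):
    """Convert raw counts to FPKM.

    counts: dict mapping gene -> list of raw counts, one per sample.
    gene_lengths_bp: dict mapping gene -> gene length in base pairs.
    Assumption: a sample's library size is the sum of its counts over
    the genes given here (a real pipeline may use total mapped fragments).
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Library size (sequencing depth) for each sample.
    lib_sizes = [sum(counts[g][s] for g in genes) for s in range(n_samples)]
    # FPKM = count * 1e9 / (gene length in bp * library size).
    return {
        g: [counts[g][s] * 1e9 / (gene_lengths_bp[g] * lib_sizes[s])
            for s in range(n_samples)]
        for g in genes
    }


# Hypothetical toy data: sample 2 is sample 1 sequenced twice as deeply,
# and gene B is twice as long as gene A.
counts = {"geneA": [100, 200], "geneB": [200, 400]}
lengths = {"geneA": 1000, "geneB": 2000}

result = fpkm(counts, lengths)
for gene, values in result.items():
    print(gene, [round(v, 2) for v in values])
```

In this toy case the raw counts differ between samples and between genes only because of depth and length, and the FPKM values come out equal across all four entries, which is the technical variation the answer above says should be removed before PCA.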

This was asked by @lkoth.