nf-core / crisprseq

A pipeline for the analysis of CRISPR edited data. It allows the evaluation of the quality of gene editing experiments using targeted next generation sequencing (NGS) data (`targeted`) as well as the discovery of important genes from knock-out or activation CRISPR-Cas9 screens using CRISPR pooled DNA (`screening`).
https://nf-co.re/crisprseq
MIT License

Suggested addition: a PCA illustrating sample/condition/replicates #166

Open jeremymsimon opened 3 months ago

jeremymsimon commented 3 months ago

Description of feature

This may not be the typical case for most users, but my experiment has multiple replicates of treatment and control. To check how concordant my replicates are, and how well separated the treatment and control conditions are, I would run a PCA on the raw or normalized counts produced by MAGeCK, coloring the points by both sample and condition (similar to the DESeq2 output in nf-core/rnaseq). This could also help identify outlier samples, which the countsummary table could then corroborate: for example, a sample that is an outlier in PC space might also have a lower-than-usual mapping rate or a very high Gini index.
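To make the PCA idea concrete, here is a minimal sketch in Python with NumPy. It is not how the pipeline does (or would) implement this. It assumes a guides-by-samples matrix of normalized counts, such as the sample columns of a MAGeCK count table, and uses synthetic data in its place. The function name `pca_samples`, the sample names, and the `log2(count + 1)` transform are illustrative assumptions.

```python
import numpy as np

def pca_samples(counts, n_components=2):
    """Project samples onto principal components of log2(count + 1).

    counts: array of shape (n_guides, n_samples).
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    log_counts = np.log2(np.asarray(counts, dtype=float) + 1.0)
    x = log_counts.T                # rows = samples, columns = guides
    x = x - x.mean(axis=0)          # center each guide across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic stand-in for a normalized count table (assumption: 200 guides,
# two control and two treatment replicates; names are hypothetical).
rng = np.random.default_rng(0)
samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
conditions = ["control", "control", "treatment", "treatment"]
base = rng.lognormal(mean=6.0, sigma=0.5, size=200)
counts = np.column_stack(
    [base * rng.lognormal(0.0, 0.05, size=200) for _ in samples]
)
counts[:50, 2:] *= 0.1  # hypothetical depletion of 50 guides under treatment

scores, explained = pca_samples(counts)
for name, cond, (pc1, pc2) in zip(samples, conditions, scores):
    print(f"{name}\t{cond}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
```

The `scores` array can be passed to any scatter-plot routine, with color mapped to condition and labels or marker shapes mapped to sample. A sample sitting far from its replicates would be the candidate outlier to cross-check against the countsummary table.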

I'd also plot the cross-sample pairwise correlations of normalized counts as an n × n heatmap, which would likely also be a useful output.
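The matrix behind such a heatmap could be computed roughly as follows. This is a sketch under assumptions: Pearson correlation on `log2(count + 1)` is one reasonable choice among several (Spearman would be another), the helper name `sample_correlations` is hypothetical, and the data are synthetic stand-ins for a normalized count table.

```python
import numpy as np

def sample_correlations(counts):
    """Pearson correlation between samples on log2(count + 1).

    counts: array of shape (n_guides, n_samples).
    Returns an (n_samples, n_samples) correlation matrix.
    """
    log_counts = np.log2(np.asarray(counts, dtype=float) + 1.0)
    # rowvar=False treats each column (sample) as a variable.
    return np.corrcoef(log_counts, rowvar=False)

# Synthetic stand-in data (assumption: 200 guides, 2 control + 2 treatment).
rng = np.random.default_rng(0)
samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
base = rng.lognormal(mean=6.0, sigma=0.5, size=200)
counts = np.column_stack(
    [base * rng.lognormal(0.0, 0.05, size=200) for _ in samples]
)
counts[:50, 2:] *= 0.1  # hypothetical depletion of 50 guides under treatment

corr = sample_correlations(counts)
print("\t" + "\t".join(samples))
for name, row in zip(samples, corr):
    print(name + "\t" + "\t".join(f"{r:.3f}" for r in row))
```

The resulting `corr` matrix is symmetric with ones on the diagonal, so it can be handed directly to a heatmap function. Replicates of the same condition should show higher correlation with each other than with samples from the other condition.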