Better PCA analysis - GitHub issues

vertesy commented 4 years ago

Motivation

To better understand how similar the samples are, PCA is a good start. However, the current analysis is insufficient:

The first 3 PCs often do not separate all identified clusters.
Each axis needs to be scaled in proportion to the variance explained by its PC.

Output needed

[ ] Get the variance explained by each PC
[ ] Plot more PCs
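
A minimal sketch of the first item, assuming the VST expression matrix itself is available. The thread does not show how the pipeline computes its PCA, so `stats::prcomp()` is used here as a stand-in, and `vst.mat` is a hypothetical random matrix rather than real data.

```r
# Hypothetical stand-in for a VST expression matrix (genes x samples).
set.seed(1)
vst.mat <- matrix(rnorm(200 * 6), nrow = 200,
                  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

# prcomp() expects observations (samples) in rows, so transpose.
pca <- prcomp(t(vst.mat), center = TRUE, scale. = FALSE)

# Fraction of the total variance explained by each PC.
var.explained <- pca$sdev^2 / sum(pca$sdev^2)
names(var.explained) <- colnames(pca$x)
print(round(100 * var.explained, 1))

# Axis labels that carry the variance explained, usable for any PC pair.
pc.labels <- sprintf("%s (%.1f%%)", names(var.explained), 100 * var.explained)

# "Plot more PCs": a scatter-plot matrix of the first four components.
if (interactive()) {
  pairs(pca$x[, 1:4], labels = pc.labels[1:4])
}
```

The `pairs()` call is guarded by `interactive()` only so that the sketch runs quietly in a batch session; in a report the guard can be dropped.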

vertesy commented 4 years ago

Hey hey,

The data imported for the scatter3d plot can be found in results/deseq2/featureCounts/plot/PCA/

This is the code used:


pca.dat <- read.table(files.vst$pca10000pcatsv, header = TRUE, sep = "\t", quote = "")

# Only draw the 3D plot when a third principal component is available
if (any(grepl("PC3", colnames(pca.dat)))) {
    cat(paste0("\n\n### VST 10000 {-}\n\n", '* Interactive scatter plot of Samples on PCA1-3 using VST expression values of top 10000 most variably expressed genes', "\n\n"))

    # Colour the samples by condition when that column exists
    if (any(grepl("condition", colnames(pca.dat)))) {
        plot_ly(pca.dat, x = ~PC1, y = ~PC2, z = ~PC3, type = "scatter3d", mode = "markers", text = ~SampleName, color = ~condition)
    } else {
        plot_ly(pca.dat, x = ~PC1, y = ~PC2, z = ~PC3, type = "scatter3d", mode = "markers", text = ~SampleName)
    }
}

It can be found in /groups/bioinfo/shared/public/pipeline/ii-rnaseq/0.4dev/bin/Rmd/deseq2_qc.Rmd

An example

cd /Volumes/abel-1/Data/pseudobulk/iiRNAseq_ii.GRCh38_20191120162110/results/deseq2/featureCounts/plot/PCA/

find . -name "*vst.tsv"
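
The same lookup can be done from within R. This is a rough sketch: `read.vst.tables()` and `pc.columns()` are hypothetical helper names, and the tab-separated layout with a header row is assumed from the `read.table()` call quoted earlier in the thread.

```r
# Read every "*vst.tsv" table below a directory into a named list of data frames.
read.vst.tables <- function(pca.dir) {
  files <- list.files(pca.dir, pattern = "vst\\.tsv$",
                      recursive = TRUE, full.names = TRUE)
  tables <- lapply(files, read.table, header = TRUE, sep = "\t", quote = "")
  setNames(tables, basename(files))
}

# Names of the PC score columns a table provides (PC1, PC2, ...).
pc.columns <- function(dat) {
  grep("^PC[0-9]+$", colnames(dat), value = TRUE)
}
```

Calling `read.vst.tables(".")` from the PCA directory and then `lapply(tables, pc.columns)` on the result shows how many PCs each file actually contains, which is worth checking before trying to plot more than three.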

vertesy commented 4 years ago

Replot them with plotly

require(plotly)
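
One possible way to replot with the axes scaled as requested in the motivation, sketched under assumptions: `pca.dat` below is a hypothetical stand-in for one of the TSV tables, `pc.aspect()` and `pca.scatter3d()` are made-up helper names, and the variance per PC is estimated from the score columns themselves (which assumes the file holds unscaled PCA scores). The `aspectmode` and `aspectratio` settings are plotly's 3D scene layout attributes.

```r
# Hypothetical stand-in for one of the *vst.tsv tables.
set.seed(1)
pca.dat <- data.frame(
  SampleName = paste0("sample", 1:8),
  condition  = rep(c("ctrl", "treated"), each = 4),
  PC1 = rnorm(8, sd = 10), PC2 = rnorm(8, sd = 5), PC3 = rnorm(8, sd = 2)
)

# Relative axis lengths from the variance of the scores along each PC,
# normalised so that the first listed axis has length 1.
pc.aspect <- function(dat, pcs = c("PC1", "PC2", "PC3")) {
  v <- vapply(dat[pcs], var, numeric(1))
  as.list(setNames(v / v[1], c("x", "y", "z")))
}

# Build the 3D scatter plot with a manual aspect ratio (needs the plotly package).
pca.scatter3d <- function(dat) {
  p <- plotly::plot_ly(dat, x = ~PC1, y = ~PC2, z = ~PC3,
                       type = "scatter3d", mode = "markers",
                       text = ~SampleName, color = ~condition)
  plotly::layout(p, scene = list(aspectmode = "manual",
                                 aspectratio = pc.aspect(dat)))
}

asp <- pc.aspect(pca.dat)
```

`pca.scatter3d(pca.dat)` returns the widget. Scaling by variance makes the minor axes very short; setting `aspectmode = "data"` instead draws the axes in proportion to their data ranges and may be easier to read.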

vertesy / pseudoBulk

Better PCA analysis #6
