hagenaue / FrontalPole_Microarray

Reanalysis of public frontal pole (BA10) microarray data to accompany our qPCR project examining neurotransmission-related gene expression in the frontal pole in relationship to Bipolar Disorder and Schizophrenia.

Write-up/Organize results for Diagnosis vs. Important Co-variates #2

Closed hagenaue closed 3 years ago

hagenaue commented 3 years ago

Write up a brief description of which important co-variates (pH, RNA degradation, PMI, Age, Gender, RIN, batch/scan date) may be confounding variables in the Maycox (and eventually Iwamoto) dataset, with figures/tables illustrating the most interesting relationships and a statistical summary.

hagenaue commented 3 years ago

For background on the statistical output (Regression, Chi-Square): https://onlinestatbook.com/2/index.html

hagenaue commented 3 years ago

Guidelines for writing up statistics: https://www.statisticshowto.com/probability-and-statistics/reporting-statistics-apa-style/

Regression: B=Regression Slope, p=p-value.
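As a rough illustration of where B and p come from in R, here is a minimal sketch that fits a regression of one numeric co-variate on diagnosis and pulls both values out of the model summary. The data are simulated stand-ins, and the object and column names (`meta`, `Diagnosis`, `pH`) are assumptions, not the names used in the project code.

```r
# Simulated stand-in metadata; names and values are hypothetical.
set.seed(1)
meta <- data.frame(
  Diagnosis = factor(rep(c("Control", "Schizophrenia"), each = 10)),
  pH = c(rnorm(10, mean = 6.5, sd = 0.2), rnorm(10, mean = 6.3, sd = 0.2))
)

# Regress the numeric co-variate on diagnosis.
fit <- lm(pH ~ Diagnosis, data = meta)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

# B is the slope for the non-reference group; p is its p-value.
B <- coefs["DiagnosisSchizophrenia", "Estimate"]
p <- coefs["DiagnosisSchizophrenia", "Pr(>|t|)"]

# Assemble a short string in the "B = ..., p = ..." form noted above.
stat_string <- sprintf("B = %.2f, p = %.3f", B, p)
print(stat_string)
```

With more than two diagnosis groups, the coefficient table has one row per non-reference group, so each slope would be reported separately.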

hagenaue commented 3 years ago

Making prettier covariate vs. diagnosis plots (for numeric co-variates: age, pH, PMI, RNAdegradation, RIN):

pdf("PutPlotnamehere.pdf", width=7, height=7) # Size is adjustable: smaller is better, as long as the plot stays legible

boxplot(NumericCovariate~GroupingVariable_Factor, ylab="NameOfNumericCovariate", col=c("darkorange2", "burlywood1", "red", "pink")) # One color per group, or omit col altogether

stripchart(NumericCovariate~GroupingVariable_Factor, vertical=TRUE, method="jitter", add=TRUE, pch=c(20, 1, 17, 2), cex=2, col="black") # Overlays jittered data points on the box plot; one pch symbol per group

dev.off() # Closes the PDF device so the file is written

hagenaue commented 3 years ago

For categorical co-variates vs. diagnosis - make tables. Can be done by outputting a .csv and working in Excel ... or done in R.
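For the in-R route, a minimal sketch might look like the following: cross-tabulate the categorical co-variate against diagnosis, run a chi-square test, and write the table out as a .csv. The data are made up, and the names (`meta`, `Diagnosis`, `Gender`, the output file name) are assumptions for illustration only.

```r
# Made-up stand-in metadata; names and counts are hypothetical.
meta <- data.frame(
  Diagnosis = factor(rep(c("Control", "Schizophrenia"), each = 12)),
  Gender = factor(c(rep("F", 6), rep("M", 6), rep("F", 5), rep("M", 7)))
)

# Contingency table: rows are the co-variate, columns are diagnosis.
counts <- table(meta$Gender, meta$Diagnosis, dnn = c("Gender", "Diagnosis"))
print(counts)

# Chi-square test of independence between the two factors.
chi <- chisq.test(counts)
print(chi)

# Write the table as a .csv (here to a temporary directory) for use in Excel.
out_file <- file.path(tempdir(), "Gender_vs_Diagnosis.csv")
write.csv(as.data.frame.matrix(counts), out_file)
```

With small group sizes, `chisq.test` may warn that the approximation is unreliable; `fisher.test(counts)` is one alternative in that case.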

hagenaue commented 3 years ago

Make a meta-figure with subpanels for each diagnosis*covariate relationship (like Ohayon).
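One way to lay out such a multi-panel figure in base R is `par(mfrow=...)`, reusing the boxplot plus stripchart pattern from the earlier comment. This is only a sketch under assumptions: the data are simulated, the names (`meta`, `Diagnosis`, the co-variate columns, the output file name) are hypothetical, and the 2x2 layout is an arbitrary choice.

```r
# Simulated stand-in metadata; names and values are hypothetical.
set.seed(3)
meta <- data.frame(
  Diagnosis = factor(rep(c("Control", "Schizophrenia"), each = 10)),
  Age = rnorm(20, mean = 50, sd = 10),
  pH = rnorm(20, mean = 6.4, sd = 0.2),
  PMI = rnorm(20, mean = 20, sd = 5),
  RIN = rnorm(20, mean = 7, sd = 1)
)
covariates <- c("Age", "pH", "PMI", "RIN")

# Written to a temporary directory here; change the path as needed.
out_file <- file.path(tempdir(), "Covariates_vs_Diagnosis.pdf")
pdf(out_file, width = 7, height = 7)

# 2 x 2 grid of panels, filled row by row, with slightly tighter margins.
par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))

for (v in covariates) {
  # One panel per co-variate: box plot with jittered points on top.
  boxplot(meta[[v]] ~ meta$Diagnosis, xlab = "", ylab = v, main = v,
          col = c("darkorange2", "burlywood1"))
  stripchart(meta[[v]] ~ meta$Diagnosis, vertical = TRUE, method = "jitter",
             add = TRUE, pch = 20, col = "black")
}

invisible(dev.off())  # Close the device so the PDF is written.
```

For more co-variates, adjust the grid (e.g. `mfrow = c(2, 3)`) and the `width`/`height` passed to `pdf()` so each panel stays legible.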

hagenaue commented 3 years ago

... and then write a figure legend with the APA formatted stats.

hagenaue commented 3 years ago

Re-output the subject variable vs. subject variable statistics for the Maycox dataset after removing the two outlier samples: GSM439786.CEL.gz (so extreme that it drives PC3) and GSM439795.CEL.gz (missing pH data and has the most extreme PC1 value).

From the sample code: lines 455-479 define the "bad samples" and remove them from the gene expression data and metadata (and derived factors, etc.). This code overwrites other objects in your workspace, so first save your workspace under a new name (e.g. wOutliersRemoved), then change the output directory so the new output doesn't overwrite your earlier results.

Since it reuses the same objects, you should be able to just re-run all of the subject variable vs. subject variable and PCA vs. subject variable code.
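The general idea of the removal step can be sketched as follows. This is not the sample code itself: the objects (`expr`, `meta`), the column name `SampleID`, and the range of sample IDs used to build the toy data are all assumptions; only the two outlier file names come from the comment above.

```r
# Toy stand-in data. The ID range is made up for illustration and only
# needs to contain the two outlier samples named in this issue.
sample_ids <- paste0("GSM", 439780:439799, ".CEL.gz")
set.seed(4)
expr <- matrix(rnorm(5 * length(sample_ids)), nrow = 5,
               dimnames = list(paste0("gene", 1:5), sample_ids))
meta <- data.frame(
  SampleID = sample_ids,
  Diagnosis = factor(rep(c("Control", "Schizophrenia"), each = 10)),
  stringsAsFactors = FALSE
)

# The two outlier samples identified above.
bad_samples <- c("GSM439786.CEL.gz", "GSM439795.CEL.gz")

# Logical index of samples to keep, applied to both objects so that
# expression columns and metadata rows stay aligned.
keep <- !(meta$SampleID %in% bad_samples)
expr_clean <- expr[, keep, drop = FALSE]
meta_clean <- droplevels(meta[keep, , drop = FALSE])

# Sanity check: sample order still matches between the two objects.
stopifnot(identical(colnames(expr_clean), meta_clean$SampleID))
```

This sketch writes to new objects (`expr_clean`, `meta_clean`) rather than overwriting the originals. The sample code described above overwrites in place instead, which is what allows the downstream code to be re-run unchanged.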