This repository contains the scripts, functions, and data I used or created in support of my work, "Mapping gene transcription and neurocognition across human neocortex". Find the preprint here, the postprint here, and the article here. All analyses were run in MATLAB version 9.8.0.1359463 (R2020a) Update 1.
Data and scripts are organized into five subfolders. The data that is used in multiple scripts is included in the root folder. This data includes:
The folder PLS contains the script scpt_genes_cog_pls.m, which performs partial least squares (PLS) analysis on gene expression and functional activation matrices. It also contains the script scpt_cca.m, which performs canonical correlation analysis on the same matrices and compares the results to the PLS results. The significance of the latent variables is assessed with a permutation test that accounts for spatial autocorrelation. The correlation of PLS-derived scores is cross-validated using the function fcn_crossval_pls_brain_obvs.m, which assigns nodes using a distance-based method to account for spatial autocorrelation. The terms that contribute most to the first latent variable are extracted. Finally, PLS-derived scores are distributed among three network classifications: the intrinsic (resting-state) networks, the Von Economo cytoarchitectonic classes, and the Mesulam classes of laminar differentiation.
The main PLS code (pls_analysis.m) can be found here, under "PLSCMD".
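To make the distance-based assignment used in cross-validation more concrete, here is a minimal Python sketch. It is not the code in fcn_crossval_pls_brain_obvs.m (which is MATLAB). The function name `distance_based_split`, the 75/25 split, and the toy coordinates are all assumptions made for illustration. The idea is that nodes close to a chosen source node form the training set and the more distant nodes form the test set, so that spatially neighbouring (and therefore correlated) nodes are less likely to sit on both sides of the split.

```python
import math


def distance_based_split(coords, source, train_fraction=0.75):
    """Split node indices into train/test sets by distance from a source node.

    The nodes closest to `source` (including the source itself) form the
    training set; the remaining, more distant nodes form the test set.
    NOTE: the 0.75 default is an illustrative assumption, not a value taken
    from the repository.
    """
    # Sort node indices by Euclidean distance to the source node.
    order = sorted(
        range(len(coords)),
        key=lambda i: math.dist(coords[i], coords[source]),
    )
    n_train = int(round(train_fraction * len(coords)))
    return order[:n_train], order[n_train:]


# Hypothetical toy data: 8 nodes on a line, spaced 1 unit apart.
coords = [(float(i), 0.0, 0.0) for i in range(8)]
train, test = distance_based_split(coords, source=0)
print("train:", train)
print("test:", test)
```

In practice one would repeat the split over many source nodes, fit PLS on the training nodes, and correlate the projected scores in the test nodes.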
The data in the folder is:
The folder GO contains the script scpt_GO.m, which performs gene set enrichment analysis based on two PLS-defined gene sets. Analyses were adapted from this repository, which also provides two necessary files that can be found here.
The data in the folder is:
The folder CTD contains the script scpt_ctd.m, which determines the ratio of genes that are preferentially expressed in seven different cell types. Significance is assessed against a null model of random gene sets. Cell type deconvolution comes from work described in this paper, and the data (alongside much more) can also be found at Jakob Seidlitz's repo. The folder also contains the function fcn_ctd.m, which was written after publication but is easier to use, so I've included it here.
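The random-gene-set null model can be sketched roughly as follows. This is a hedged Python illustration, not a port of scpt_ctd.m or fcn_ctd.m. The function names, the two-sided test, the number of permutations, and the toy gene labels are all assumptions. The sketch computes the fraction of a gene set that belongs to one cell type's gene list, then compares it with the same fraction computed for random gene sets of equal size drawn from a background list.

```python
import random


def cell_type_ratio(gene_set, cell_type_genes):
    """Fraction of genes in `gene_set` that belong to `cell_type_genes`."""
    gene_set = list(gene_set)
    return sum(g in cell_type_genes for g in gene_set) / len(gene_set)


def random_gene_set_test(gene_set, cell_type_genes, background,
                         n_perm=1000, seed=0):
    """Observed ratio and two-sided permutation p-value.

    The null distribution is built from random gene sets of the same size,
    sampled without replacement from `background` (an assumed design).
    """
    rng = random.Random(seed)
    observed = cell_type_ratio(gene_set, cell_type_genes)
    nulls = [
        cell_type_ratio(rng.sample(background, len(gene_set)), cell_type_genes)
        for _ in range(n_perm)
    ]
    null_mean = sum(nulls) / n_perm
    # Count null ratios at least as far from the null mean as the observed one.
    extreme = sum(
        abs(r - null_mean) >= abs(observed - null_mean) for r in nulls
    )
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical toy data: 100 background genes, 10 of them cell-type specific.
background = [f"g{i}" for i in range(100)]
cell_type_genes = {f"g{i}" for i in range(10)}
gene_set = [f"g{i}" for i in range(8)] + ["g50", "g51"]

observed, p_value = random_gene_set_test(gene_set, cell_type_genes, background)
print(observed, p_value)
```

For seven cell types, the same test would be run once per cell type, each with its own gene list.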
The data in the folder is:
The folder HCP contains the script scpt_hcp.m, which uses cortical thickness and T1w/T2w maps from the Human Connectome Project (S1200 release) to relate the PLS-derived gene score pattern to individual differences in behaviour. Original data can be downloaded here. Note that the script is written for all 1096 subjects with full fMRI runs, but only 417 unrelated subjects were actually used in the analyses. Due to privacy policies, their subject indices are not included. This section was cut during the review process, so the results are only included in the preprint.
The data in the folder is:
The folder BrainSpan contains the script scpt_brainspan.m, which replicates results using gene expression estimates from the BrainSpan database. The script also tracks the gene expression-functional activation signature across human development. Many thanks to Jake Vogel for organizing the data for comparability with AHBA.
The data in the folder is: