sidbdri / cookiecutter-de_analysis_skeleton

Skeleton for new differential expression analysis project.

Automate writing paper Methods section #201

Closed: lweasel closed this issue 2 years ago

lweasel commented 2 years ago

For about the millionth time I've just been writing a methods section for a manuscript, e.g. something like:

Samples were sequenced to a depth of approximately 100 million 50-base pair, paired-end reads. The reads were mapped to the primary assembly of the human (hg38) reference genome contained in Ensembl release 106, using the STAR RNA-seq aligner, version 2.7.9a [1]. Tables of per-gene read counts were generated from the mapped reads with featureCounts, version 2.0.2 [2]. Differential gene expression was performed in R using DESeq2, version 1.30.1 [3]. Gene set testing was then performed using Camera [4] from the R package limma, version 3.46.0 [5], using gene sets from the Molecular Signatures Database, version 7.5.1 (https://www.gsea-msigdb.org/gsea/msigdb/).

[1] Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
[2] Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
[3] Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, 550 (2014).
[4] Wu, D. & Smyth, G. K. Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Research 40, e133 (2012).
[5] Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47 (2015).

and it's really painful every time. I just realised we could produce a text file containing this type of text in cookiecutter, automatically filling in the read depths, read lengths, Ensembl release and tool versions associated with that project instance. It doesn't have to be the complete version that would go into a manuscript, but as a starting point it would save a lot of grief.

lweasel commented 2 years ago

This is an initial text that we could use. Here FILL_THIS_IN indicates something that the paper authors (not us) should fill in, so it can be output verbatim; <fill - ...> indicates some information that we can fill in.

"RNA sequencing was performed using FILL_THIS_IN library preparation along with next-generation sequencing on the FILL_THIS_IN platform; sequencing was carried out by FILL_THIS_IN. Samples were sequenced to a depth of approximately <fill - mean reads per sample, rounded to 5 (or 10?) million> million <fill - length of reads in bases>-base pair, <fill - "single-end" or "paired-end"> reads. The reads were mapped to the primary assembly of the <fill - species> (<fill - species genome assembly name, e.g. "mm10", "hg38" etc.>) reference genome contained in Ensembl release <fill - Ensembl version>, using the STAR RNA-seq aligner, version <fill - STAR version> [1]. Tables of per-gene read counts were generated from the mapped reads with featureCounts, version <fill - featureCounts version> [2]. Differential gene expression was performed in R using DESeq2, version <fill - DESeq2 library version> [3]. Gene ontology enrichment analysis was performed using topGO, version <fill - topGO library version> [4]. Gene set testing was then performed using Camera [5] from the R package limma, version <fill - limma library version> [6], using gene sets from the Molecular Signatures Database, version <fill - MSigDb version> (https://www.gsea-msigdb.org/gsea/msigdb/).

[1] Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
[2] Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
[3] Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, 550 (2014).
[4] Alexa, A., Rahnenfuhrer, J. & Lengauer, T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600–1607 (2006).
[5] Wu, D. & Smyth, G. K. Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Research 40, e133 (2012).
[6] Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47 (2015)."
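
As a rough illustration of how the fill-in step might work, here is a minimal Python sketch using only the standard library. It is not the project's implementation: the template is a shortened version of the draft text, and the names `METHODS_TEMPLATE`, `round_reads_to_millions` and `render_methods`, along with their parameters, are hypothetical. It assumes the read statistics and versions are already known for the project instance, and that rounding to the nearest 5 million is the chosen option.

```python
from string import Template

# Hypothetical, shortened version of the draft methods text.
# FILL_THIS_IN is left verbatim for the paper authors to complete.
METHODS_TEMPLATE = Template(
    "RNA sequencing was performed using FILL_THIS_IN library preparation. "
    "Samples were sequenced to a depth of approximately ${depth} million "
    "${read_length}-base pair, ${layout} reads. The reads were mapped to the "
    "primary assembly of the ${species} (${assembly}) reference genome "
    "contained in Ensembl release ${ensembl}, using the STAR RNA-seq aligner, "
    "version ${star} [1]."
)


def round_reads_to_millions(mean_reads, step=5):
    """Round a mean per-sample read count to the nearest `step` million."""
    millions = mean_reads / 1_000_000
    return int(step * round(millions / step))


def render_methods(mean_reads, read_length, paired_end,
                   species, assembly, ensembl, star):
    """Fill the template with values known for one project instance."""
    return METHODS_TEMPLATE.substitute(
        depth=round_reads_to_millions(mean_reads),
        read_length=read_length,
        layout="paired-end" if paired_end else "single-end",
        species=species,
        assembly=assembly,
        ensembl=ensembl,
        star=star,
    )


if __name__ == "__main__":
    # Example values only; a real project would read these from its config.
    print(render_methods(98_400_000, 50, True, "human", "hg38", 106, "2.7.9a"))
```

`Template.substitute` raises `KeyError` if a placeholder is left unfilled, which could be a useful safeguard against emitting a methods text with missing versions.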