sr320 / ceabigr

Workshop on genomic data integration with an emphasis on epigenetic data (FHL 2022)

Conduct spurious transcription analysis #85

Closed: yaaminiv closed this issue 4 months ago

yaaminiv commented 8 months ago

Sam Bogan suggested an analysis to understand "spuriousness" with respect to exon use. Could we replicate this with our transcription data, analyzing each sex independently, to look at treatment effects on exon use/spurious transcription?


Here's a GitHub knit of the code I used for measuring changes in exon use consistent with spuriousness (under the subheader "Determine which genes are..."). An .rmd file of the same name is also in this repo.

As an example of what these exon use patterns can look like, here's a plot of differential exon use induced by developmental or maternal stress across exon number. For some reason we saw that genes in the ATP synthase complex were enriched among transcripts exhibiting spurious transcription induced by stress. We don't know why, but that's why the plot says ATP synthase on it.

[Image: plot of differential exon use across exon number, labeled ATP synthase]

yaaminiv commented 8 months ago

associated paper: https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-023-01645-8

yaaminiv commented 7 months ago

another option is to look at relative expression of exons: https://doi.org/10.1126/sciadv.aat2142

kubu4 commented 7 months ago

From Li et al.:

Spurious transcription analysis

Trimmed reads were mapped to the Aiptasia genome using HISAT2 v2.1.0, and mapping coverage per position was extracted using BEDTools v2.17.0. To assess spurious transcription levels, we determined the coverage per exon normalized across all six replicates (assuming every replicate had a coverage of 1 million sequences in total) and then calculated the average coverage ratios of exons 2 to 6 versus exon 1 for every gene.

@kubu4 Will get BAM files and link here.
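
A minimal Python sketch of one reading of the Li et al. normalization, assuming per-exon raw coverage has already been summed for each replicate (e.g., from the BEDTools per-position output), that exons are numbered from 1 at the 5' end, and that every replicate reports the same set of exons. The function names and the choice to average across replicates before taking ratios are my assumptions; the paper doesn't spell out the order of operations.

```python
from statistics import mean

def normalize_coverage(exon_cov, library_size, scale=1_000_000):
    """Rescale raw per-exon coverage as if the replicate had `scale` sequences in total."""
    return {exon: cov * scale / library_size for exon, cov in exon_cov.items()}

def distal_to_first_ratio(per_replicate, library_sizes, distal_exons=range(2, 7)):
    """Average coverage ratio of exons 2-6 vs. exon 1 for a single gene.

    per_replicate: one dict per replicate, {exon_number: raw coverage};
                   assumes all replicates contain the same exon numbers
    library_sizes: total sequences per replicate, in the same order
    Returns None if exon 1 has no coverage or the gene has no distal exons.
    """
    normalized = [normalize_coverage(cov, n) for cov, n in zip(per_replicate, library_sizes)]
    # Mean normalized coverage per exon across replicates (assumed order of operations).
    exons = normalized[0].keys()
    mean_cov = {e: mean(rep[e] for rep in normalized) for e in exons}
    if mean_cov.get(1, 0) == 0:
        return None
    # Ratio of each distal exon (2-6, when present) to exon 1, then averaged.
    ratios = [mean_cov[e] / mean_cov[1] for e in distal_exons if e in mean_cov]
    return mean(ratios) if ratios else None

# Hypothetical gene with three exons in two replicates of different depth.
reps = [{1: 100, 2: 200, 3: 300}, {1: 50, 2: 100, 3: 150}]
print(distal_to_first_ratio(reps, [2_000_000, 1_000_000]))  # 2.5
```

Genes with fewer than six exons just use whatever distal exons they have; whether to drop them instead is a judgment call.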

kubu4 commented 7 months ago

@sr320 - Individual BAMS and their corresponding index files are in each individual sample subirectory:

https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/

A merged BAM of all samples is here (79GB):

https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam

Merged BAM index file (23MB):

https://gannet.fish.washington.edu/Atumefaciens/20230821-cvir-stringtie-GCF_002022765.2-isoforms/20230821_cvir_stringtie_GCF_002022765-sorted-bams-merged.bam.bai

yaaminiv commented 7 months ago

From Li et al.:

To assess the level of spurious transcripts in genes of different methylation levels, we calculated the expression level for each exon and computed the natural log fold change of expression relative to exon1. Our rationale was that spurious transcripts that start somewhere within the gene body contribute more to observed increases in expression of more distal exons but less so to more proximal exons. Consequently, higher levels of spurious transcription would result in higher ratios of expression for more distal exons.
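
A small sketch of the per-exon natural log fold change relative to exon 1, assuming we already have one expression value per exon for a gene. Li et al. don't say how zero-expression exons were handled, so skipping them here is an assumption (a pseudocount would be another option).

```python
import math

def log_fold_change_vs_exon1(exon_expr):
    """Natural log fold change of each exon's expression relative to exon 1.

    exon_expr: {exon_number: expression} for one gene (exon 1 = most 5' exon).
    Exons with zero expression are skipped because ln(0) is undefined
    (an assumption; the paper does not state how zeros were handled).
    """
    base = exon_expr.get(1, 0)
    if base <= 0:
        raise ValueError("exon 1 must have positive expression")
    return {
        exon: math.log(value / base)
        for exon, value in sorted(exon_expr.items())
        if exon != 1 and value > 0
    }

lfc = log_fold_change_vs_exon1({1: 5.0, 2: 5.0, 3: 20.0})
# lfc[2] is 0.0; lfc[3] is ln(4), about 1.386
```

Under the rationale above, a gene with more spurious transcription should show larger positive values at higher exon numbers.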

yaaminiv commented 5 months ago

Next step: create a summary plot for all genes in all samples, but break it down by methylation level:

high (> 50%) vs. moderate (10-50%) vs. low (< 10%) methylation
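
A minimal sketch of the grouping step before plotting, assuming each gene already has a single percent-methylation value and a per-gene spuriousness metric (such as the exon 2-6 vs. exon 1 coverage ratio from Li et al.). Assigning genes at exactly 10% or 50% to "moderate" is my assumption, since the ranges above share those edges. Gene IDs and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def methylation_class(percent):
    """Bin gene-body methylation (%) into high / moderate / low."""
    if not 0 <= percent <= 100:
        raise ValueError(f"methylation must be 0-100%, got {percent}")
    if percent > 50:
        return "high"
    if percent >= 10:  # 10-50% inclusive treated as moderate (assumption at the edges)
        return "moderate"
    return "low"

def summarize_by_class(gene_methylation, gene_ratio):
    """Mean spuriousness metric per methylation class.

    gene_methylation: {gene_id: percent methylation}
    gene_ratio: {gene_id: per-gene spuriousness metric}
    Genes missing from gene_methylation are skipped.
    """
    groups = defaultdict(list)
    for gene, ratio in gene_ratio.items():
        if gene in gene_methylation:
            groups[methylation_class(gene_methylation[gene])].append(ratio)
    return {cls: mean(vals) for cls, vals in groups.items()}

# Hypothetical gene IDs and values, for illustration only.
meth = {"geneA": 80.0, "geneB": 60.0, "geneC": 5.0}
ratios = {"geneA": 1.0, "geneB": 3.0, "geneC": 2.0}
print(summarize_by_class(meth, ratios))  # {'high': 2.0, 'low': 2.0}
```

For the actual plot, the per-gene values in each group (not just the means) would probably be more informative, e.g. as boxplots per class and sex.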