ixxmu / mp_duty

抓取网络文章到github issues保存
https://archives.duty-machine.now.sh/
124 stars 30 forks source link

Cell Reports 文献结果复现 #5570

Closed ixxmu closed 2 months ago

ixxmu commented 2 months ago

https://mp.weixin.qq.com/s/YgWr5yxxcJurmcZ0gkgSBA

ixxmu commented 2 months ago

Cell Reports 文献结果复现 by 老俊俊的生信笔记

引言

摘要: 这项研究调查了泛素结合酶 Rad6 在调节氧化应激期间翻译的作用。研究人员使用核糖体测序(Ribo-seq)和 RNA 测序来表征野生型(WT)和 rad6Δ 酵母细胞在氧化应激条件下的翻译景观。他们发现,氧化应激会导致核糖体在特定的氨基酸序列上暂停,特别是那些含有 异亮氨酸-脯氨酸(XIP)序列的序列, 这种情况需要 Rad6 的参与。这种 "氧化还原暂停" 特征在缺乏 Rad6 的情况下基本消失,表明 Rad6 介导的 泛素化 是这种翻译调节所需的。进一步分析表明,Rad6 影响翻译的方式独立于核糖体相关质量控制(RQC)通路,并导致整合应激反应(ISR)通路的激活发生变化。总的来说,这项研究揭示了 Rad6 如何在应对氧化应激时重塑翻译景观的关键机制。

机制图:

我们尝试分析一下这篇文章的数据。

安装

# install.packages("devtools")
devtools::install_github("junjunlab/RiboProfiler")

# or
remotes::install_github("junjunlab/RiboProfiler")

library(RiboProfiler)

下载数据

直接 ascp 下载,下载后修改一下文件名称,sra-exploer 下载一下放后台就行了:

#!/usr/bin/env bash
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR236/023/SRR23617123/SRR23617123.fastq.gz . && mv SRR23617123.fastq.gz SRR23617123_GSM7062915_SUB280_WT_rep1_SM061F_Saccharomyces_cerevisiae_OTHER.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR236/021/SRR23617121/SRR23617121.fastq.gz . && mv SRR23617121.fastq.gz SRR23617121_GSM7062917_SUB280_WT_peroxide_rep1_SM062F_Saccharomyces_cerevisiae_OTHER.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR236/020/SRR23617120/SRR23617120.fastq.gz . && mv SRR23617120.fastq.gz SRR23617120_GSM7062918_SUB280_WT_peroxide_rep2_SM072F_Saccharomyces_cerevisiae_OTHER.fastq.gz
...

去接头

文章上游数据处理部分:

他这里是含有 UMI 序列的,文章没有给代码,找了该作者以前发的文章才了解测序结构:

下面是文章里提供的信息:

注意作者上传 GEO 里面是这样描写的,注意红框内容:

接着就可以去接头了:

for i in SUB280_WT_rep1_F SUB280_WT_rep2_F SUB280_WT_peroxide_rep1_F SUB280_WT_peroxide_rep2_F SUB280_rad6_rep1_F SUB280_rad6_rep2_F SUB280_rad6_peroxide_rep1_F SUB280_rad6_peroxide_rep2_F SUB280_hel2_F SUB280_hel2_peroxide_F SUB280_rad6_RAD6_F SUB280_rad6_RAD6_peroxide_F SUB280_rad6_RAD6_C88A_F SUB280_rad6_RAD6_C88A_peroxide_F S288C_WT_F S288C_WT_peroxide_F S288C_rad6_F S288C_rad6_peroxide_F SUB280_WT_rep1_Fd SUB280_WT_rep2_Fd SUB280_WT_peroxide_rep1_Fd SUB280_WT_peroxide_rep2_Fd SUB280_rad6_Fd SUB280_rad6_peroxide_Fd
do
        cutadapt -j 20 -u 2 -u -5 -o ./${i}_trim.fq.gz ../1.raw-data/${i}.fastq.gz
done

去除 rRNA

去 NCBI 等相关网站下载酵母的 rRNA 序列,然后用 bowtie2 建索引,然后就可以去除了:

# trim rRNA
for i in SUB280_WT_rep1_F SUB280_WT_rep2_F SUB280_WT_peroxide_rep1_F SUB280_WT_peroxide_rep2_F SUB280_rad6_rep1_F SUB280_rad6_rep2_F SUB280_rad6_peroxide_rep1_F SUB280_rad6_peroxide_rep2_F SUB280_hel2_F SUB280_hel2_peroxide_F SUB280_rad6_RAD6_F SUB280_rad6_RAD6_peroxide_F SUB280_rad6_RAD6_C88A_F SUB280_rad6_RAD6_C88A_peroxide_F S288C_WT_F S288C_WT_peroxide_F S288C_rad6_F S288C_rad6_peroxide_F SUB280_WT_rep1_Fd SUB280_WT_rep2_Fd SUB280_WT_peroxide_rep1_Fd SUB280_WT_peroxide_rep2_Fd SUB280_rad6_Fd SUB280_rad6_peroxide_Fd
do
        bowtie2 -p 20 -x ../../index-data/sac-rRNA-index/Saccharomyces-cerevisiae-rRNA \
                --un-gz ${i}.rmrRNA.fq.gz \
                -U ../2.trim-data/${i}_trim.fq.gz \
                -S ./null
done

比对到基因组

ensembl 网站下载酵母的基因组和注释文件建立索引:

STAR --runThreadN 15 \
     --runMode genomeGenerate \
     --genomeDir sac-star-genome-index/ \
     --genomeFastaFiles ./Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa \
     --sjdbGTFfile Saccharomyces_cerevisiae.R64-1-1.112.gtf

你也可以去专门酵母的网站( https://www.yeastgenome.org/ ):

选择 download

比对:

# map to genome
for i in SUB280_WT_rep1_F SUB280_WT_rep2_F SUB280_WT_peroxide_rep1_F SUB280_WT_peroxide_rep2_F SUB280_rad6_rep1_F SUB280_rad6_rep2_F SUB280_rad6_peroxide_rep1_F SUB280_rad6_peroxide_rep2_F SUB280_hel2_F SUB280_hel2_peroxide_F SUB280_rad6_RAD6_F SUB280_rad6_RAD6_peroxide_F SUB280_rad6_RAD6_C88A_F SUB280_rad6_RAD6_C88A_peroxide_F S288C_WT_F S288C_WT_peroxide_F S288C_rad6_F S288C_rad6_peroxide_F SUB280_WT_rep1_Fd SUB280_WT_rep2_Fd SUB280_WT_peroxide_rep1_Fd SUB280_WT_peroxide_rep2_Fd SUB280_rad6_Fd SUB280_rad6_peroxide_Fd
do
        STAR --runThreadN 20 \
         --outFilterType Normal \
         --alignEndsType EndToEnd \
         --outFilterMismatchNmax 1 \
         --outFilterMultimapNmax 1 \
         --genomeDir ../../index-data/sac-star-genome-index \
         --readFilesCommand gunzip -c \
         --readFilesIn ../3.rmrRNA-data/${i}.rmrRNA.fq.gz \
         --outFileNamePrefix ./${i} \
         --outSAMtype BAM SortedByCoordinate \
         --quantMode GeneCounts \
         --outSAMattributes All
done

riboseq 质量控制分析

和我们之前介绍的 Ribo-SEQ 数据质量评估 一样,好的数据质量才好往下进行分析。整体上看,这篇文章的测序质量还是非常不错的,非常符合 ribo-seq 的特征指标。

准备注释文件

酵母UTR很少,对 CDS 两端延长 50nt 的长度作为 UTR

# devtools::install_github("junjunlab/RiboProfiler",force = T)

library(RiboProfiler)

# ==============================================================================
# 1_prepare gene annotation file
# ==============================================================================
pre_longest_trans_info(gtf_file = "../../index-data/Saccharomyces_cerevisiae.R64-1-1.112.gtf",
                       out_file = "longest_info.txt")

# extend CDS
df_extend <- gene_anno_extend(longest_trans_file = "longest_info.txt",
                              upstream_extend = 50,
                              downstream_extend = 50,
                              output_file = "longest_info_extend.txt")

获得质控数据

这里我们保持和文章一致,使用 read 的 3‘end 来进行分析,因为文章测了 monosome(单核糖体)disome(二核糖体) 两种测序,我们把数据分开来保存一下:

# ==============================================================================
# 2_prepare Ribo QC data
# ==============================================================================
bam_file = list.files("../4.map-data/",pattern = ".bam$",full.names = TRUE)
bam_file

sample_name <- sapply(strsplit(list.files("../4.map-data/",pattern = ".bam$"),
                               split = "Aligned.sortedByCoord.out.bam"), "[",1)
sample_name
# [1] "S288C_rad6_F"                     "S288C_rad6_peroxide_F"
# [3] "S288C_WT_F"                       "S288C_WT_peroxide_F"
# [5] "SUB280_hel2_F"                    "SUB280_hel2_peroxide_F"
...

# error files
# 16 "SUB280_rad6_rep2_F" 21 "SUB280_WT_rep1_F"

# perform QC analysis
pre_qc_data(longest_trans_file = "longest_info_extend.txt",
            bam_file = bam_file,
            out_file = paste(sample_name,".qc.txt",sep = ""),
            mapping_type = "genome",
            assignType = "end3",
            seq_type = "singleEnd")

# load qc data
qc_df <- load_qc_data()

qc_df$length <- factor(qc_df$length)

# filter monosome and disome
disome <- qc_df %>%
  dplyr::filter(endsWith(sample,"_Fd"))

# save data
save(disome,file = "disome.rda")

monosome <- qc_df %>%
  dplyr::filter(!endsWith(sample,"_Fd"))

# save data
save(monosome,file = "monosome.rda")

质控指标

monosomedisome 的我们都看看:

# length distribution 8x20
qc_plot(qc_data = monosome,type = "length",
        facet_wrap_list = list(ncol = 6))
# 5x12
qc_plot(qc_data = disome,type = "length",
        facet_wrap_list = list(ncol = 3))
# length with frame 8x20
qc_plot(qc_data = monosome,
        type = "length_frame",
        facet_wrap_list = list(ncol = 6))
# 5x12
qc_plot(qc_data = disome,
        type = "length_frame",
        facet_wrap_list = list(ncol = 3))
# region features 10x20
qc_plot(qc_data = monosome,
        type = "feature",
        facet_wrap_list = list(ncol = 6))
# 5x20
qc_plot(qc_data = disome,
        type = "feature",
        facet_wrap_list = list(ncol = 6))
# relative to start/stop site 8x20
rel_to_start_stop(qc_data = monosome,
                  type = "relst",
                  facet_wrap_list = list(ncol = 6,scales = "free"))
# 5x12
rel_to_start_stop(qc_data = disome,
                  type = "relst",
                  dist_range = c(-50,250),
                  facet_wrap_list = list(ncol = 3,scales = "free"))
# 8x20
rel_to_start_stop(qc_data = monosome,
                  type = "relsp",
                  facet_wrap_list = list(ncol = 6,scales = "free"))
# 5x12
rel_to_start_stop(qc_data = disome,
                  type = "relsp",
                  dist_range = c(-250,50),
                  facet_wrap_list = list(ncol = 3,scales = "free"))

offset 检测

看看不同长度片段的 offset 情况:

# ==============================================================================
# offset check
# ==============================================================================
# offset check
offset_momo <- PsiteOffsetCheck(qc_data = monosome,
                                relative_distance = c(-25,25),
                                read_length = c(25,34))
offset_momo$plot
offset_momo$summary_offset
#                               sample                                readLengths                                    Offsets
# 1                       S288C_rad6_F     25, 26, 27, 28, 29, 30, 31, 32, 33, 34     15, 15, 15, 15, 15, 15, 15, 21, 23, 25
# 11             S288C_rad6_peroxide_F     25, 26, 27, 28, 29, 30, 31, 32, 33, 34     15, 15, 15, 15, 15, 15, 25, 17, 23, 25
# 21                        S288C_WT_F     25, 26, 27, 28, 29, 30, 31, 32, 33, 34     15, 15, 15, 15, 15, 15, 25, 21, 23, 25
offset_disome <- PsiteOffsetCheck(qc_data = disome,
                                  relative_distance = c(-10,70),
                                  read_length = c(57,63))
offset_disome$plot

计算校正和标准化的数据

根据文章说明,对特定长度的 reads 保留分析,使用 15ntoffset 来校正:

# ==============================================================================
offset_momo <- PsiteOffsetCheck(qc_data = monosome,
                                relative_distance = c(-25,25),
                                read_length = c(25,34))

offset_momo_shift <- offset_momo$summary_offset
offset_momo_shift$Offsets <- c("15,15,15,15,15,15,15,15,15,15")

# adjust offset
adjusted_offset_mono <- adjust_offset(offset_df = offset_momo_shift,
                                      qc_df = monosome)

# get normalized data
mono_norm_df <- get_nomalized_counts(longest_trans_file = "longest_info_extend.txt",
                                     qc_df = adjusted_offset_mono,
                                     exclude_upstreamCodon = 5,
                                     exclude_downstreamCodon = 5)

# save data
save(mono_norm_df,file = "mono_norm_df.rda")

# ==============================================================================
# ajust offset with reads for disosome
load("disome.rda")

offset_disome <- PsiteOffsetCheck(qc_data = disome,
                                  relative_distance = c(-10,70),
                                  read_length = c(57,63))

offset_disome_shift <- offset_disome$summary_offset
offset_disome_shift$Offsets <- c("15,15,15,15,15,15,15")

# adjust offset
adjusted_offset_diso <- adjust_offset(offset_df = offset_disome_shift,
                                      qc_df = disome)


# get normalized data
diso_norm_df <- get_nomalized_counts(longest_trans_file = "longest_info_extend.txt",
                                     qc_df = adjusted_offset_diso,
                                     exclude_upstreamCodon = 5,
                                     exclude_downstreamCodon = 5)

# save data
save(diso_norm_df,file = "diso_norm_df.rda")

metagene plot

作者发现缺失 RAD6 以后在氧化应激的条件下并不明显影响核糖体在UTR 区域的堆积。文章里的图可以看到 reads 应该是没有 shift 的,所以应该用没有 shift 的数据来分析。

Because oxidative stress induced by peroxide was previouslyassociated with increased translation of 5' and 3' untranslatedregions (UTRs),7,13 we tested whether Rad6 impacts translationoutside of main open reading frames. Metagene analysis, performed by averaging data from genes aligned by their start orstop codons, showed modest changes in the occupancy of 5’and 3‘ UTRs with peroxide treatment in WT cells, as expected(Figures S1F–S1G). The absence of Rad6 did not affect thesetrends (Figures S1F–S1G), suggesting that Rad6 does notstrongly impact translation of UTRs under oxidative stress

此外计算 ribosome occupancy 的时候需要排除前后 15nt 的影响,所以参数 exclude_upstreamCodon 和 exclude_downstreamCodon 设置为 3:

计算:

qc_df <- load_qc_data()

# filter monosome and disome
mono <- qc_df %>%
  dplyr::filter(sample %in% c("SUB280_WT_rep1_F","SUB280_WT_rep2_F",
                              "SUB280_WT_peroxide_rep1_F","SUB280_WT_peroxide_rep2_F",
                              "SUB280_rad6_rep1_F","SUB280_rad6_rep2_F",
                              "SUB280_rad6_peroxide_rep1_F","SUB280_rad6_peroxide_rep2_F")) %>%
  dplyr::filter(length %in25:34)

# get normalized data
mono_norm_df <- get_nomalized_counts(longest_trans_file = "longest_info_extend.txt",
                                     qc_df = mono,
                                     exclude_upstreamCodon = 3,
                                     exclude_downstreamCodon = 3)

# add group name
mono_norm_df <- mono_norm_df %>%
  dplyr::mutate(group = dplyr::case_when(sample %in% c("SUB280_WT_rep1_F","SUB280_WT_rep2_F") ~ "WT",
                                        sample %in% c("SUB280_WT_peroxide_rep1_F","SUB280_WT_peroxide_rep2_F") ~ "WT_peroxide",
                                        sample %in% c("SUB280_rad6_rep1_F","SUB280_rad6_rep2_F") ~ "Δrad6",
                                        sample %in% c("SUB280_rad6_peroxide_rep1_F","SUB280_rad6_peroxide_rep2_F") ~ "Δrad6_peroxide"))

save(mono_norm_df,file = "mono_norm_df_noshift.rda")
load("mono_norm_df_noshift.rda")

relative to start codon

# plot WT WT_peroxide 3x6
metagene_plot(longest_trans_file = "longest_info_extend.txt",
              normed_file = mono_norm_df %>%
                dplyr::filter(group %in% c("WT","WT_peroxide")),
              min_counts = 64,
              min_cds_length = 600,
              relative_distance = c(-20,80),
              collapse = T,
              mode = "nt") +
  ggplot2::scale_color_manual(values = c("WT" = "black","WT_peroxide" = "#0099CC"))
# plot Δrad6 Δrad6_peroxide 3x6
metagene_plot(longest_trans_file = "longest_info_extend.txt",
              normed_file = mono_norm_df %>%
                dplyr::filter(group %in% c("Δrad6","Δrad6_peroxide")),
              min_counts = 64,
              min_cds_length = 600,
              relative_distance = c(-20,80),
              collapse = T,
              mode = "nt") +
  ggplot2::scale_color_manual(values = c("Δrad6" = "black","Δrad6_peroxide" = "#993399"))

relative to stop codon

metagene_plot(longest_trans_file = "longest_info_extend.txt",
              normed_file = mono_norm_df %>%
                dplyr::filter(group %in% c("WT","WT_peroxide")),
              min_counts = 64,
              min_cds_length = 600,
              type = "sp",
              relative_distance = c(-60,40),
              collapse = T,
              mode = "nt") +
  ggplot2::scale_color_manual(values = c("WT" = "black","WT_peroxide" = "#0099CC")) +
  ggplot2::ylim(0,15)
metagene_plot(longest_trans_file = "longest_info_extend.txt",
              normed_file = mono_norm_df %>%
                dplyr::filter(group %in% c("Δrad6","Δrad6_peroxide")),
              min_counts = 64,
              min_cds_length = 600,
              type = "sp",
              relative_distance = c(-60,40),
              collapse = T,
              mode = "nt") +
  ggplot2::scale_color_manual(values = c("Δrad6" = "black","Δrad6_peroxide" = "#993399")) +
  ggplot2::ylim(0,15)

gene track plot

作者发现在 RAD6 缺失后,氧化应激下在某些特定三肽基序 XIP 位点的核糖体停滞现象发生了丢失,文章举例了 SKI2 和 PCF11 两个基因,图里标注了暂停的基序:

track plot:

load("mono_norm_df.rda")

# filter samples
track_data <- mono_norm_df %>%
  dplyr::filter(sample %in% c("SUB280_WT_peroxide_rep1_F","SUB280_WT_peroxide_rep2_F",
                              "SUB280_rad6_peroxide_rep1_F","SUB280_rad6_peroxide_rep2_F"))

track_df <- get_track_df(longest_trans_file = "longest_info_extend.txt",
                         select_gene = c("SKI2","PCF11"),
                         normed_file = track_data)

# 5x10
track_plot(signal_data = track_df,
           show_ribo_only = T,
           gene_anno = "longest_info_extend.txt")

我们可以检查一下最高的位置处的基序:

tmp <- track_df %>% dplyr::filter(gene_name == "PCF11")

最高的位置处为 1140 位置,减去 50nt 5’UTR,然后 除以 3 等于 364 处的 codon然后加上末尾 8 个空字符,最终是 372,可以看到刚好是 KIPI 基序处:

tri amino acid pause score

这里我尝试使用之前的方式来计算 计算 Ribosome 距离二肽/三肽基序距离的密度 发现结果和文章相差还挺大的,仔细看了文章,发现算的方法不一样,只能自己按照文章的思路计算看看了。思路大概是先计算每个位置处的 rpm,然后计算这个位置上下游 50nt 左右的 average reads,这个位置处相邻的 EPA 位置所代表的基序就是 pause score。然后筛选次数大于 100 次的来分析,去除噪音。

  • (D) Schematics for calculation of pause score, computed by dividing reads at a motif by the average reads in a window around the motif of interest (±50 nt). We calculated pause score at tri-amino acid motifs that map to the ribosomal E (penultimate position of the nascent peptide), P, and A sites.

这也是文章的第一张主图,通过 Ribo-seq 测序发现缺失 RAD6 后,在某些 XIP 基序氧化应激条件下的核糖体停滞现象消失或者明显减弱了。


大概是这样的:

###########################################################################################
# 4_get motif pause score in window
###########################################################################################
pause_score = {}
with open(input_file,'r'as input:
    for line in input:
        _,_,_,_,_,_,trans_pos,trans_id,counts,_,_,exp = line.split()
        utr5,cds,_ = gene_info[trans_id]
        cds_pos = int(trans_pos) - utr5

        # exclude first and last 50nt
        if cds_pos >= window + 1 and cds_pos <= cds - window:
            nt_exp = exp_whole_trans[trans_id]
            window_exp = nt_exp[(cds_pos - window - 1):(cds_pos + window)]

            # normalize average reads in window
            average_mean = sum(window_exp)/(window + window + 1)
            normalized_window_exp = [i/average_mean for i in window_exp]

            # get EPA tri amino acid score
            average_sum_epa = sum(normalized_window_exp[(window - 3 - 1):(window + 5)])

            # calculate codon position and get tri-amino acid motif
            if cds_pos%3 == 0:
                codon_pos = int(cds_pos/3)
            else:
                codon_pos = cds_pos//3 + 1

            motif = amino.fetch(trans_id,(codon_pos - 1,codon_pos + 1))

            # save in dict
            if motif in pause_score:
                tmp_exp,tmp_count = pause_score[motif]
                pause_score[motif] = [average_sum_epa + tmp_exp,tmp_count + 1]
            else:
                pause_score[motif] = [average_sum_epa,1]

准备氨基酸序列:

# ==============================================================================
# get sequence
# ==============================================================================
library(RiboProfiler)

# extract amino acids sequnce
FetchSequence(gene_file = "longest_info.txt",
              genome_file = "Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa",
              output_file = "sac_amino_acid.fa",
              type = "cds",
              coding_type = "AA")
# extract cds sequnce
FetchSequence(gene_file = "longest_info.txt",
              genome_file = "Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa",
              output_file = "sac_cds.fa",
              type = "cds",
              coding_type = "NT")

然后我们使用 peptide_motif_score2 函数来计算:

load("mono_norm_df.rda")

unique(mono_norm_df$sample)

# filter samples
ft_data <- mono_norm_df %>%
  dplyr::filter(sample %in% c("SUB280_WT_rep1_F","SUB280_WT_rep2_F",
                              "SUB280_WT_peroxide_rep1_F","SUB280_WT_peroxide_rep2_F",
                              "SUB280_rad6_rep1_F","SUB280_rad6_rep2_F",
                              "SUB280_rad6_peroxide_rep1_F","SUB280_rad6_peroxide_rep2_F",
                              "SUB280_rad6_RAD6_F","SUB280_rad6_RAD6_peroxide_F",
                              "SUB280_rad6_RAD6_C88A_F","SUB280_rad6_RAD6_C88A_peroxide_F",
                              "S288C_WT_F","S288C_WT_peroxide_F",
                              "S288C_rad6_F","S288C_rad6_peroxide_F"))

# calculate pause score for motif
peptide_motif_score2(amino_file = "sac_amino_acid.fa",
                     normed_exp_file = ft_data,
                     longest_trans_file = "longest_info_extend.txt",
                     norm_type = "normed_count",
                     window = 50,
                     occurrence_threshold = 100)
# S288C_rad6_F has been processed!
# S288C_rad6_peroxide_F has been processed!
...

然后绘图可视化(感觉没有文章里的那么明显和点那么分散):

# plot
library(patchwork)

p1 <-
triAmino_scater_plot(occupancy_file = c("normalized_count_data/SUB280_WT_rep1_F_tripeptide_occupancy.txt",
                                        "normalized_count_data/SUB280_WT_rep2_F_tripeptide_occupancy.txt",
                                        "normalized_count_data/SUB280_WT_peroxide_rep1_F_tripeptide_occupancy.txt",
                                        "normalized_count_data/SUB280_WT_peroxide_rep2_F_tripeptide_occupancy.txt"),
                     sample_name = c("WT-rep1","WT-rep2","WT-peroxide-rep1","WT-peroxide-rep2"),
                     group_name = c("WT","WT","WT + peroxide","WT + peroxide"),
                     mark_motif = c("KIP","MIP","QIP","KIH","EIH",
                                    "RKK","PPD","PPE"),
                     pcol = "#0099CC",
                     mark_color = "black")


p2 <-
triAmino_scater_plot(occupancy_file = c("normalized_count_data/SUB280_rad6_rep1_F_tripeptide_occupancy.txt",
                                        "normalized_count_data/SUB280_rad6_rep2_F_tripeptide_occupancy.txt",
                                        "normalized_count_data/SUB280_rad6_peroxide_rep1_F_tripeptide_occupancy.txt",
                                        "normalized_count_data/SUB280_rad6_peroxide_rep2_F_tripeptide_occupancy.txt"),
                     sample_name = c("rad6-rep1","rad6-rep2","rad6-peroxide-rep1","rad6-peroxide-rep2"),
                     group_name = c("rad6","rad6","rad6 + peroxide","rad6 + peroxide"),
                     mark_motif = c("KIP","MIP","QIP","KIH","EIH",
                                    "RKK","PPD","PPE"),
                     pcol = "#993399",
                     mark_color = "black")

# 4x8
p1 | p2

回补型

RAD6 是泛素连接酶,为了验证氧化应激下的核糖体暂停是否与它的泛素连接功能相关,作者对 RAD6 泛素功能区域进行了突变,发现在缺失的条件下,只有回补野生型的 RAD6 而非突变型 才能恢复氧化应激导致的核糖体停滞现象。作者还在不同菌种(S288C)同样发现了类似的现象,表面现象的普适性。

p1 <-
  triAmino_scater_plot(occupancy_file = c("normalized_count_data/SUB280_rad6_RAD6_F_tripeptide_occupancy.txt",
                                          "normalized_count_data/SUB280_rad6_RAD6_peroxide_F_tripeptide_occupancy.txt"),
                       sample_name = c("rad6_RAD6","rad6_RAD6_peroxide"),
                       mark_motif = c("KIP"),
                       pcol = "#003366",
                       mark_color = "black")


p2 <-
  triAmino_scater_plot(occupancy_file = c("normalized_count_data/SUB280_rad6_RAD6_C88A_F_tripeptide_occupancy.txt",
                                          "normalized_count_data/SUB280_rad6_RAD6_C88A_peroxide_F_tripeptide_occupancy.txt"),
                       sample_name = c("rad6_RAD6_C88A","rad6_RAD6_C88A_peroxide"),
                       mark_motif = c("KIP"),
                       pcol = "#993399",
                       mark_color = "black")

# 4x8
p1 | p2

D 图在 S288C 里标记了 Pro 氨基酸结尾的基序(即 A 位点)橙色点,自定义一下绘图函数:

triAmino_scater_plot2 <- function(occupancy_file = NULL,
                                 sample_name = NULL,
                                 group_name = NULL,
                                 mark_motif = NULL,
                                 pcol = "grey",
                                 mark_color = "orange"){
  ...
  df_plot_pro <- df_plot_wide %>%
    dplyr::filter(endsWith(motif,"P"))

  # plot
  ggplot(df_plot_wide) +
    ...
    geom_point(data = df_plot_pro,aes(x = .data[[var[1]]],y = .data[[var[2]]]),
               color = "orange") +
    ...

}

p1 <-
  triAmino_scater_plot2(occupancy_file = c("normalized_count_data/S288C_WT_F_tripeptide_occupancy.txt",
                                          "normalized_count_data/S288C_WT_peroxide_F_tripeptide_occupancy.txt"),
                       sample_name = c("S288C_WT","S288C_WT_peroxide"),
                       mark_motif = c("KIP"),
                       pcol = "#3399CC",
                       mark_color = "black")


p2 <-
  triAmino_scater_plot2(occupancy_file = c("normalized_count_data/S288C_rad6_F_tripeptide_occupancy.txt",
                                          "normalized_count_data/S288C_rad6_peroxide_F_tripeptide_occupancy.txt"),
                       sample_name = c("S288C_rad6","S288C_rad6_peroxide"),
                       mark_motif = c("KIP"),
                       pcol = "#FF33CC",
                       mark_color = "black")

# 4x8
p1 | p2

relative distance to tri-peptide motif

作者发现 PA 位点主要是 异亮氨酸-脯氨酸E 位点则不保守,所以定义了 XIP 基序,X 为任意氨基酸。然后分析了 readsXIP 富集的富集情况,包括野生型,RAD6 缺失型,及回补 RAD6 和突变的 RAD6 的分析。观察图形可以发现,reads 也是没有经过 offset 校正的,分析的时候要注意一下。


对数据进行标准化:

load("monosome.rda")

monosome <- monosome %>%
  dplyr::mutate(length = as.numeric(as.character(length))) %>%
  dplyr::filter(length %in25:34)

# get normalized data
mono_norm_df <- get_nomalized_counts(longest_trans_file = "longest_info_extend.txt",
                                     qc_df = monosome,
                                     exclude_upstreamCodon = 5,
                                     exclude_downstreamCodon = 5)

计算 readsXIP 的相对距离,注意 X 是任意氨基酸,但是我们也不能写 20 种分别去寻找然后整合数据,这时候可以使用正则表达式的通配符,指定 motif = ".IP" 即可匹配到所有类型的 IP 结尾的基序:

rel_dist_motif(amino_file = "sac_amino_acid.fa",
               longest_trans_file = "longest_info_extend.txt",
               normed_file = mono_norm_df %>%
                 dplyr::filter(sample %in% c("SUB280_WT_rep1_F","SUB280_WT_rep2_F",
                                             "SUB280_WT_peroxide_rep1_F","SUB280_WT_peroxide_rep2_F",
                                             "SUB280_rad6_rep1_F","SUB280_rad6_rep2_F",
                                             "SUB280_rad6_peroxide_rep1_F","SUB280_rad6_peroxide_rep2_F",
                                             "SUB280_rad6_RAD6_F","SUB280_rad6_RAD6_peroxide_F",
                                             "SUB280_rad6_RAD6_C88A_F","SUB280_rad6_RAD6_C88A_peroxide_F")),
               min_counts = 64,
               motif = ".IP",
               upstream = -50,
               downstream = 50)

拿到数据了就可以绘图可视化了:

# plot
library(patchwork)

p1 <-
rel_dist_motif_plot(motif_occupancy_file = c("relative_distance_motif/SUB280_WT_rep1_F_.IP_motif_occupancy.txt",
                                             "relative_distance_motif/SUB280_WT_rep2_F_.IP_motif_occupancy.txt",
                                             "relative_distance_motif/SUB280_WT_peroxide_rep1_F_.IP_motif_occupancy.txt",
                                             "relative_distance_motif/SUB280_WT_peroxide_rep2_F_.IP_motif_occupancy.txt"),
                    sample_name = c("WT-rep1","WT-rep2",
                                    "WT-peroxide-rep1","WT-peroxide-rep2"),
                    group_name = c("WT","WT","WT + peroxide","WT + peroxide"),
                    motif = "XIP",
                    site = "3' end") +
  scale_color_manual(values = c("WT" = "black","WT + peroxide" = "#3399CC"))

p2 <-
  rel_dist_motif_plot(motif_occupancy_file = c("relative_distance_motif/SUB280_rad6_rep1_F_.IP_motif_occupancy.txt",
                                               "relative_distance_motif/SUB280_rad6_rep2_F_.IP_motif_occupancy.txt",
                                               "relative_distance_motif/SUB280_rad6_peroxide_rep1_F_.IP_motif_occupancy.txt",
                                               "relative_distance_motif/SUB280_rad6_peroxide_rep2_F_.IP_motif_occupancy.txt"),
                      sample_name = c("rad6-rep1","rad6-rep2",
                                      "rad6-peroxide-rep1","rad6-peroxide-rep2"),
                      group_name = c("rad6","rad6","rad6 + peroxide","rad6 + peroxide"),
                      motif = "XIP",
                      site = "3' end") +
  scale_color_manual(values = c("rad6" = "black","rad6 + peroxide" = "#FF33CC")) +
  ylim(0,10)

p1 / p2

看看回补野生型和突变型 RAD6 的情况:

p1 <-
  rel_dist_motif_plot(motif_occupancy_file = c("relative_distance_motif/SUB280_rad6_RAD6_F_.IP_motif_occupancy.txt",
                                               "relative_distance_motif/SUB280_rad6_RAD6_peroxide_F_.IP_motif_occupancy.txt"),
                      sample_name = c("rad6_RAD6","rad6_RAD6_peroxide"),
                      motif = "XIP",
                      site = "3' end") +
  scale_color_manual(values = c("rad6_RAD6" = "black","rad6_RAD6_peroxide" = "#3399CC")) +
  ylim(0,8)

p2 <-
  rel_dist_motif_plot(motif_occupancy_file = c("relative_distance_motif/SUB280_rad6_RAD6_C88A_F_.IP_motif_occupancy.txt",
                                               "relative_distance_motif/SUB280_rad6_RAD6_C88A_peroxide_F_.IP_motif_occupancy.txt"),
                      sample_name = c("rad6_RAD6_C88A","rad6_RAD6_C88A_peroxide"),
                      motif = "XIP",
                      site = "3' end") +
  scale_color_manual(values = c("rad6_RAD6_C88A" = "black","rad6_RAD6_C88A_peroxide" = "#FF33CC")) +
  ylim(0,8)

p1 / p2

motif logo

文章第一张主图对氧化应激处理上升的 motif 还进行了可视化,使用的是 pLogo(https://plogo.uconn.edu/#) 工具,作者图注里写了选择 1.5 倍的,然后输入 foregroundbackgroud 可视化。这里我们的复现结果和文章还是有区别的,这里就不分析了,注意介绍一下怎么使用。进入网页后你可以选择上传的 foregroundbackgroud 序列,中间部分有简短的操作教程,点击 generate pLogo 即可生成图形。

disome-seq analysis

前面的分析都是基于 monsome 的,作者还测了 disome 的数据,同样的方法我们也来分析看看。先看看文章为什么要做这个东西。之前的研究表明,E3 泛素连接酶 Hel2 可以通过泛素化碰撞的核糖体来触发 RQC 通路,从而促进核糖体的解离和重新利用。作者进一步在 hel2Δ 突变株中进行了 Ribo-seq 实验,发现在氧化应激条件下,XIP 序列的"氧化应激诱导的核糖体暂停"现象仍然存在。这表明,Rad6 介导的核糖体暂停并不依赖于 RQC 通路,而是通过一种独立的机制来调控转录过程。

注意左下那个图还是 monosome 的,做了一个 hel2 缺失的实验。

Δhel2 pause score

# hel2 monosome data
load("mono_norm_df.rda")

# filter samples
ft_data <- mono_norm_df %>%
  dplyr::filter(sample %in% c("SUB280_hel2_F","SUB280_hel2_peroxide_F"))

# calculate pause score for motif
peptide_motif_score2(amino_file = "sac_amino_acid.fa",
                     normed_exp_file = ft_data,
                     longest_trans_file = "longest_info_extend.txt",
                     norm_type = "normed_count",
                     window = 50,
                     occurrence_threshold = 100)
# SUB280_hel2_F has been processed!
# SUB280_hel2_peroxide_F has been processed!

# plot
triAmino_scater_plot(occupancy_file = c("normalized_count_data/SUB280_hel2_F_tripeptide_occupancy.txt",
                                        "normalized_count_data/SUB280_hel2_peroxide_F_tripeptide_occupancy.txt"),
                     sample_name = c("hel2","hel2_peroxide"),
                     mark_motif = c("KIP","MIP","QIP","KIH","EIH","RKK","PPD","PPE"),
                     pcol = "#006600",
                     mark_color = "black")

disome-seq pause score

先计算 pause score

load("diso_norm_df.rda")

unique(diso_norm_df$sample)

# calculate pause score for motif
peptide_motif_score2(amino_file = "sac_amino_acid.fa",
                     normed_exp_file = diso_norm_df,
                     longest_trans_file = "longest_info_extend.txt",
                     norm_type = "normed_count",
                     window = 50,
                     occurrence_threshold = 100)

然后绘图,可以看到有些基序没有找到:

# plot
library(patchwork)

p1 <-
  triAmino_scater_plot(occupancy_file = c("normalized_count_data/SUB280_WT_rep1_Fd_tripeptide_occupancy.txt",
                                          "normalized_count_data/SUB280_WT_rep2_Fd_tripeptide_occupancy.txt",
                                          "normalized_count_data/SUB280_WT_peroxide_rep1_Fd_tripeptide_occupancy.txt",
                                          "normalized_count_data/SUB280_WT_peroxide_rep2_Fd_tripeptide_occupancy.txt"),
                       sample_name = c("WT-rep1","WT-rep2","WT-peroxide-rep1","WT-peroxide-rep2"),
                       group_name = c("WT","WT","WT + peroxide","WT + peroxide"),
                       mark_motif = c("KIP","MIP","QIP","KIH","EIH","RKK","PPD","PPE"),
                       pcol = "#0099CC",
                       mark_color = "black")


p2 <-
  triAmino_scater_plot(occupancy_file = c("normalized_count_data/SUB280_rad6_Fd_tripeptide_occupancy.txt",
                                          "normalized_count_data/SUB280_rad6_peroxide_Fd_tripeptide_occupancy.txt"),
                       sample_name = c("rad6","rad6_peroxide"),
                       mark_motif = c("KIP","MIP","QIP","KIH","EIH","RKK","PPD","PPE"),
                       pcol = "#9933CC",
                       mark_color = "black")

# 4x8
p1 | p2

relative to XIP motif

计算:

# relative distance
load("disome.rda")

disome <- disome %>%
  dplyr::mutate(length = as.numeric(as.character(length))) %>%
  dplyr::filter(length %in57:63)

# get normalized data
disome_norm_df <- get_nomalized_counts(longest_trans_file = "longest_info_extend.txt",
                                       qc_df = disome,
                                       exclude_upstreamCodon = 5,
                                       exclude_downstreamCodon = 5)

# calculate relative distance density
rel_dist_motif(amino_file = "sac_amino_acid.fa",
               longest_trans_file = "longest_info_extend.txt",
               normed_file = disome_norm_df,
               min_counts = 64,
               motif = ".IP",
               upstream = -50,
               downstream = 50)

绘图:

# plot
library(patchwork)

p1 <-
  rel_dist_motif_plot(motif_occupancy_file = c("relative_distance_motif/SUB280_WT_rep1_Fd_.IP_motif_occupancy.txt",
                                               "relative_distance_motif/SUB280_WT_rep2_Fd_.IP_motif_occupancy.txt",
                                               "relative_distance_motif/SUB280_WT_peroxide_rep1_Fd_.IP_motif_occupancy.txt",
                                               "relative_distance_motif/SUB280_WT_peroxide_rep2_Fd_.IP_motif_occupancy.txt"),
                      sample_name = c("WT-rep1","WT-rep2",
                                      "WT-peroxide-rep1","WT-peroxide-rep2"),
                      group_name = c("WT","WT","WT + peroxide","WT + peroxide"),
                      motif = "XIP",
                      site = "3' end") +
  scale_color_manual(values = c("WT" = "black","WT + peroxide" = "#3399CC")) +
  labs(title = "Disome seq")

p2 <-
  rel_dist_motif_plot(motif_occupancy_file = c("relative_distance_motif/SUB280_rad6_Fd_.IP_motif_occupancy.txt",
                                               "relative_distance_motif/SUB280_rad6_peroxide_Fd_.IP_motif_occupancy.txt"),
                      sample_name = c("rad6","rad6-peroxide"),
                      motif = "XIP",
                      site = "3' end") +
  scale_color_manual(values = c("rad6" = "black","rad6-peroxide" = "#FF33CC")) +
  labs(title = "Disome seq") +
  ylim(0,13)

# 5X7
p1 / p2

最后就差不多结束了,有些地方其实并不是那么满意。

结尾

路漫漫其修远兮,吾将上下而求索。


欢迎加入生信交流群。加我微信我也拉你进 微信群聊 老俊俊生信交流群 (微信交流群需收取 20 元入群费用,一旦交费,拒不退还!(防止骗子和便于管理))


往期回顾目录

Ribo-SEQ 数据质量评估
寻找核糖体暂停位点
Molecular Cell 文章结果复现(终)
计算 Ribosome 距离二肽/三肽基序距离的密度
怎么给书籍插入参考文献
Ribo-seq 核糖体去除新方法?
Molecular Cell 文章部分结果复现(续)
Molecular Cell 文章部分结果复现
提取最长转录本信息+exon 序列+CDS 序列+氨基酸序列
检测 Ribo-SEQ 中的 offset