Closed ixxmu closed 2 years ago
主要五个有步骤
而质量控制(QC)是其中关键的步骤之一。常用软件FastQC可以为每一个样品生成一个html报告和一个‘zip’ 文件,zip解压之后生成fastqc_data.txt和summary.txt的文件,里面包含了测序样品的质量信息
MultiQC软件在其基础上,把所有样本的质量信息综合起来,形成html报告,方便大家快速查看和评估
但以上两个软件均需要在linux环境下执行。当后续工作需要自定义报告或者美化结果图时,在R语言下会更加容易实现,如R包fastqcr,但目前只仅限于 FastQC 文件,并且对于多个样本结果的整合可视化还是比较困难的。
文献:
ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files | Bioinformatics | Oxford Academic (oup.com)
说明书:
An Introduction To ngsReports (bioconductor.org)
ngsReports: Load FastqQC reports and other NGS related files (bioconductor.org)
报告解读:
#BiocManager::install("ngsReports")
library(ngsReports)
这里直接载入R包自带的6个fastqc数据作例子
转换成 FastqcDataList
格式
# Get the files included with the package
fileDir <- system.file("extdata", package = "ngsReports")
files <- list.files(fileDir, pattern = "fastqc.zip$", full.names = TRUE)
files
# Load the FASTQC data as a FastqcDataList object
fdl <- FastqcDataList(files)
# FastqcDataList for 6 files.
> files
[1] "D:/R-4.1.1/library/ngsReports/extdata/ATTG_R1_fastqc.zip"
[2] "D:/R-4.1.1/library/ngsReports/extdata/ATTG_R2_fastqc.zip"
[3] "D:/R-4.1.1/library/ngsReports/extdata/CCGC_R1_fastqc.zip"
[4] "D:/R-4.1.1/library/ngsReports/extdata/CCGC_R2_fastqc.zip"
[5] "D:/R-4.1.1/library/ngsReports/extdata/GACC_R1_fastqc.zip"
[6] "D:/R-4.1.1/library/ngsReports/extdata/GACC_R2_fastqc.zip"
getModule(fdl[[1]], "Summary")
# # A tibble: 12 x 3
# Filename Status Category
# <chr> <chr> <chr>
# 1 ATTG_R1.fastq PASS Basic Statistics 基本信息
# 2 ATTG_R1.fastq FAIL Per base sequence quality 各个位置测序质量
# 3 ATTG_R1.fastq WARN Per tile sequence quality 每个tile的测序质量
# 4 ATTG_R1.fastq PASS Per sequence quality scores 序列质量分数
# 5 ATTG_R1.fastq FAIL Per base sequence content 序列碱基分布
# 6 ATTG_R1.fastq FAIL Per sequence GC content 序列平均GC分布
# 7 ATTG_R1.fastq PASS Per base N content N含量分布
# 8 ATTG_R1.fastq PASS Sequence Length Distribution 序列长度分布
# 9 ATTG_R1.fastq FAIL Sequence Duplication Levels 重复序列的频率
# 10 ATTG_R1.fastq FAIL Overrepresented sequences 重复序列
# 11 ATTG_R1.fastq FAIL Adapter Content 接头序列
# 12 ATTG_R1.fastq FAIL Kmer Content 计算每个重复段序列出现的次数
# Copy these files to tempdir() to avoid overwriting
# any files in the package directory
tempdir()
file.copy(files, tempdir(), overwrite = TRUE)
writeHtmlReport(fastqcDir = tempdir())
上面报告中每一条结果对应的函数
plotReadTotals(fdl)
plotSummary(fdl)
plotBaseQuals(fdl)
plotSeqQuals(fdl)
plotSeqContent(fdl)
plotGcContent(fdl)
plotSeqLengthDistn(fdl)
plotDupLevels(fdl)
plotAdapterContent(fdl)
plotOverrep(fdl)
#remotes::install_github("UofABioinformaticsHub/shinyNgsreports")
library(ngsReports)
library(shinyNgsreports)
fastqcShiny(fdl)
以 plotSummary
作例子,可以看帮助文档得到它的参数基本格式
格式如下:
# ?plotSummary
## S4 method for signature 'FastqcDataList'
plotSummary(
x,
usePlotly = FALSE,
labels,
pwfCols, # 颜色
cluster = FALSE,
dendrogram = FALSE,
...,
gridlineWidth = 0.2,
gridlineCol = "grey20"
)
其实,大概猜到,完全可以加上ggplot2进行修饰
This uses the standard ggplot2 syntax to create a three colour plot. The output of this function canbe further modified using the standard ggplot2 methods if required.
可以通过调整颜色进一步美化
这个时候就要看大家ggplot2的功底了!
# 1.展示颜色
getColours(pwf)
# PASS WARN FAIL MAX
# "#00CC00" "#E6E633" "#CC3333" "#FFFFFF"
# 2. 透明度
pwf2 <- setAlpha(pwf, 0.7)
# 3.重新定义颜色
# 需要rgb格式
# setColours(object, PASS, WARN, FAIL, MAX)
pwf2=setColours(pwf,rgb(0.6,0.86,0.63),
rgb(0.99,0.68,0.38),
rgb(0.83,0.24,0.31),
rgb(1,1,1))
# 4.重新画图
plotSummary(fdl,
pwfCols=pwf2,# 颜色
gridlineWidth=0.8, # 线条宽度
gridlineCol = "black")+ # 线条颜色
theme(axis.text= element_text(face = 'bold'))+ # 坐标轴字体
labs(x="",y="") # 坐标轴标签
授人以鱼不如授人以渔
大家完全可以美化其他的结果图
除了处理fastqc文件,还可以导入多个文件输出的log日志,不过每次只支持一种工具
如下:
Adapter removal and trimming
Mapping and alignment
Transcript/gene quantification
Genome assembly
例如看基因比对结果
fls <- c("bowtiePE.txt", "bowtieSE.txt")
bowtieLogs <- system.file("extdata", fls, package = "ngsReports")
df <- importNgsLogs(bowtieLogs, type = "bowtie")
df %>%
dplyr::select("Filename", starts_with("Reads")) %>%
pander(
split.tables = Inf,
style = "rmarkdown",
big.mark = ",",
caption = "Select columns as an example of output from bowtie."
)
https://mp.weixin.qq.com/s/Wdlqzfqiou0fTIWhT9LrbQ