ixxmu / mp_duty

Scrape web articles and save them to GitHub issues
https://archives.duty-machine.now.sh/

Upstream analysis of BD single-cell transcriptome data with Docker: a first attempt #3176

Closed ixxmu closed 1 year ago

ixxmu commented 1 year ago

https://mp.weixin.qq.com/s/uyMcThYI7uq9TV1uBLktAw

ixxmu commented 1 year ago

Upstream analysis of BD single-cell transcriptome data with Docker: a first attempt, by 东林的扯淡小屋

First, install Docker:

Install Docker Engine on Ubuntu | Docker Documentation

https://docs.docker.com/engine/install/ubuntu/


# Create a conda environment named BD for the software
conda create -n BD python=2
# List the current conda environments
conda info --envs
# Activate the BD conda environment
conda activate BD
pip install cwlref-runner
sudo apt install nodejs
# check cwl-runner
wget -c https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/GRCm38-PhiX-gencodevM19/GRCm38-PhiX-gencodevM19-20181206.tar.gz
wget -c https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/GRCm38-PhiX-gencodevM19/gencodevM19-20181206.gtf
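
The `# check cwl-runner` comment above has no command attached. A minimal sketch of such a check, assuming it is enough to confirm that each tool is on `PATH`; the function name `missing_tools` is hypothetical and not part of any of the tools involved:

```shell
# Print every command name from the arguments that is not on PATH.
# Returns 0 if all were found, 1 otherwise.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}
```

It could be called as `missing_tools cwl-runner docker node && echo "all tools found"`. Here `node` is assumed to be the binary installed by the `nodejs` package; on some systems it may be named `nodejs` instead.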

Download: https://bitbucket.org/CRSwDev/cwl/src/master/

Install the official software:

docker pull daocloud.io/bdgenomics/rhapsody:1.9.1
docker images


# Back up the configuration file
cp template_wta_1.9.1.yml my_wta_1.9.1-2022-08-29_11-57.yml

Then edit it:
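
The copy's file name carries a date and time suffix. A small sketch that generates such a suffix with `date` instead of typing it by hand; the helper name `stamped_name` is hypothetical, and the `%Y-%m-%d_%H-%M` format is inferred from the file name above:

```shell
# Build "<prefix>-<YYYY-MM-DD_HH-MM>.yml" from a prefix such as my_wta_1.9.1.
stamped_name() {
    printf '%s-%s.yml\n' "$1" "$(date +%Y-%m-%d_%H-%M)"
}
```

Used as `cp template_wta_1.9.1.yml "$(stamped_name my_wta_1.9.1)"`, this gives each copy a name that records when it was made.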

#!/usr/bin/env cwl-runner

cwl:tool: rhapsody

# This is a template YML file used to specify the inputs for a BD Genomics WTA Rhapsody Analysis pipeline run. See the
# BD Genomics Analysis Setup User Guide (Doc ID: 47383) for more details. Enter the following information:

## Reads (required) - Path to your read files in the FASTQ.GZ format. You may specify as many R1/R2 read pairs as you want.
Reads:

 - class: File
   location: "/data/yudonglin/reference/singcell/BD/hCS_Run1_S1_L001_R1_001.fastq.gz"

 - class: File
   location: "/data/yudonglin/reference/singcell/BD/hCS_Run1_S1_L001_R2_001.fastq.gz"

 - class: File
   location: "/data/yudonglin/reference/singcell/BD/hCS_Run1_S2_L002_R1_001.fastq.gz"

 - class: File
   location: "/data/yudonglin/reference/singcell/BD/hCS_Run1_S2_L002_R2_001.fastq.gz"

## Reference_Genome (required) - Path to STAR index for tar.gz format. See Doc ID: 47383 for instructions to obtain pre-built STAR index file.
Reference_Genome:
   class: File
   location: "/data/yudonglin/reference/singcell/BD/ref/GRCm38-PhiX-gencodevM19-20181206.tar.gz"

## Transcriptome_Annotation (required) - Path to GTF annotation file
Transcriptome_Annotation:
   class: File
   location: "/data/yudonglin/reference/singcell/BD/ref/gencodevM19-20181206.gtf"

## AbSeq_Reference (optional) - Path to the AbSeq reference file in FASTA format. Only needed if BD AbSeq Ab-Oligos are used.
#AbSeq_Reference:
# - class: File
#   location: "test/AbSeq_reference.fasta"

## Supplemental_Reference (optional) - Path to the supplemental reference file in FASTA format. Only needed if there are additional transgene sequences used in the experiment.
#Supplemental_Reference:
# - class: File
#   location: "test/supplemental_reference.fasta"

####################################
## Putative Cell Calling Settings ##
####################################

## Exact cell count - Set a specific number (>=1) of cells as putative, based on those with the highest error-corrected read count
#Exact_Cell_Count: 10000

## Disable Refined Putative Cell Calling - Determine putative cells using only the basic algorithm (minimum second derivative along the cumulative reads curve). The refined algorithm attempts to remove false positives and recover false negatives, but may not be ideal for certain complex mixtures of cell types. Does not apply if Exact Cell Count is set.
## values can be true or false.
#Basic_Algo_Only: true

########################
## Subsample Settings ##
########################

## Subsample (optional) - A number >1 or fraction (0 < n < 1) to indicate the number or percentage of reads to subsample.
#Subsample: 0.01

## Subsample seed (optional) - A seed for replicating a previous subsampled run.
#Subsample_seed: 3445

#######################
## Multiplex options ##
#######################

## Sample Tags Version (optional) - Specify if multiplexed run: human, hs, mouse or mm
#Sample_Tags_Version: mouse

## Subsample Sample Tags (optional) - A number >1 or fraction (0 < n < 1) to indicate the number or percentage of reads to subsample.
#Subsample_Tags: 0.05

## Tag_Names (optional) - Specify the tag number followed by '-' and the desired sample name to appear in Sample_Tag_Metrics.csv
# Do not use the special characters: &, (), [], {}, <>, ?, |
#Tag_Names: [hCS_Run1]
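
Every uncommented `location:` entry in this file has to point at a file that actually exists. A rough sketch that pulls those paths out with `grep` and `sed` and reports the ones that are missing; the function name `check_locations` is hypothetical, and the pattern assumes each `location: "..."` entry sits on its own line with a double-quoted path, as in the BD template layout:

```shell
# List each uncommented location: "..." path in a YAML file that does not exist.
# Returns 0 if every path exists, 1 otherwise.
check_locations() {
    local rc=0 path
    while IFS= read -r path; do
        # Skip the empty line produced when nothing matched.
        [ -n "$path" ] || continue
        if [ ! -f "$path" ]; then
            echo "not found: $path"
            rc=1
        fi
    done <<EOF
$(grep -E '^[[:space:]]*location:' "$1" | sed -E 's/^[^"]*"([^"]*)".*$/\1/')
EOF
    return "$rc"
}
```

Running `check_locations my_wta_1.9.1-2022-08-29_11-57.yml` before launching the pipeline would flag a mistyped FASTQ or reference path early. Lines that start with `#` are ignored, so the commented-out optional entries are not checked.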

Start the run:

cwl-runner --parallel --tmpdir-prefix tmp_ --outdir /data/yudonglin/reference/singcell/BD/scRNA-out /data/yudonglin/reference/singcell/BD/cwl/v1.9.1/rhapsody_wta_1.9.1.cwl /data/yudonglin/reference/singcell/BD/cwl/v1.9.1/my_wta_1.9.1-2022-08-29_11-57.yml
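
The command above repeats the same base directory three times. A sketch that assembles it from a base directory and the input YAML name, so the paths only need to be typed once; the function name `rhapsody_cmd` is hypothetical, it uses only the flags already shown above, it assumes the directory layout of this walkthrough (`cwl/v1.9.1/` and `scRNA-out` under the base directory) and paths without spaces, and it only prints the command without running it:

```shell
# Print the cwl-runner command line for a given base directory and input YAML name.
# Nothing is executed here; the output is meant to be reviewed first.
rhapsody_cmd() {
    local base=$1 yml=$2
    printf 'cwl-runner --parallel --tmpdir-prefix tmp_ --outdir %s/scRNA-out %s/cwl/v1.9.1/rhapsody_wta_1.9.1.cwl %s/cwl/v1.9.1/%s\n' \
        "$base" "$base" "$base" "$yml"
}
```

For example, `rhapsody_cmd /data/yudonglin/reference/singcell/BD my_wta_1.9.1-2022-08-29_11-57.yml` prints the same command line as the one used in this post.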



References:

BD Single-Cell Multiomics Analysis Setup User Guide (bdbiosciences.com)

https://www.bdbiosciences.com/content/dam/bdb/marketing-documents/BD_Single_Cell_Genomics_Analysis_Setup_User_Guide_v2.pdf

CRSwDev / cwl — Bitbucket

https://bitbucket.org/CRSwDev/cwl/src/master/

https://omicx.cc/posts/2021-06-24-setup-bd-single-cell-genomics-rhapsody-analysis/

BD Rhapsody single-cell analysis system: BD upstream analysis workflow (TTS56's blog, CSDN)

https://blog.csdn.net/qq_40966210/article/details/126585259

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE93811

The run completed successfully!

xutongran commented 3 months ago

The above advice is excellent! Is there any more information on how to run the BD pipeline with Singularity?