Closed WT215 closed 5 years ago
Before those steps, should I merge bam files at first so that each bam file represents one cell?
Yes, you are correct. This is a critical step. You need to combine them into a single file before deduplicating the UMIs. If you did each file separately and then combined them, the UMI counts would be inflated because any given UMI might be present in more than one of the BAM files.
Does three mean the library was sequenced in three lanes?
Yes. Each master mix was sequenced in 3 lanes, and thus there are 3 FASTQ (or BAM) files per single cell sample.
When I look into the file singleCellSeq/code/Snakefile, it seems that I need to merge, sort and index those bam files.
You don't want to pay much attention to that Snakefile. I used it to analyze a small pilot data set. Because the HPC cluster we used to perform this analysis couldn't handle a huge Snakemake-based pipeline, I had to switch to using array jobs. The full description of the steps is in Process sequence data. If you've mapped the samples, the next step is process the BAM files and then combine the single cell samples.
The scripts were written to use Sun Grid Engine array jobs. You can either adapt these for use with your cluster's scheduling software, or you can extract the relevant code into your own Snakemake pipeline.
My more recent projects use Snakemake to process the entire data set (we switched to using a better HPC cluster), so these may be helpful if you want to implement a Snakemake pipeline. For example, for the project singlecell-qtl, the Snakefile first combines the FASTQ files prior to mapping.
@jdblischak Thank you so much! I am crystal clear now!
Best wishes, Wenhao
Hi,
Following the "Read mapping" section in your paper, now I have successfully obtained 2592
bam
files for the single cells.I am going to do the next step :
When I look into the file
singleCellSeq/code/Snakefile
, it seems that I need tomerge
,sort
andindex
those bam files.Before those steps, should I merge
bam
files at first so that eachbam
file represents one cell? (Currently I have 2592 bam files. Since there are 864 cells, I guessed that every three of bam files correspond to one cell? Doesthree
mean the library was sequenced in three lanes?)Thank you very much!
Best wishes, Wenhao