kharchenkolab / dropEst

Pipeline for initial analysis of droplet-based single-cell RNA-seq data
GNU General Public License v3.0

Produce multiple files (each corresponding to a single cell) using dropTag #17

Open leonfodoulian opened 6 years ago

leonfodoulian commented 6 years ago

Hi,

Thank you for developing this excellent tool. I am trying to use dropTag to demultiplex fastq files generated by inDrop v1. The command line I am using is the following:

../dropEst/droptag -c ../dropEst/configs/indrop_v1_2.xml -l log.file.name -p 8 -S file_1.fastq.gz file_2.fastq.gz

Note that file_1.fastq.gz corresponds to the barcodes file and file_2.fastq.gz corresponds to the gene reads file.

However, I am only getting a single tagged file called file_2.fastq.gz.tagged.1.fastq.gz instead of getting multiple demultiplexed files. Can I instead get two files for each cell, one corresponding to the demultiplexed barcode file, and the other to the demultiplexed gene reads file?

Thank you in advance!

Best, Leon

VPetukhov commented 6 years ago

Hi! Thank you for the comment!

> However, I am only getting a single tagged file called file_2.fastq.gz.tagged.1.fastq.gz instead of getting multiple demultiplexed files.

Barcodes are encoded in the read IDs, which have the format "@run_id!barcode#UMI". If you need to save barcodes in a separate file, please use the -s option, which saves them in gzipped plain-text format. Note, however, that when you use multithreading (the -p option), there is no guarantee that the barcode file will be in the same order as the reads file. In that case you have to match them by read ID manually.
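As an illustration of the read ID layout described above, here is a minimal Python sketch that pulls the barcode and UMI out of a tagged header. The function name `parse_tagged_read_id` is hypothetical (it is not part of dropEst), and the sketch assumes the header follows the "@run_id!barcode#UMI" pattern exactly, with the barcode after the last "!" and the UMI after the following "#".

```python
def parse_tagged_read_id(header):
    """Split a tagged FASTQ header of the assumed form '@run_id!barcode#UMI'.

    Returns a (run_id, barcode, umi) tuple.
    """
    parts = header.strip().split()
    if not parts:
        raise ValueError("empty read header")
    name = parts[0]  # ignore any description after the first whitespace
    if name.startswith("@"):
        name = name[1:]
    # Assumption: the barcode starts after the LAST '!' in the read name.
    run_id, bang, rest = name.rpartition("!")
    barcode, hash_sign, umi = rest.partition("#")
    if not bang or not hash_sign:
        raise ValueError("header does not match '@run_id!barcode#UMI': %r" % header)
    return run_id, barcode, umi
```

Calling it on every fourth line of the tagged file (the header lines) gives the barcode and UMI for each read without needing a separate barcode file, which sidesteps the ordering issue with -p.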

> Can I instead get two files for each cell, one corresponding to the demultiplexed barcode file, and the other to the demultiplexed gene reads file?

Sorry, but there is no such option in the dropTag phase. It only separates gene reads from barcodes. You can write your own script to split these two files by cell, but it wouldn't fix barcode errors. Alternatively, the dropEst phase can write a .bam file with corrected barcodes stored in BAM tags (-b option). Afterwards, you can split this BAM file by barcode (a Python script for this takes about 10 lines) and convert the resulting files to fastq.
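For the first route (splitting the tagged reads by cell, without barcode error correction), a rough Python sketch could look like the one below. It is only an illustration: the function name `split_fastq_by_barcode` and the "<barcode>.fastq.gz" output naming are hypothetical, and it assumes gzipped four-line FASTQ records with "@run_id!barcode#UMI" headers and an input small enough to hold in memory. Because the barcodes are taken as-is from the read IDs, reads with sequencing errors in the barcode end up in their own files.

```python
import gzip
import os
from collections import defaultdict


def split_fastq_by_barcode(tagged_path, out_dir):
    """Write one gzipped FASTQ per raw (uncorrected) barcode.

    Returns a dict mapping each barcode to its number of reads.
    """
    os.makedirs(out_dir, exist_ok=True)
    records = defaultdict(list)
    with gzip.open(tagged_path, "rt") as fin:
        while True:
            # A FASTQ record is four lines: header, sequence, '+', qualities.
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            # Assumed header layout: '@run_id!barcode#UMI'.
            name = record[0].split()[0]
            barcode = name.rpartition("!")[2].partition("#")[0]
            records[barcode].append("".join(record))
    # Everything is buffered first so that only one output file is open at a time.
    for barcode, recs in records.items():
        out_path = os.path.join(out_dir, barcode + ".fastq.gz")
        with gzip.open(out_path, "wt") as fout:
            fout.writelines(recs)
    return {barcode: len(recs) for barcode, recs in records.items()}
```

For large runs, buffering all reads in memory will not scale; sorting by barcode first or writing in batches would be needed. The BAM-based route remains preferable whenever corrected barcodes matter.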