velocyto-team / velocyto.py

RNA velocity estimation in Python
http://velocyto.org/velocyto.py/
BSD 2-Clause "Simplified" License

How to use 10X data for generating spliced and unspliced reads? #273

Open gravitogen opened 3 years ago

gravitogen commented 3 years ago

I am trying to follow the Methods section of the paper "RNA velocity of single cells". In that section, the authors write:

For the 10x genomics platform datasets, the BAM file was processed using the default parameters of the Cellranger software (10x Genomics).

Following this, I looked at the dataset GSE104323 and all the SRA files corresponding to the P0 mouse. I then converted them to FASTQ files using fastq-dump and generated a genome index using the command

STAR --runThreadN 6 --runMode genomeGenerate --genomeDir ~/mm10index --genomeFastaFiles ~/refdata-gex-mm10-2020-A/fasta/genome.fa --sjdbGTFfile ~/refdata-gex-mm10-2020-A/genes/genes.gtf --sjdbOverhang 99

and then aligned the reads to produce a BAM file using the command

STAR --runThreadN 7 --genomeDir mm10index/ --sjdbGTFfile ~/refdata-gex-mm10-2020-A/genes/genes.gtf --sjdbOverhang 99 --readFilesIn ~/fastq/SRR6084365.fastq --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --outSAMattributes Standard --outFileNamePrefix GSM2795247

After that, I am not sure what to do next. I cannot work out how to follow this statement:

For the 10x genomics platform datasets, the BAM file was processed using the default parameters of the Cellranger software (10x Genomics).

I also tried a couple more things.

I used fastq-dump to generate gzipped FASTQ files as follows:

fastq-dump --split-files --origfmt --gzip SRR6084365

Then I renamed SRR6084365_1 to SRR6084365_1_S1_L001_R1_001.fastq.gz, as described in another online article (https://bioinformaticsworkbook.org/dataAnalysis/RNA-Seq/Single_Cell_RNAseq/Chromium_Cell_Ranger.html#gsc.tab=0), and downloaded mm10_rmsk.gtf as described in https://labs.wsu.edu/winuthayanon/basic-analysis-of-single-cell-rna-seq-data/how-to-analyze-single%E2%80%90cell-rna%E2%80%90seq/explaining-velocyto-command-line/
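For reference, the renaming step can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function name `cellranger_name` is hypothetical, the `fastqgz/` folder is taken from the cellranger command in this post, and it assumes that the `_1` file from `fastq-dump --split-files` holds the barcode/UMI read (R1) and the `_2` file holds the cDNA read (R2), which should be checked for each SRA run. Unlike the rename described above, it drops the `_1`/`_2` suffix from the sample name, so that both reads of a run share one sample prefix in the `<sample>_S1_L001_<read>_001.fastq.gz` pattern.

```shell
#!/usr/bin/env bash
# Map a fastq-dump --split-files output name to a bcl2fastq-style name:
#   <sample>_S1_L001_R<n>_001.fastq.gz
# ASSUMPTION: _1 is the barcode/UMI read (R1) and _2 is the cDNA read (R2).
# cellranger_name is a hypothetical helper, not part of any tool.
cellranger_name() {
    local file base sample idx
    file=$(basename "$1")
    base=${file%.fastq.gz}   # e.g. SRR6084365_1
    sample=${base%_*}        # e.g. SRR6084365
    idx=${base##*_}          # e.g. 1
    printf '%s_S1_L001_R%s_001.fastq.gz\n' "$sample" "$idx"
}

# Dry run: print the planned renames without touching any file.
# Remove the leading "echo" once the mapping looks right.
for f in fastqgz/SRR*_[12].fastq.gz; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    echo mv "$f" "$(dirname "$f")/$(cellranger_name "$f")"
done
```

The loop only prints `mv` commands, so it is safe to run before committing to the new names.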

Then I ran the following:

cellranger count --id=SRR6084_series --fastqs=fastqgz/ --transcriptome=/home/ivory/CyverseData/scRNAseqData/refdata-gex-mm10-2020-A

This generated a folder named SRR6084_series.

After that, I ran the following velocyto command:

velocyto run10x -m mm10_rmsk.gtf SRR6084_series/ ~/refdata-gex-mm10-2020-A/genes/genes.gtf 

which gives the following error:

2020-09-07 17:02:05,773 - ERROR - The outputs are not ready
2020-09-07 17:02:05,773 - ERROR - Can not locate the barcodes.tsv file
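Both messages concern the contents of the cellranger output folder, so a quick check of that folder may narrow things down. The sketch below is hedged: the function name `check_cellranger_outs` is hypothetical, and the paths it tests (a `_log` file containing "Pipestance completed successfully!", `outs/possorted_genome_bam.bam`, and a `barcodes.tsv` or `barcodes.tsv.gz` under the filtered matrix folder) are assumptions about what `velocyto run10x` looks for, to be verified against the installed velocyto and Cell Ranger versions.

```shell
#!/usr/bin/env bash
# Report whether a cellranger count output folder looks complete.
# ASSUMPTION: the files tested here are the ones velocyto run10x expects;
# check_cellranger_outs is a hypothetical helper, not part of either tool.
check_cellranger_outs() {
    local dir=${1%/} status=0
    if grep -qs "Pipestance completed successfully!" "$dir/_log"; then
        echo "pipestance: completed"
    else
        echo "pipestance: not completed (or no _log file)"; status=1
    fi
    if [ -f "$dir/outs/possorted_genome_bam.bam" ]; then
        echo "bam: found"
    else
        echo "bam: missing"; status=1
    fi
    # Older layout: filtered_gene_bc_matrices/<genome>/barcodes.tsv
    # Newer layout: filtered_feature_bc_matrix/barcodes.tsv.gz
    if ls "$dir"/outs/filtered_gene_bc_matrices/*/barcodes.tsv >/dev/null 2>&1 ||
       [ -f "$dir/outs/filtered_feature_bc_matrix/barcodes.tsv.gz" ]; then
        echo "barcodes: found"
    else
        echo "barcodes: missing"; status=1
    fi
    return "$status"
}

check_cellranger_outs SRR6084_series/ ||
    echo "The cellranger count run may not have finished; check its own log output."
```

If any of the three checks fails, the problem most likely lies in the cellranger count step rather than in the velocyto command itself.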

Clearly, I don't understand the step the authors are referring to, so any help would be much appreciated.

Thanks.

j-bac commented 3 years ago

gravitogen, did you manage to make any progress on this issue? I am interested in running RNA velocity on this same dataset.