liulab-dfci / MAESTRO

Single-cell Transcriptome and Regulome Analysis Pipeline
GNU General Public License v3.0

scrna-init creates a problematic snakefile for 10x-genomics #91

Closed by amathelier 3 years ago

amathelier commented 3 years ago

I used scrna-init to create a Snakefile, following the instructions at https://github.com/liulab-dfci/MAESTRO/blob/master/example/RNA_infrastructure_10x/RNA_infrastructure_10x.md.

Unfortunately, the Snakefile does not work since I get the following error:

IndexError in line 34 of /storage/mathelierarea/processed/anthoma/Projects/single_cell/results/20201216_maestro/10X_PBMC_RNA_8k_MAESTRO_V122/Snakefile:
list index out of range
  File "/storage/mathelierarea/processed/anthoma/Projects/single_cell/results/20201216_maestro/10X_PBMC_RNA_8k_MAESTRO_V122/Snakefile", line 34, in <module>
  File "/lsc/MAESTRO/1.2.2/envs/MAESTRO/lib/python3.8/site-packages/MAESTRO/scRNA_utility.py", line 44, in getfastq_10x

The problem seems to be that the scrna-init command did not provide values for the transcript, barcode, and decompress variables, which are used in the Snakefile for the STAR command.

This should not be the case, since your documentation specifies that --fastq-barcode and --fastq-transcript should only be provided for the Dropseq format (not 10x-genomics).

You can find the Snakefile and config.yaml files in the enclosed archive: snakefile_and_config.zip

baigal628 commented 3 years ago

Hi Anthony,

I don't see any error so far based on your config file and Snakefile. Could you please run $ ls /storage/mathelierarea/processed/anthoma/Projects/single_cell/data/10x/scRNA_pbmc8k to check whether the fastq directory and prefix are correctly specified?
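
The same check can be scripted. The following is a minimal sketch, not MAESTRO code: the function name `list_fastqs` and the prefix used in the usage note are assumptions made for illustration. It lists the files in a directory that look like FASTQ files for a given prefix, so an empty result shows up before Snakemake is run.

```python
from pathlib import Path

# File endings treated as FASTQ files in this sketch.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def list_fastqs(fastq_dir, prefix=""):
    """Return the sorted names of FASTQ files in fastq_dir that start with prefix."""
    directory = Path(fastq_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{fastq_dir} is not a directory")
    return sorted(
        path.name
        for path in directory.iterdir()
        if path.is_file()
        and path.name.startswith(prefix)
        and path.name.endswith(FASTQ_SUFFIXES)
    )
```

For example, `list_fastqs("/path/to/fastq_dir", "pbmc8k")` (a hypothetical path and prefix) returns an empty list when the directory holds no matching .fastq or .fastq.gz files, which suggests that the directory or prefix passed to scrna-init needs to be corrected.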

Also, I noticed that you are using the wrong barcode list: the one in your config is for scATAC-seq data. Could you please use 737K-august-2016.txt instead?

Thanks, Gali

amathelier commented 3 years ago

$ ls /storage/mathelierarea/processed/anthoma/Projects/single_cell/data/10x/scRNA_pbmc8k
pbmc8k_fastqs.tar

IMO, the problem is that the Snakefile requests variables that are not set in the config.yaml file.

baigal628 commented 3 years ago

The listing shows that you only have a tar archive in that directory. However, Snakemake needs direct access to the fastq files. Could you run $ tar xvf pbmc8k_fastqs.tar to extract the files, and point --fastq-dir to the directory where the fastq (.fastq / .fastq.gz) files are located?
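
If you prefer to do the extraction from Python, here is a minimal sketch using only the standard library. It is not part of MAESTRO, and the function name `extract_fastq_archive` is an assumption made for illustration. It unpacks the archive and reports where the FASTQ files ended up, since an archive may unpack into subdirectories.

```python
import tarfile
from pathlib import Path


def extract_fastq_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the FASTQ files found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Only extract archives you trust: extractall writes every member to disk.
    with tarfile.open(archive_path) as archive:
        archive.extractall(dest)
    # Search recursively, because the FASTQ files may sit in a subdirectory.
    return sorted(
        path
        for path in dest.rglob("*")
        if path.is_file() and path.name.endswith((".fastq", ".fastq.gz"))
    )
```

The parent directory of the returned paths is the candidate value for --fastq-dir. An empty result means the archive did not contain any .fastq or .fastq.gz files.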

Thanks.

amathelier commented 3 years ago

That indeed fixes the issue. I am sorry for the oversight on my side!