sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Unable to run zUMIs - SmartSeq3xpress #338

Closed KoenDeserranno closed 1 year ago

KoenDeserranno commented 1 year ago

Describe the bug

Dear,

We wanted to use zUMIs to analyze our first Smart-seq3xpress dataset of 20 cells. Illumina's BaseSpace indicated that 15 of the 20 well barcodes were found. When we ran zUMIs, it printed a warning that the STAR version used for mapping differs from the version used to create the index, followed by the message "Vim: Warning: Output is not to a terminal". The script halts there, and nothing further happens. Our STAR index was created using STAR 2.7.3a, even though the terminal output states STAR 2.7.1a.

To Reproduce

We cloned zUMIs from GitHub into the project folder (Linux server). The raw .bcl files were basecalled using bcl2fastq, without providing sample sheet information. The .yaml file was adapted and is attached below.

Terminal output

$ ./zUMIs/zUMIs.sh -c -y /data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs/20221121_Smartseq3xpress.yaml
Using miniconda environment for zUMIs!
note: internal executables will be used instead of those specified in the YAML file!
You provided these parameters:
YAML file: /data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs/20221121_Smartseq3xpress.yaml
zUMIs directory: /data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 50
zUMIs version 2.9.7c
Tue Nov 22 17:23:17 CET 2022
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Tue Nov 22 17:23:47 CET 2022
[1] "36135 reads were assigned to barcodes that do not correspond to intact cells."
[1] "Found 78 daughter barcodes that can be binned into 10 parent barcodes."
[1] "Binned barcodes correspond to 21493 reads."
Mapping...
[1] "2022-11-22 17:23:51 CET"
Vim: Warning: Output is not to a terminal

Further information

I also tried using the demultiplexed .fastq files that were generated by the sequencer, recombined as described in the zUMIs wiki. After adapting the .yaml file accordingly, the issue persisted:

Terminal output

$ ./zUMIs/zUMIs.sh -c -y /data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs/20221121_Smartseq3xpress.yaml
Using miniconda environment for zUMIs!
note: internal executables will be used instead of those specified in the YAML file!
Traceback (most recent call last):
  File "/data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs/zUMIs-env/bin/conda-unpack", line 1170, in placeholder, mode=mode)
  File "/data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs/zUMIs-env/bin/conda-unpack", line 66, in update_prefix
    with open(path, 'rb+') as fh:
OSError: [Errno 26] Text file busy: '/data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs/zUMIs-env/bin/pigz'
You provided these parameters:
YAML file: /data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs/20221121_Smartseq3xpress.yaml
zUMIs directory: /data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 50
zUMIs version 2.9.7c
Tue Nov 22 13:52:13 CET 2022
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Tue Nov 22 13:52:42 CET 2022
[1] " reads were assigned to barcodes that do not correspond to intact cells."
Error in setnames(x, value) :
  Can't assign 0 names to a 1 column data.table
Calls: BCbin ... colnames<- -> names<- -> names<-.data.table -> setnames
Execution halted
Mapping...
[1] "2022-11-22 13:52:44 CET"
Vim: Warning: Output is not to a terminal
Nov 22 14:00:25 ..... started mapping
Nov 22 14:00:27 ..... finished mapping
Nov 22 14:00:32 ..... finished successfully
/data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs/splitfq.sh: line 22: /usr/bin/ls: Argument list too long

Further information

The script halts there. Any advice is very much appreciated!

Kind regards,
Koen

Additional context 20221121_Smartseq3xpress.txt


cziegenhain commented 1 year ago

Hi Koen,

The warning about the STAR version being slightly different is normal: STAR doesn't always write the fully up-to-date version into the genome index file. That's why we only print a warning in zUMIs instead of throwing an error.

Running bcl2fastq without a sample sheet is definitely the correct way to go here, and the YAML file looks good to me. I have never seen the "Vim: Warning: Output is not to a terminal" message. Is that just from opening the log here, or was it printed on your shell? Do you run this in some sort of workload manager like Slurm? Are you sure the zUMIs run stopped at the start of the mapping stage (e.g. no more processes visible in htop)?

All the best Christoph
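For readers who want to see the two version strings from the warning side by side, here is a minimal sketch. It assumes the STAR index directory contains a `genomeParameters.txt` file with a `versionGenome` line and that `STAR --version` prints the executable's version; the function names are hypothetical and not part of zUMIs.

```python
import subprocess
from pathlib import Path


def index_star_version(index_dir):
    """Return the versionGenome value recorded in a STAR index, or None.

    Assumption: the index directory holds genomeParameters.txt with a
    whitespace-separated 'versionGenome <version>' line.
    """
    params = Path(index_dir) / "genomeParameters.txt"
    for line in params.read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "versionGenome":
            return fields[1]
    return None


def installed_star_version(star_exe="STAR"):
    """Return whatever `STAR --version` prints, stripped of whitespace."""
    result = subprocess.run(
        [star_exe, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def describe_versions(index_dir, star_exe="STAR"):
    """Build a one-line summary; a mismatch is reported, not treated as fatal."""
    idx = index_star_version(index_dir)
    exe = installed_star_version(star_exe)
    status = "match" if idx == exe else "differ (a warning, not necessarily an error)"
    return f"index: {idx}, executable: {exe}, versions {status}"
```

Calling `describe_versions("/path/to/STAR_index")` with your own index path prints nothing by itself; pass the result to `print` to inspect it. As noted above, a difference between the two values is expected in some cases and does not by itself explain a halted run.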

KoenDeserranno commented 1 year ago

Hi Christoph,

Thanks for the quick reply! The "Vim: Warning: Output is not to a terminal" message was printed on the shell. We don't use a workload manager. After the Vim warning was printed, only 'Vim' processes were still visible in htop.

In the meantime, I managed to work around the issue with a small detour. I added "zUMIs_directory: /data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs" as the last line of the .yaml file and ran zUMIs without the -c option in a conda virtual environment I created myself (STAR 2.7.3a, R 4.2.0, samtools 1.7, pigz 2.3.4). In this environment, the filtering and mapping stages were successful, without the Vim error. However, some dependencies needed for the counting stage were missing in my own environment, so I adapted the .yaml file to start from the counting stage. I then ran zUMIs with the -c option, so that it could use the dependencies in the zUMIs environment. The counting and summarising steps of the pipeline (based on the mapping data I generated in my own virtual environment) were successful.
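The "start from the counting stage" step of this workaround could be scripted roughly as follows. This is only a sketch: it assumes the YAML selects the starting stage through a top-level key called `which_Stage`, with the stage names Filtering, Mapping, Counting and Summarising. Neither the key name nor the helper `set_start_stage` comes from this thread, so check them against your own YAML file.

```python
# Assumed stage names; adjust if your zUMIs YAML template lists others.
STAGES = ("Filtering", "Mapping", "Counting", "Summarising")


def set_start_stage(yaml_text, stage, key="which_Stage"):
    """Return yaml_text with the starting stage set to `stage`.

    The text is edited line by line (no YAML parser), an inline '#' comment
    on the edited line is preserved, and the result always ends with a newline.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    lines = yaml_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(key + ":"):
            _, _, rest = line.partition(":")
            _, hash_mark, comment = rest.partition("#")
            lines[i] = f"{key}: {stage}"
            if hash_mark:
                lines[i] += " #" + comment
            replaced = True
            break
    if not replaced:
        # Key not present: append it as a new last line.
        lines.append(f"{key}: {stage}")
    return "\n".join(lines) + "\n"
```

A typical use would be to read the YAML file as text, call `set_start_stage(text, "Counting")`, write the result to a copy of the file, and then pass that copy to `zUMIs.sh -c -y`. Working on a copy keeps the YAML used for the filtering and mapping run unchanged.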

cziegenhain commented 1 year ago

Very odd. Vim is not used anywhere in zUMIs. I recommend cloning a fresh copy from GitHub!

cziegenhain commented 1 year ago

Hi,

Just heard from another user who encountered the Vim error. It seems to be related to a missing end-of-line character at the end of the YAML file! So that's an easy fix.

All the best, Christoph
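A missing end-of-line character is hard to spot in most editors, so a small check can help. The sketch below tests whether a file ends with a newline byte and appends one if it does not; the function names are hypothetical and this is not part of zUMIs.

```python
from pathlib import Path


def ends_with_newline(path):
    """Return True if the file's last byte is a newline character."""
    return Path(path).read_bytes().endswith(b"\n")


def ensure_trailing_newline(path):
    """Append a newline if the file is non-empty and lacks one.

    Returns True if the file was modified, False otherwise.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data or data.endswith(b"\n"):
        return False
    with p.open("ab") as fh:
        fh.write(b"\n")
    return True
```

For example, `ensure_trailing_newline("/data/projects/20221118_SmartSeq3Xpress_Jurkat/zUMIs/20221121_Smartseq3xpress.yaml")` would repair the YAML file from this issue in place if the final newline were missing, and leave it untouched otherwise.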

KoenDeserranno commented 1 year ago

Thanks for the information, Christoph!