Error in R during counting step

I'm trying to run zUMIs to demultiplex my data. Everything runs fine until the "Counting" step, where it fails while trying to split the files after running Rsubread:

```
Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'strsplit': object 'GE' not found
```

The "Filtering" step ran without any issues, so below are the YAML and the output from re-running zUMIs starting at the "Mapping" stage.

Thanks for any advice or suggestions!

YAML:

```yaml
###########################################
# Welcome to zUMIs
# below, please fill the mandatory inputs
# We expect full paths for all files.
###########################################

# define a project name that will be used to name output files
project: Sample_23L001690

# Sequencing File Inputs:
sequence_files:
  file1:
    name: /home/morales/julia/Project_633_Eckstein_Werth_Silke/Sample_23L001690/23L001690_S1_L001_R1_001.fastq.gz
    base_definition:
      - BC(1-12)
      - UMI(13-28)
  file2:
    name: /home/morales/julia/Project_633_Eckstein_Werth_Silke/Sample_23L001690/23L001690_S1_L001_R2_001.fastq.gz
    base_definition:
      - cDNA(1-94)

# reference genome setup
reference:
  STAR_index: /home/morales/julia/Project_633_Eckstein_Werth_Silke/Lobpul1/star_index_exons_only
  GTF_file: /home/morales/julia/Project_633_Eckstein_Werth_Silke/Lobpul1/Lobpul1_GeneCatalog_20170213.only_exons.gft
  exon_extension: no
  extension_length: 0
  scaffold_length_min: 0
  additional_files: ~
  additional_STAR_params: ~

# output directory
out_dir: /home/morales/julia/Project_633_Eckstein_Werth_Silke/Sample_23L001690_zumi

###########################################
# below, you may optionally change default parameters
###########################################

# number of processors to use
num_threads: 100
mem_limit: 500

# barcode & UMI filtering options
# number of bases under the base quality cutoff that should be filtered out.
# Phred score base-cutoff for quality control.
filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20

# Options for Barcode handling
barcodes:
  barcode_num: 10
  barcode_file: /home/morales/julia/Project_633_Eckstein_Werth_Silke/bc_batch1_zumi_10.txt
  barcode_sharing: null
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: yes

# Options related to counting of reads towards expression profiles
counting_opts:
  introns: yes
  intronProb: no
  downsampling: 0
  strand: 0
  Ham_Dist: 0
  velocyto: no
  primaryHit: yes
  multi_overlap: no
  fraction_overlap: 0
  twoPass: yes

write_ham: yes

# produce stats files and plots?
make_stats: yes

# Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting, Summarising). Default: Filtering.
which_Stage: Mapping

# define dependencies program paths
samtools_exec: samtools #samtools executable
Rscript_exec: Rscript #Rscript executable
STAR_exec: STAR #STAR executable
pigz_exec: pigz #pigz executable

# below, fqfilter will add a read_layout flag defining SE or PE
zUMIs_directory: /home/morales/Apps/zUMIs
read_layout: SE
```

Standard output:

```
You provided these parameters:
 YAML file: zUMIs.yaml
 zUMIs directory: /home/morales/Apps/zUMIs
 STAR executable STAR
 samtools executable samtools
 pigz executable pigz
 Rscript executable Rscript
 RAM limit: 500
 zUMIs version 2.9.7e

Sat 13 Apr 2024 05:26:31 PM CEST
WARNING: The STAR version used for mapping is 2.7.11b and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.11b.
Mapping...
[1] "2024-04-13 17:26:32 CEST"
Warning message:
NAs introduced by coercion
STAR --readFilesCommand samtools view -@ 2 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/morales/julia/Project_633_Eckstein_Werth_Silke/Lobpul1/star_index_exons_only --sjdbGTFfile /home/morales/julia/Project_633_Eckstein_Werth_Silke/Lobpul1/Lobpul1_GeneCatalog_20170213.only_exons.gft --runThreadN 98 --sjdbOverhang 93 --readFilesType SAM SE --twopassMode Basic --readFilesIn /home/morales/julia/Project_633_Eckstein_Werth_Silke/Sample_23L001690_zumi/Sample_23L001690.filtered.tagged.unmapped.bam --outFileNamePrefix /home/morales/julia/Project_633_Eckstein_Werth_Silke/Sample_23L001690_zumi/Sample_23L001690.filtered.tagged.
 STAR version: 2.7.11b compiled: 2024-02-23T15:55:51+01:00 :/home/morales/Apps/STAR-2.7.11b/source
Apr 13 17:26:32 ..... started STAR run
Apr 13 17:26:32 ..... loading genome
Apr 13 17:26:33 ..... processing annotations GTF
Apr 13 17:26:33 ..... started 1st pass mapping
Apr 13 17:51:07 ..... finished 1st pass mapping
Apr 13 17:51:07 ..... inserting junctions into the genome indices
Apr 13 17:51:23 ..... started mapping
Apr 13 18:44:32 ..... finished mapping
Apr 13 18:44:33 ..... finished successfully
Sat 13 Apr 2024 06:44:35 PM CEST
Counting...
[1] "2024-04-13 18:44:44 CEST"
[1] "2e+09 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/morales/julia/Project_633_Eckstein_Werth_Silke/Sample_23L001690_zumi/Sample_23L001690.final_annot.gtf"
[1] "Annotation loaded!"
Warning message:
In dplyr::left_join(intron.saf, unique(exon.saf[, c("GeneID", "Strand")]), : Detected an unexpected many-to-many relationship between x and y.
ℹ Row 1 of x matches multiple rows in y.
ℹ Row 1 of y matches multiple rows in x.
ℹ If a many-to-many relationship is expected, set relationship = "many-to-many" to silence this warning.
[1] "Assigning reads to features (ex)"

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.32.4

//========================== featureCounts setting ===========================\
Input files : 1 BAM file
S Sample_23L001690.filtered.tagged.Aligned.o ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 64
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid2382952 ...
Features : 45875
Meta-features : 1
Chromosomes/contigs : 1802
Process BAM file Sample_23L001690.filtered.tagged.Aligned.out.bam...
Single-end reads are included.
Assign alignments to features...
Total alignments : 420523607
Successfully assigned alignments : 109655322 (26.1%)
Running time : 1.14 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.32.4

//========================== featureCounts setting ===========================\
Input files : 1 BAM file
S Sample_23L001690.filtered.tagged.Aligned.o ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 64
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid2382952 ...
Features : 86534
Meta-features : 1
Chromosomes/contigs : 1740
Process BAM file Sample_23L001690.filtered.tagged.Aligned.out.bam.ex.f ...
Single-end reads are included.
Assign alignments to features...
Total alignments : 420523607
Successfully assigned alignments : 119436529 (28.4%)
Running time : 1.20 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "2024-04-13 18:50:16 CEST"
[1] "Coordinate sorting final bam file..."
[bam_sort_core] merging from 0 files and 100 in-memory blocks...
[1] "2024-04-13 18:55:20 CEST"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 10 barcodes in this chunk..."
Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'strsplit': object 'GE' not found
Calls: convert2countM ... .makewide -> unlist -> strsplit -> .handleSimpleError -> h
Execution halted
Sat 13 Apr 2024 06:56:39 PM CEST
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") : cannot open compressed file '/home/morales/julia/Project_633_Eckstein_Werth_Silke/Sample_23L001690_zumi/zUMIs_output/expression/Sample_23L001690.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Sat 13 Apr 2024 06:56:41 PM CEST
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2024-04-13 18:56:41 CEST"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") : cannot open compressed file '/home/morales/julia/Project_633_Eckstein_Werth_Silke/Sample_23L001690_zumi/zUMIs_output/expression/Sample_23L001690.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Sat 13 Apr 2024 06:56:45 PM CEST
```
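The mapping stage also warns that the STAR index was built with STAR 2.7.4a, while mapping ran with 2.7.11b. Mapping finished successfully, so this is probably unrelated to the counting error. Still, the version recorded in the index can be checked directly with the stdlib-only Python sketch below.

The sketch assumes the index directory contains STAR's `genomeParameters.txt` with a whitespace-separated `versionGenome` line. The script name and the helpers `parse_version_genome` and `star_index_version` are hypothetical, not part of zUMIs or STAR.

```python
"""Print the STAR version recorded in a STAR genome index (diagnostic sketch).

Assumption: the index directory contains STAR's genomeParameters.txt with a
whitespace-separated line such as "versionGenome<TAB>2.7.4a".
"""
import sys
from pathlib import Path
from typing import Iterable, Optional


def parse_version_genome(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the first `versionGenome` line, or None if absent."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "versionGenome":
            return fields[1]
    return None


def star_index_version(index_dir: str) -> Optional[str]:
    """Read genomeParameters.txt from a STAR index directory (hypothetical helper)."""
    params_file = Path(index_dir) / "genomeParameters.txt"
    with params_file.open() as handle:
        return parse_version_genome(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python star_index_version.py /path/to/star_index_exons_only
    print(star_index_version(sys.argv[1]))
```

If the printed version differs from the STAR binary in use, rebuilding the index with the same STAR version removes the warning. Whether that affects counting here is not established.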

Dependencies:

- zUMIs version 2.9.7e
- Ubuntu 20.04.6 LTS
- samtools 1.10
- R version 4.3.3
- pigz 2.4
- STAR version=2.7.11b

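Both featureCounts runs in the log report `Meta-features : 1`. In featureCounts, meta-features are the distinct gene IDs of the annotation, so all 45,875 exon features and all 86,534 intron features were grouped under a single gene ID. This may be related to the `GE` error, but that is not confirmed.

Assuming zUMIs takes its gene IDs from the GTF's `gene_id` attribute, a quick check is whether the GTF attribute column actually carries `gene_id` values. The stdlib-only Python sketch below does that. It assumes a standard tab-separated, 9-column GTF with identifiers written as `gene_id "..."`. The script name `check_gtf.py` and the function `summarise_gtf` are hypothetical.

```python
"""Summarise gene_id attributes in a GTF file (diagnostic sketch).

Assumption: a standard 9-column, tab-separated GTF whose attribute column
(column 9) stores gene identifiers as `gene_id "..."`.
"""
import re
import sys
from collections import Counter
from typing import Iterable, Tuple

# Matches a gene_id attribute at the start of column 9 or right after a ';'.
GENE_ID_RE = re.compile(r'(?:^|;)\s*gene_id\s+"?([^";]+)"?')


def summarise_gtf(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (feature lines, lines without gene_id, Counter of gene_id values)."""
    n_features = 0
    n_missing = 0
    gene_ids: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF feature line
        n_features += 1
        match = GENE_ID_RE.search(fields[8])
        if match:
            gene_ids[match.group(1).strip()] += 1
        else:
            n_missing += 1
    return n_features, n_missing, gene_ids


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_gtf.py /path/to/annotation.gtf
    with open(sys.argv[1]) as handle:
        n_features, n_missing, gene_ids = summarise_gtf(handle)
    print(f"feature lines:         {n_features}")
    print(f"lines without gene_id: {n_missing}")
    print(f"distinct gene_id:      {len(gene_ids)}")
```

It can be run on the input GTF and on the `Sample_23L001690.final_annot.gtf` that zUMIs writes to the output directory. If many lines lack `gene_id`, or there is only one distinct value, the annotation would be the first thing to look at.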
GitHub: sdparekh/zUMIs, issue #393 "R error during counting step before demultiplexing."
