sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Invalid size argument during downsampling #326

Closed: ScottNortonPhD closed this issue 1 year ago

ScottNortonPhD commented 1 year ago

Getting this error when running zUMIs 2.9.7c on a 1,000,000 read test dataset.

[1] "2022-08-30 11:08:40 EDT"
[1] "Coordinate sorting intermediate bam file..."
[bam_sort_core] merging from 0 files and 4 in-memory blocks...
[1] "2022-08-30 11:08:42 EDT"
[1] "Hamming distance collapse in barcode chunk 1 out of 1"
[1] "Splitting data for multicore hamming distance collapse..."
[1] "Setting up multicore cluster & generating molecule mapping tables ..."
[1] "Finished multi-threaded hamming distances"
[1] "Correcting UMI barcode tags..."
Loading molecule correction dictionary...
Correcting UB tags...
[1] "7.2e+07 Reads per chunk"
[1] "2022-08-30 11:11:16 EDT"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 47 barcodes in this chunk..."
Error: invalid 'size' argument
Execution halted
Tue Aug 30 11:11:31 EDT 2022
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '$outdir/zUMIs_output/expression/$name.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Aug 30 11:11:36 EDT 2022
Tue Aug 30 11:11:36 EDT 2022

zUMIs.yaml:

project: name
sequence_files:
  file1:
    name: path/to/R1.fastq.gz
    base_definition:
      - BC(1-16)
      - UMI(17-28)
  file2:
    name: path/to/R2.fastq.gz
    base_definition:
      - cDNA(1-90)
reference:
  STAR_index: path/to/hg38/STAR_2.7.9a_89/
  GTF_file: path/to/hg38/gencode.v41.annotation.gtf
out_dir: path/to/outdir
num_threads: 4
mem_limit: 16
filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20
barcodes:
  barcode_file: path/to/barcodes.txt
  barcode_num: 96
  automatic: false
  BarcodeBinning: 2
  demultiplex: true
  nReadsperCell: true
counting_opts:
  Ham_Dist: 2
  introns: false
  downsampling: 0
  strand: 1
  write_ham: true
  velocyto: false
  primaryHit: true
  twoPass: true
make_stats: true
which_Stage: Counting
samtools_exec: samtools
Rscript_exec: Rscript
STAR_exec: STAR
pigz_exec: pigz

Using a Singularity container.

$ R --version
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.
$ samtools --version
samtools 1.15.1
Using htslib 1.15.1
Copyright (C) 2022 Genome Research Ltd.

Samtools compilation details:
    Features:       build=configure curses=yes
    CC:             gcc -march=x86-64 -mtune=generic
    CPPFLAGS:
    CFLAGS:         -g -O2
    LDFLAGS:
    HTSDIR:         htslib-1.15.1
    LIBS:
    CURSES_LIB:     -lncursesw

HTSlib compilation details:
    Features:       build=configure plugins=no libcurl=yes S3=yes GCS=yes libdeflate=yes lzma=yes bzip2=yes htscodecs=1.2.2
    CC:             gcc -march=x86-64 -mtune=generic
    CPPFLAGS:
    CFLAGS:         -g -O2 -fvisibility=hidden
    LDFLAGS:        -fvisibility=hidden

HTSlib URL scheme handlers present:
    built-in:    preload, data, file
    S3 Multipart Upload:         s3w, s3w+https, s3w+http
    Amazon S3:   s3+https, s3+http, s3
    Google Cloud Storage:        gs+http, gs+https, gs
    libcurl:     imaps, pop3, gophers, http, smb, gopher, sftp, ftps, imap, smtp, smtps, rtsp, scp, ftp, telnet, mqtt, rtmp, ldap, https, ldaps, smbs, tftp, pop3s, dict
    crypt4gh-needed:     crypt4gh
    mem:         mem
$ STAR --version
2.7.10a_alpha_220601
$ pigz --version
pigz 2.6
$ R --vanilla

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS/LAPACK: /usr/local/lib/libopenblas_nehalemp-r0.3.19.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.2.1

> installed.packages()
<<< Attachment 1 >>>

Attachments

  1. installed.packages.txt

cziegenhain commented 1 year ago

Hi,

I could imagine that a dependency changed in one of the newer R 4.x releases. Could you try your run with the built-in conda environment from zUMIs (zUMIs.sh -c -y your.yaml)?

One other comment on your YAML file: you have both the cell barcode list (barcode_file) and the number of top barcodes (barcode_num) given. These are meant to be mutually exclusive, so I am not sure what happens in that case.
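
For what it's worth, a quick pre-run check along these lines could flag that situation. This is only a rough sketch in Python, not part of zUMIs: the function name `barcode_selection_conflict` is made up for illustration, and the `barcodes` section is written out by hand as a dict (copied from the YAML above) rather than loaded with a YAML parser.

```python
def barcode_selection_conflict(barcodes: dict) -> bool:
    """Return True when both a barcode list and a top-N barcode count are set.

    Hypothetical helper for illustration only; zUMIs does not ship this check.
    """
    has_file = barcodes.get("barcode_file") not in (None, "")
    has_num = barcodes.get("barcode_num") not in (None, "")
    return has_file and has_num


# The `barcodes` section from the posted zUMIs.yaml, transcribed as a dict.
barcodes = {
    "barcode_file": "path/to/barcodes.txt",
    "barcode_num": 96,
    "automatic": False,
    "BarcodeBinning": 2,
    "demultiplex": True,
    "nReadsperCell": True,
}

if barcode_selection_conflict(barcodes):
    print("barcode_file and barcode_num are both set; keep only one of them.")
```

With the values from this issue the check reports a conflict; leaving either `barcode_file` or `barcode_num` empty makes it pass.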

It would also help to see a full verbose log (ideally starting from the Filtering stage) so I can screen for any upstream issues.

Best, Christoph