sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

[E::hts_open_format] Failed to open file "NA" : No such file or directory #356

officialprofile closed this issue 1 year ago

officialprofile commented 1 year ago

Good morning,

I have encountered a problem that I don't know how to solve, i.e.

[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory

Please have a look at the fragment of my logs:

Filtering...
Wed Apr 19 14:46:52 UTC 2023
[1] "945 barcodes detected."
[1] "440557 reads were assigned to barcodes that do not correspond to intact cells."
Warning message:
Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead. 
[1] "Found 4727 daughter barcodes that can be binned into 796 parent barcodes."
[1] "Binned barcodes correspond to 152249 reads."
Mapping...
[1] "2023-04-19 14:47:04 UTC"
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
Apr 19 14:47:14 ..... started STAR run
Apr 19 14:47:14 ..... loading genome
Apr 19 14:47:14 ..... started STAR run
Apr 19 14:47:14 ..... loading genome
Apr 19 14:47:14 ..... started STAR run
Apr 19 14:47:14 ..... loading genome

...

Apr 19 14:51:15 ..... started 1st pass mapping
Apr 19 14:51:15 ..... started 1st pass mapping
Apr 19 14:51:15 ..... finished 1st pass mapping
Apr 19 14:51:16 ..... inserting junctions into the genome indices
Apr 19 14:51:44 ..... finished 1st pass mapping
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
Apr 19 14:51:46 ..... inserting junctions into the genome indices
Apr 19 14:51:47 ..... started mapping

then also another error appears:

[1] "Processing 945 barcodes in this chunk..."
Warning message:
In parallel::mclapply(mapList, function(tt) { :
  all scheduled cores encountered errors in user code
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'strsplit': object 'GE' not found
Calls: convert2countM ... .makewide -> unlist -> strsplit -> .handleSimpleError -> h
Execution halted

My yaml looks like this:

project: cet
sequence_files:
  file1:
    name: /home/jupyter/cet/cet_S0_L001_R1_001.fastq.gz
    base_definition: 
    - cDNA(1-30)
  file2:
    name: /home/jupyter/cet/cet_S0_L001_I1_001.fastq.gz
    base_definition:
    - BC(16-23,54-61,92-99)
  file3:
    name: /home/jupyter/cet/cet_S0_L001_R2_001.fastq.gz
    base_definition:
    - UMI(1-10)
reference:
  STAR_index: /home/jupyter/GRCh38_and_mm10/star
  GTF_file: /home/jupyter/GRCh38_and_mm10/genes/genes.gtf
  additional_STAR_params: '--limitOutSJcollapsed 5000000'
  additional_files: 
out_dir: /home/jupyter/zumisresults
num_threads: 16
mem_limit: 80
filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 10
  UMI_filter:
    num_bases: 2
    phred: 10
barcodes:
  barcode_num: null
  barcode_file: null
  automatic: yes
  BarcodeBinning: 0
  nReadsperCell: 10
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 1
  Ham_Dist: 0
  velocyto: no
  primaryHit: no
  twoPass: yes
make_stats: yes
which_Stage: Filtering
Rscript_exec: Rscript
STAR_exec: /home/jupyter/STAR-2.7.1a/source/STAR
pigz_exec: pigz
samtools_exec: /home/jupyter/samtools-1.17/samtools
zUMIs_directory: /home/jupyter/zUMIs
read_layout: SE

Do you know what could be causing the error?

cziegenhain commented 1 year ago

Hi,

I am not 100% sure on this. The message Failed to open file "NA" can be related to the estimation of reads to put in a processing chunk within zUMIs and does not necessarily mean anything will break. But of course the downstream error during generation of the count table you have encountered is fatal. Could you send me the full log instead of just selected parts? I would be curious to see the featureCounts reports in particular.
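
When reading a long log like the one above, it can help to separate the samtools "NA" messages from the errors that actually stop the run. The following is a minimal sketch, not part of zUMIs: the `triage` helper is hypothetical, and treating the "NA" message as non-fatal is an assumption based on the comment above.

```python
import re

# Patterns taken from the log excerpts in this thread. Treating the samtools
# "NA" message as non-fatal follows the maintainer's comment; it is an
# assumption of this sketch, not something zUMIs itself reports.
NONFATAL = re.compile(r'failed to open (file )?"NA"', re.IGNORECASE)
FATAL = re.compile(r'^(Error\b|Execution halted)')

def triage(log_text):
    """Split log lines into (non-fatal, fatal) lists."""
    nonfatal, fatal = [], []
    for line in log_text.splitlines():
        stripped = line.strip()
        if NONFATAL.search(stripped):
            nonfatal.append(stripped)
        elif FATAL.match(stripped):
            fatal.append(stripped)
    return nonfatal, fatal

sample = '''[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
Apr 19 14:47:14 ..... started STAR run
Error in h(simpleError(msg, call)) :
Execution halted'''

nonfatal, fatal = triage(sample)
print(len(nonfatal), "non-fatal;", len(fatal), "fatal")
```

Applied to a full log file, the fatal list points directly at the R errors that need attention.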

Best, Christoph

officialprofile commented 1 year ago

Thank you @cziegenhain for your reply. Here are the full logs:

/home/jupyter/zUMIs/zUMIs.sh -y /home/jupyter/zumis.yaml

 You provided these parameters:
 YAML file: /home/jupyter/zumis.yaml
 zUMIs directory:       /home/jupyter/zUMIs
 STAR executable        /home/jupyter/STAR-2.7.1a/source/STAR
 samtools executable        /home/jupyter/samtools-1.17/samtools
 pigz executable        pigz
 Rscript executable     Rscript
 RAM limit:   80
 zUMIs version 2.9.7e 

Fri Apr 21 07:48:20 UTC 2023
Filtering...
Fri Apr 21 07:49:18 UTC 2023
[1] "945 barcodes detected."
[1] "440557 reads were assigned to barcodes that do not correspond to intact cells."
Warning message:
Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
 Please use `linewidth` instead. 
Mapping...
[1] "2023-04-21 07:49:25 UTC"
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
Apr 21 07:49:45 ..... started STAR run
Apr 21 07:49:46 ..... loading genome
Apr 21 07:49:45 ..... started STAR run
Apr 21 07:49:46 ..... loading genome
Apr 21 07:49:45 ..... started STAR run
Apr 21 07:49:46 ..... loading genome
Apr 21 07:49:45 ..... started STAR run
Apr 21 07:49:46 ..... loading genome
Apr 21 07:49:45 ..... started STAR run
Apr 21 07:49:46 ..... loading genome
Apr 21 07:49:45 ..... started STAR run
Apr 21 07:49:46 ..... loading genome
Apr 21 07:51:18 ..... processing annotations GTF
Apr 21 07:51:18 ..... processing annotations GTF
Apr 21 07:51:18 ..... processing annotations GTF
Apr 21 07:51:18 ..... processing annotations GTF
Apr 21 07:51:18 ..... processing annotations GTF
Apr 21 07:51:18 ..... processing annotations GTF
Apr 21 07:51:43 ..... inserting junctions into the genome indices
Apr 21 07:51:43 ..... inserting junctions into the genome indices
Apr 21 07:51:43 ..... inserting junctions into the genome indices
Apr 21 07:51:43 ..... inserting junctions into the genome indices
Apr 21 07:51:43 ..... inserting junctions into the genome indices
Apr 21 07:51:43 ..... inserting junctions into the genome indices
Apr 21 07:52:11 ..... started 1st pass mapping
Apr 21 07:52:12 ..... started 1st pass mapping
Apr 21 07:52:12 ..... started 1st pass mapping
Apr 21 07:52:12 ..... started 1st pass mapping
Apr 21 07:52:12 ..... started 1st pass mapping
Apr 21 07:52:12 ..... started 1st pass mapping
Apr 21 07:52:12 ..... finished 1st pass mapping
Apr 21 07:52:13 ..... inserting junctions into the genome indices
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory
Apr 21 07:52:41 ..... started mapping
Apr 21 07:52:42 ..... finished mapping
Apr 21 07:52:42 ..... finished successfully
Apr 21 07:53:26 ..... finished 1st pass mapping
Apr 21 07:53:27 ..... inserting junctions into the genome indices
Apr 21 07:53:58 ..... started mapping
Apr 21 07:54:21 ..... finished 1st pass mapping
Apr 21 07:54:22 ..... inserting junctions into the genome indices
Apr 21 07:54:24 ..... finished 1st pass mapping
Apr 21 07:54:26 ..... inserting junctions into the genome indices
Apr 21 07:54:28 ..... finished 1st pass mapping
Apr 21 07:54:29 ..... inserting junctions into the genome indices
Apr 21 07:54:37 ..... finished 1st pass mapping
Apr 21 07:54:39 ..... inserting junctions into the genome indices
Apr 21 07:54:55 ..... started mapping
Apr 21 07:54:58 ..... started mapping
Apr 21 07:55:01 ..... started mapping
Apr 21 07:55:10 ..... started mapping
Apr 21 07:55:13 ..... finished mapping
Apr 21 07:55:13 ..... finished successfully
Apr 21 07:57:10 ..... finished mapping
Apr 21 07:57:10 ..... finished successfully
Apr 21 07:57:17 ..... finished mapping
Apr 21 07:57:17 ..... finished successfully
Apr 21 07:57:23 ..... finished mapping
Apr 21 07:57:23 ..... finished successfully
Apr 21 07:57:40 ..... finished mapping
Apr 21 07:57:40 ..... finished successfully
Fri Apr 21 07:57:43 UTC 2023
Counting...
[1] "2023-04-21 07:57:57 UTC"
[1] "3.6e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/jupyter/zumisresults/cet.final_annot.gtf"
[1] "Annotation loaded!"
[1] "Assigning reads to features (ex)"

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.32.4

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S cet.filtered.tagged.Aligned.out.bam          ||
||                                                                            ||
||              Annotation : R data.frame                                     ||
||      Assignment details : <input_file>.featureCounts.bam                   ||
||                      (Note that files are saved to the output directory)   ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 16                                               ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid11570 ...         ||
||    Features : 559629                                                       ||
||    Meta-features : 68886                                                   ||
||    Chromosomes/contigs : 81                                                ||
||                                                                            ||
|| Process BAM file cet.filtered.tagged.Aligned.out.bam...                  ||
||    Single-end reads are included.                                          ||
||    Strand specific : stranded                                              ||
||    Assign alignments to features...                                        ||
||    Total alignments : 3369627                                              ||
||    Successfully assigned alignments : 783917 (23.3%)                       ||
||    Running time : 0.04 minutes                                             ||
||                                                                            ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.32.4

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S cet.filtered.tagged.Aligned.out.bam.ex.f ... ||
||                                                                            ||
||              Annotation : R data.frame                                     ||
||      Assignment details : <input_file>.featureCounts.bam                   ||
||                      (Note that files are saved to the output directory)   ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 16                                               ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid11570 ...         ||
||    Features : 438099                                                       ||
||    Meta-features : 52601                                                   ||
||    Chromosomes/contigs : 72                                                ||
||                                                                            ||
|| Process BAM file cet.filtered.tagged.Aligned.out.bam.ex.featureCount ... ||
||    Single-end reads are included.                                          ||
||    Strand specific : stranded                                              ||
||    Assign alignments to features...                                        ||
||    Total alignments : 3369627                                              ||
||    Successfully assigned alignments : 629432 (18.7%)                       ||
||    Running time : 0.04 minutes                                             ||
||                                                                            ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

[1] "2023-04-21 08:00:24 UTC"
[1] "Coordinate sorting final bam file..."
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[1] "2023-04-21 08:00:29 UTC"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 945 barcodes in this chunk..."
Warning message:
In parallel::mclapply(mapList, function(tt) { :
  all scheduled cores encountered errors in user code
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'strsplit': object 'GE' not found
Calls: convert2countM ... .makewide -> unlist -> strsplit -> .handleSimpleError -> h
Execution halted
Fri Apr 21 08:00:36 UTC 2023
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/jupyter/zumisresults/zUMIs_output/expression/cet.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Fri Apr 21 08:00:40 UTC 2023
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2023-04-21 08:00:40 UTC"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/jupyter/zumisresults/zUMIs_output/expression/cet.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Fri Apr 21 08:00:47 UTC 2023

cziegenhain commented 1 year ago

Hi,

I don't know the exact source of your error, so we need to check a few things to troubleshoot here.

It looks like there are only ~3 million reads. Is that expected?

Did you check whether the automatic barcode selection was sensible? There is a plot in zUMIs_output/stats.

The error occurs when loading reads and collecting UMI information per BC/gene. This could happen for several reasons, but before we dive into complicated checks, please post the BC selection plot: if only a few barcodes that do not carry counts were selected, this error could occur.

Furthermore, which R version is used? Was there a reason not to use the built-in conda environment from zUMIs?

officialprofile commented 1 year ago

~3 million reads were expected in this case. I am using R 4.2.3 and therefore concluded that the conda environment may not be suitable (when I try to use it I get errors like the ones below).

Warning message:
package ‘yaml’ was built under R version 4.2.3 
Sat Apr 22 16:53:37 UTC 2023
Error: package or namespace load failed for ‘data.table’:
 .onLoad failed in loadNamespace() for 'data.table', details:
  call: fun(libname, pkgname)
  error: This is R 3.6.3 but data.table has been installed using R 4.2.3. The major version must match. Please reinstall data.table.
In addition: Warning message:
package ‘data.table’ was built under R version 4.2.3 
Execution halted
Mapping...

Do you think that removing R 4.2.3 and installing R 3.6.3 might solve the problem?
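
The data.table message above names two R versions: the one running the pipeline and the one the package was built with. A small sketch of how such a message can be parsed to make the mismatch explicit is shown here. The `version_conflict` helper is hypothetical, and the pattern only covers the message wording quoted in this thread.

```python
import re

# Message format copied from the data.table error quoted above.
MISMATCH = re.compile(
    r"This is R (\d+)\.(\d+)\.(\d+) but (\S+) has been installed using R (\d+)\.(\d+)\.(\d+)"
)

def version_conflict(message):
    """Return (package, running, built) if the message reports a mismatch, else None."""
    m = MISMATCH.search(message)
    if m is None:
        return None
    running = tuple(int(x) for x in m.group(1, 2, 3))
    built = tuple(int(x) for x in m.group(5, 6, 7))
    return m.group(4), running, built

msg = ("error: This is R 3.6.3 but data.table has been installed using R 4.2.3. "
       "The major version must match.")
pkg, running, built = version_conflict(msg)
print(pkg, running, built, "major versions differ:", running[0] != built[0])
```

Here the R that runs is 3.6.3 while the package it loads was built under 4.2.3, so the running R appears to be picking up a package library built by a different R installation.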

officialprofile commented 1 year ago

I have updated everything I could in my environment, including the R packages, and the error miraculously disappeared. I have no idea why. Unfortunately, there is still the second issue that I don't know how to deal with, i.e.

In parallel::mclapply(mapList, function(tt) { :
  all scheduled cores encountered errors in user code
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'strsplit': object 'GE' not found
Calls: convert2countM ... .makewide -> unlist -> strsplit -> .handleSimpleError -> h

I am also enclosing the plot that I believe you asked for.

(Attached image: cet)

 You provided these parameters:
 YAML file: zumis.yaml
 zUMIs directory:       /home/jupyter/zUMIs
 STAR executable        /home/jupyter/STAR-2.7.1a/source/STAR
 samtools executable        /home/jupyter/samtools-1.17/samtools
 pigz executable        pigz
 Rscript executable     Rscript
 RAM limit:   80
 zUMIs version 2.9.7e 

Mon Apr 24 09:31:51 UTC 2023
Filtering...
Mon Apr 24 09:32:40 UTC 2023
"945 barcodes detected."
[1] "440557 reads were assigned to barcodes that do not correspond to intact cells."
Warning message:
Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
 Please use `linewidth` instead. 
Mapping...
[1] "2023-04-24 09:32:46 UTC"
Apr 24 09:32:58 ..... started STAR run
Apr 24 09:32:58 ..... loading genome
Apr 24 09:32:58 ..... started STAR run
Apr 24 09:32:58 ..... loading genome
Apr 24 09:32:58 ..... started STAR run
Apr 24 09:32:58 ..... loading genome
Apr 24 09:32:58 ..... started STAR run
Apr 24 09:32:58 ..... loading genome
Apr 24 09:32:58 ..... started STAR run
Apr 24 09:32:58 ..... loading genome
Apr 24 09:32:58 ..... started STAR run
Apr 24 09:32:58 ..... loading genome
Apr 24 09:33:03 ..... processing annotations GTF
Apr 24 09:33:03 ..... processing annotations GTF
Apr 24 09:33:03 ..... processing annotations GTF
Apr 24 09:33:03 ..... processing annotations GTF
Apr 24 09:33:03 ..... processing annotations GTF
Apr 24 09:33:03 ..... processing annotations GTF
Apr 24 09:33:22 ..... inserting junctions into the genome indices
Apr 24 09:33:22 ..... inserting junctions into the genome indices
Apr 24 09:33:22 ..... inserting junctions into the genome indices
Apr 24 09:33:23 ..... inserting junctions into the genome indices
Apr 24 09:33:23 ..... inserting junctions into the genome indices
Apr 24 09:33:23 ..... inserting junctions into the genome indices
Apr 24 09:33:44 ..... started 1st pass mapping
Apr 24 09:33:44 ..... started 1st pass mapping
Apr 24 09:33:44 ..... started 1st pass mapping
Apr 24 09:33:45 ..... started 1st pass mapping
Apr 24 09:33:46 ..... started 1st pass mapping
Apr 24 09:33:46 ..... started 1st pass mapping
Apr 24 09:34:50 ..... finished 1st pass mapping
Apr 24 09:34:51 ..... inserting junctions into the genome indices
Apr 24 09:35:12 ..... finished 1st pass mapping
Apr 24 09:35:13 ..... inserting junctions into the genome indices
Apr 24 09:35:16 ..... started mapping
Apr 24 09:35:29 ..... finished 1st pass mapping
Apr 24 09:35:30 ..... inserting junctions into the genome indices
Apr 24 09:35:31 ..... finished 1st pass mapping
Apr 24 09:35:31 ..... inserting junctions into the genome indices
Apr 24 09:35:34 ..... finished 1st pass mapping
Apr 24 09:35:35 ..... inserting junctions into the genome indices
Apr 24 09:35:38 ..... started mapping
Apr 24 09:35:39 ..... finished 1st pass mapping
Apr 24 09:35:40 ..... inserting junctions into the genome indices
Apr 24 09:35:55 ..... started mapping
Apr 24 09:35:57 ..... started mapping
Apr 24 09:36:00 ..... started mapping
Apr 24 09:36:05 ..... started mapping
Apr 24 09:36:24 ..... finished mapping
Apr 24 09:36:24 ..... finished successfully
Apr 24 09:37:09 ..... finished mapping
Apr 24 09:37:09 ..... finished successfully
Apr 24 09:37:44 ..... finished mapping
Apr 24 09:37:44 ..... finished successfully
Apr 24 09:37:46 ..... finished mapping
Apr 24 09:37:46 ..... finished successfully
Apr 24 09:37:51 ..... finished mapping
Apr 24 09:37:51 ..... finished successfully
Apr 24 09:38:05 ..... finished mapping
Apr 24 09:38:06 ..... finished successfully
Mon Apr 24 09:38:08 UTC 2023
Counting...
[1] "2023-04-24 09:38:18 UTC"
[1] "3.6e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/jupyter/zumisresults/cet.final_annot.gtf"
[1] "Annotation loaded!"
[1] "Assigning reads to features (ex)"

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.32.4

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S cet.filtered.tagged.Aligned.out.bam          ||
||                                                                            ||
||              Annotation : R data.frame                                     ||
||      Assignment details : <input_file>.featureCounts.bam                   ||
||                      (Note that files are saved to the output directory)   ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 16                                               ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid5695 ...          ||
||    Features : 559629                                                       ||
||    Meta-features : 68886                                                   ||
||    Chromosomes/contigs : 81                                                ||
||                                                                            ||
|| Process BAM file cet.filtered.tagged.Aligned.out.bam...                  ||
||    Single-end reads are included.                                          ||
||    Strand specific : stranded                                              ||
||    Assign alignments to features...                                        ||
||    Total alignments : 4028010                                              ||
||    Successfully assigned alignments : 927026 (23.0%)                       ||
||    Running time : 0.04 minutes                                             ||
||                                                                            ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.32.4

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S cet.filtered.tagged.Aligned.out.bam.ex.f ... ||
||                                                                            ||
||              Annotation : R data.frame                                     ||
||      Assignment details : <input_file>.featureCounts.bam                   ||
||                      (Note that files are saved to the output directory)   ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 16                                               ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid5695 ...          ||
||    Features : 438099                                                       ||
||    Meta-features : 52601                                                   ||
||    Chromosomes/contigs : 72                                                ||
||                                                                            ||
|| Process BAM file cet.filtered.tagged.Aligned.out.bam.ex.featureCount ... ||
||    Single-end reads are included.                                          ||
||    Strand specific : stranded                                              ||
||    Assign alignments to features...                                        ||
||    Total alignments : 4028010                                              ||
||    Successfully assigned alignments : 743788 (18.5%)                       ||
||    Running time : 0.04 minutes                                             ||
||                                                                            ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

[1] "2023-04-24 09:40:16 UTC"
[1] "Coordinate sorting final bam file..."
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[1] "2023-04-24 09:40:20 UTC"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 945 barcodes in this chunk..."
Warning message:
In parallel::mclapply(mapList, function(tt) { :
  all scheduled cores encountered errors in user code
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'strsplit': object 'GE' not found
Calls: convert2countM ... .makewide -> unlist -> strsplit -> .handleSimpleError -> h
Execution halted
Mon Apr 24 09:40:28 UTC 2023
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/jupyter/zumisresults/zUMIs_output/expression/cet.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Mon Apr 24 09:40:30 UTC 2023
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2023-04-24 09:40:30 UTC"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/jupyter/zumisresults/zUMIs_output/expression/cet.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Mon Apr 24 09:40:37 UTC 2023

officialprofile commented 1 year ago

Regarding the

[E::hts_open_format] Failed to open file "NA" : No such file or directory
samtools view: failed to open "NA" for reading: No such file or directory

issue once more. I have noticed that num_threads and mem_limit affect this error. Some values cause the error and some don't (not necessarily lower or higher). Using sudo also seems to help a bit.

While testing this dependency, I came across another worrying problem. Namely, zUMIs sometimes gives me different numbers of detected barcodes.

[1] "1324 barcodes detected."
[1] "663290 reads were assigned to barcodes that do not correspond to intact cells."
[1] "945 barcodes detected."
[1] "440557 reads were assigned to barcodes that do not correspond to intact cells."

The only things that change are num_threads and mem_limit. I use a fairly powerful machine and always give a margin of at least 50GB RAM to avoid memory allocation problems.
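
To compare runs with different num_threads and mem_limit settings, the two summary lines can be pulled out of each log. This is a minimal sketch under the assumption that the log wording matches the lines quoted above; `barcode_summary` is a hypothetical helper, not a zUMIs function.

```python
import re

# Wording copied from the zUMIs log lines quoted in this thread.
DETECTED = re.compile(r'"(\d+) barcodes detected\."')
DISCARDED = re.compile(
    r'"(\d+) reads were assigned to barcodes that do not correspond to intact cells\."'
)

def barcode_summary(log_text):
    """Return (barcodes_detected, reads_discarded) pairs in log order."""
    detected = [int(n) for n in DETECTED.findall(log_text)]
    discarded = [int(n) for n in DISCARDED.findall(log_text)]
    return list(zip(detected, discarded))

runs = barcode_summary('''[1] "1324 barcodes detected."
[1] "663290 reads were assigned to barcodes that do not correspond to intact cells."
[1] "945 barcodes detected."
[1] "440557 reads were assigned to barcodes that do not correspond to intact cells."''')

for detected, discarded in runs:
    print(detected, "barcodes,", discarded, "reads outside the selected barcodes")
```

Recording these pairs next to the settings used for each run makes it easier to see which parameter combinations change the barcode selection.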

cziegenhain commented 1 year ago

Hi,

Regarding the "NA" message: I don't think you need to worry about it. It is a product of zUMIs wrongly estimating the total number of reads in the dataset, but it should not affect anything. In bigger datasets this is almost never an issue, and the dependency on threads/memory is expected since zUMIs tries to chunk and allocate resources given these parameters. To avoid it completely, you can set a low memory limit (e.g. 25 GB).
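
In the YAML posted earlier in this thread, the memory setting is the top-level `mem_limit` key (80 in that config). A minimal sketch of lowering it to the suggested 25 is shown here. The `set_mem_limit` helper is hypothetical and assumes the key sits on a single top-level line, as it does in that config; editing the file by hand works just as well.

```python
import re

def set_mem_limit(yaml_text, gigabytes):
    """Return yaml_text with the top-level mem_limit value replaced."""
    new_text, count = re.subn(
        r"^mem_limit:[ \t]*\S+[ \t]*$",
        f"mem_limit: {gigabytes}",
        yaml_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected exactly one mem_limit line, found {count}")
    return new_text

# Fragment of the config posted above; only mem_limit is changed.
config = "num_threads: 16\nmem_limit: 80\nfilter_cutoffs:\n"
updated = set_mem_limit(config, 25)
print(updated)
```

A plain text substitution is used rather than a YAML parser so that the rest of the file, including key order and quoting, stays byte-for-byte unchanged.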

The inferred number of barcodes can indeed also vary slightly between runs, especially for datasets without a clearly defined "knee" in the distribution of reads per barcode.

In terms of your "real" error: it seems that the part of the code where we cast count matrices into wide format fails because it cannot find any gene information. I still suspect there could be a version-dependent issue here, but I am confused as to why you report a version conflict when using the zUMIs conda environment (zUMIs.sh -c -y config.yaml). Have you tried simply running in a basic Docker environment to prevent influence from your other installed packages?

Best, Christoph

officialprofile commented 1 year ago

Thank you @cziegenhain for your support! You convinced me to try the dockerized version and I can happily say that it works well :) By the way, the message [E::hts_open_format] Failed to open file "NA" : No such file or directory still occurred, though.

gevro commented 1 year ago

Is there a dockerized version of zUMIs available?

cziegenhain commented 1 year ago

Hi, you can just run zUMIs in any plain Linux Docker container (e.g. Ubuntu will work well). Since you get a conda environment when cloning from GitHub, no other special prerequisites are needed.