mehdipourrostami commented 2 years ago

Hi, Thanks for developing zUMIs.

I have two fastq.gz files each has almost 20GB in size from BD-Rhapsody.

everything runs smoothly up until the end which I get this error :

You provided these parameters: YAML file: H1.yaml zUMIs directory: /home/mehdi/All_projects/zUMIs_new_version/zUMIs STAR executable STAR samtools executable samtools pigz executable pigz Rscript executable Rscript RAM limit: 150 zUMIs version 2.9.7b

Mo 9. Mai 20:21:50 CEST 2022 WARNING: The STAR version used for mapping is 2.5.4b and the STAR index was created using the version 20201. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.5.4b. Filtering... Mo 9. Mai 21:04:34 CEST 2022 [1] "24788 barcodes detected." [1] "65978880 reads were assigned to barcodes that do not correspond to intact cells." [1] "Found 34707 daughter barcodes that can be binned into 12173 parent barcodes." [1] "Binned barcodes correspond to 6100842 reads." Mapping... [1] "2022-05-09 21:31:10 CEST" May 09 21:31:11 ..... started STAR run May 09 21:31:11 ..... loading genome May 09 21:31:11 ..... started STAR run May 09 21:31:11 ..... loading genome May 09 21:31:11 ..... started STAR run May 09 21:31:11 ..... loading genome May 09 21:31:11 ..... started STAR run May 09 21:31:11 ..... loading genome May 09 21:31:11 ..... started STAR run May 09 21:31:11 ..... loading genome May 09 21:31:47 ..... processing annotations GTF May 09 21:31:47 ..... processing annotations GTF May 09 21:31:48 ..... processing annotations GTF May 09 21:31:48 ..... processing annotations GTF May 09 21:31:48 ..... processing annotations GTF May 09 21:32:01 ..... inserting junctions into the genome indices May 09 21:32:02 ..... inserting junctions into the genome indices May 09 21:32:02 ..... inserting junctions into the genome indices May 09 21:32:02 ..... inserting junctions into the genome indices May 09 21:32:02 ..... inserting junctions into the genome indices May 09 21:37:57 ..... started 1st pass mapping May 09 21:37:58 ..... started 1st pass mapping May 09 21:37:59 ..... started 1st pass mapping May 09 21:38:15 ..... started 1st pass mapping May 09 21:38:43 ..... started 1st pass mapping May 09 21:45:18 ..... finished 1st pass mapping May 09 21:45:18 ..... inserting junctions into the genome indices May 09 21:47:05 ..... started mapping May 09 21:58:52 ..... finished successfully May 09 22:04:09 ..... finished 1st pass mapping May 09 22:04:10 ..... inserting junctions into the genome indices May 09 22:05:43 ..... finished 1st pass mapping May 09 22:05:44 ..... inserting junctions into the genome indices May 09 22:05:50 ..... finished 1st pass mapping May 09 22:05:51 ..... inserting junctions into the genome indices May 09 22:06:09 ..... finished 1st pass mapping May 09 22:06:10 ..... inserting junctions into the genome indices May 09 22:06:52 ..... started mapping May 09 22:08:35 ..... started mapping May 09 22:08:40 ..... started mapping May 09 22:08:55 ..... started mapping May 09 22:57:56 ..... finished successfully May 09 22:58:05 ..... finished successfully May 09 22:58:30 ..... finished successfully May 09 22:58:43 ..... finished successfully Mo 9. Mai 23:01:06 CEST 2022 Counting... [1] "2022-05-09 23:01:17 CEST" [1] "6.75e+08 Reads per chunk" [1] "Loading reference annotation from:" [1] "/home/mehdi/All_projects/zUMIs_new_version/zUMIs_new_version_run/H1_test1.final_annot.gtf" [1] "Annotation loaded!" [1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\						Input files : 1 BAM file				S H1_test1.filtered.tagged.Aligned.out.bam
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 18
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\						Load annotation file .Rsubread_UserProvidedAnnotation_pid44249 ...				Features : 353030				Meta-features : 61598				Chromosomes/contigs : 47
Process BAM file H1_test1.filtered.tagged.Aligned.out.bam...
Single-end reads are included.
Assign alignments to features...
Total alignments : 595428237
Successfully assigned alignments : 422587802 (71.0%)
Running time : 3.21 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\						Input files : 1 BAM file				S H1_test1.filtered.tagged.Aligned.out.bam.e ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 18
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\						Load annotation file .Rsubread_UserProvidedAnnotation_pid44249 ...				Features : 241556				Meta-features : 28553				Chromosomes/contigs : 33
Process BAM file H1_test1.filtered.tagged.Aligned.out.bam.ex.featureCo ...
Single-end reads are included.
Assign alignments to features...
Total alignments : 595428237
Successfully assigned alignments : 60456753 (10.2%)
Running time : 3.26 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "2022-05-09 23:09:27 CEST" [1] "Coordinate sorting final bam file..." [bam_sort_core] merging from 18 files and 18 in-memory blocks... [1] "2022-05-09 23:30:37 CEST" [1] "Here are the detected subsampling options:" [1] "Automatic downsampling" [1] "Working on barcode chunk 1 out of 1" [1] "Processing 24788 barcodes in this chunk..." Error in mclapply(1:nrow(idxstats), function(x) { : could not find function "mclapply" Calls: reads2genes_new Execution halted Mo 9. Mai 23:30:37 CEST 2022 Loading required package: yaml Loading required package: Matrix [1] "loomR found" Error in gzfile(file, "rb") : cannot open the connection Calls: rds_to_loom -> readRDS -> gzfile In addition: Warning message: In gzfile(file, "rb") : cannot open compressed file '/home/mehdi/All_projects/zUMIs_new_version/zUMIs_new_version_run/zUMIs_output/expression/H1_test1.dgecounts.rds', probable reason 'No such file or directory' Execution halted Mo 9. Mai 23:30:40 CEST 2022 Mo 9. Mai 23:30:40 CEST 2022 ./zUMIs.sh: line 323: /home/mehdi/All_projects/zUMIs_new_version/zUMIs/zUMIs-env/bin/deactivate: No such file or directory

This is my Yaml file: ###########################################

Welcome to zUMIs

below, please fill the mandatory inputs

We expect full paths for all files.

###########################################

define a project name that will be used to name output files

project: H1_test1

Sequencing File Inputs:

For each input file, make one list object & define path and barcode ranges

base definition vocabulary: BC(n) UMI(n) cDNA(n).

Barcode range definition needs to account for all ranges. You can give several comma-separated ranges for BC & UMI sequences, eg. BC(1-6,20-26)

you can specify between 1 and 4 input files

sequence_files: file1: name: /home/mehdi/All_projects/H1/fastq_files/H1_S1_R1_001.fastq.gz base_definition:

BC(1-9,22-30,44-52)
UMI(53-60) file2: name: /home/mehdi/All_projects/H1/fastq_files/H1_S1_R2_001.fastq.gz base_definition:
cDNA(1-76)

reference genome setup

reference: STAR_index: /home/mehdi/All_projects/starIndex/ GTF_file: /home/mehdi/All_projects/human_annotation_gtf/gencode.v40.primary_assembly.annotation.gtf exon_extension: no #extend exons by a certain width? extension_length: 0 #number of bp to extend exons by scaffold_length_min: 0 #minimal scaffold/chromosome length to consider (0 = all) additional_files: #Optional parameter. It is possible to give additional reference sequences here, eg ERCC.fa additional_STAR_params: #Optional parameter. you may add custom mapping parameters to STAR here

output directory

out_dir: /home/mehdi/All_projects/zUMIs_new_version/zUMIs_new_version_run

###########################################

below, you may optionally change default parameters

###########################################

number of processors to use

num_threads: 18 mem_limit: 150 #Memory limit in Gigabytes, null meaning unlimited RAM usage.

barcode & UMI filtering options

number of bases under the base quality cutoff that should be filtered out.

Phred score base-cutoff for quality control.

filter_cutoffs: BC_filter: num_bases: 1 phred: 20 UMI_filter: num_bases: 1 phred: 20

Options for Barcode handling

You can give either number of top barcodes to use or give an annotation of cell barcodes.

If you leave both barcode_num and barcode_file empty, zUMIs will perform automatic cell barcode selection for you!

barcodes: barcode_num: null barcode_file: null barcode_sharing: null #Optional for combining several barcode sequences per cell (see github wiki) automatic: yes #Give yes/no to this option. If the cell barcodes should be detected automatically. If the barcode file is given in combination with automatic barcode detection, the list of given barcodes will be used as whitelist. BarcodeBinning: 1 #Hamming distance binning of close cell barcode sequences. nReadsperCell: 100 #Keep only the cell barcodes with atleast n number of reads. demultiplex: no #produce per-cell demultiplexed bam files.

Options related to counting of reads towards expression profiles

counting_opts: introns: yes #can be set to no for exon-only counting. intronProb: no #perform an estimation of how likely intronic reads are to be derived from mRNA by comparing to intergenic counts. downsampling: 0 #Number of reads to downsample to. This value can be a fixed number of reads (e.g. 10000) or a desired range (e.g. 10000-20000) Barcodes with less than will not be reported. 0 means adaptive downsampling. Default: 0. strand: 0 #Is the library stranded? 0 = unstranded, 1 = positively stranded, 2 = negatively stranded Ham_Dist: 0 #Hamming distance collapsing of UMI sequences. velocyto: no #Would you like velocyto to do counting of intron-exon spanning reads primaryHit: yes #Do you want to count the primary Hits of multimapping reads towards gene expression levels? multi_overlap: no #Do you want to assign reads overlapping to multiple features? fraction_overlap: 0 #minimum required fraction of the read overlapping with the gene for read assignment to genes twoPass: yes #perform basic STAR twoPass mapping

produce stats files and plots?

make_stats: no

Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting, Summarising). Default: Filtering.

which_Stage: Filtering

define dependencies program paths

samtools_exec: samtools #samtools executable Rscript_exec: Rscript #Rscript executable STAR_exec: STAR #STAR executable pigz_exec: pigz #pigz executable

below, fqfilter will add a read_layout flag defining SE or PE

cziegenhain commented 2 years ago

Hey,

It's odd that your run didn't seem to find the function from the parallel package. I have pushed a change to github to explicitly call mclapply from the parallel package, please try again with the update.

Some additional comments on your yaml:

Consider relaxing the quality cutoffs for BC & UMI bases depending on the quality of the sequencing
Consider utilizing UMI error correction
I assume Rhapsody produces stranded libraries, set the strand flag accordingly Best, Christoph

mehdipourrostami commented 2 years ago

Dear Christoph,

Thank you for your reply.

I will do another run with the update.

mehdipourrostami commented 2 years ago

Hello Christoph, I have the error again. Warning message: In parallel::mclapply(mapList, function(tt) { : all scheduled cores encountered errors in user code Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'strsplit': object 'GE' not found Calls: convert2countM ... .makewide -> unlist -> strsplit -> .handleSimpleError -> h Execution halted Di 10. Mai 23:00:12 CEST 2022 Loading required package: yaml Loading required package: Matrix [1] "loomR found" Error in gzfile(file, "rb") : cannot open the connection Calls: rds_to_loom -> readRDS -> gzfile In addition: Warning message: In gzfile(file, "rb") : cannot open compressed file '/home/mehdi/All_projects/zUMIs_new_version/zUMIs_new_version_run/zUMIs_output/expression/H1_test1.dgecounts.rds', probable reason 'No such file or directory' Execution halted Di 10. Mai 23:00:14 CEST 2022 Di 10. Mai 23:00:14 CEST 2022

cziegenhain commented 2 years ago

Hi,

this reads a bit different to me. I’m going to need the full log to be able to say something, and probably also good if you can send over which files have been created including their size.

mehdipourrostami commented 2 years ago

Hi Christoph,

I was running it again with a new update but unfortunately, I got the error again. Here I attached a screenshot of the files that are created and their size: Screenshot 2022-05-11 at 16 32 48

Mi 11. Mai 13:15:39 CEST 2022 WARNING: The STAR version used for mapping is 2.5.4b and the STAR index was created using the version 20201. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.5.4b. Filtering... Mi 11. Mai 13:58:51 CEST 2022 [1] "21681 barcodes detected." [1] "33982555 reads were assigned to barcodes that do not correspond to intact cells." [1] "Found 11044 daughter barcodes that can be binned into 6926 parent barcodes." [1] "Binned barcodes correspond to 2064361 reads." Mapping... [1] "2022-05-11 14:13:01 CEST" May 11 14:13:02 ..... started STAR run May 11 14:13:02 ..... loading genome May 11 14:13:02 ..... started STAR run May 11 14:13:02 ..... loading genome May 11 14:13:02 ..... started STAR run May 11 14:13:02 ..... loading genome May 11 14:13:13 ..... processing annotations GTF May 11 14:13:19 ..... processing annotations GTF May 11 14:13:19 ..... processing annotations GTF May 11 14:13:22 ..... inserting junctions into the genome indices May 11 14:13:27 ..... inserting junctions into the genome indices May 11 14:13:28 ..... inserting junctions into the genome indices May 11 14:14:36 ..... started 1st pass mapping May 11 14:14:40 ..... started 1st pass mapping May 11 14:14:41 ..... started 1st pass mapping May 11 14:25:21 ..... finished 1st pass mapping May 11 14:25:21 ..... inserting junctions into the genome indices May 11 14:28:12 ..... started mapping May 11 14:30:06 ..... finished 1st pass mapping May 11 14:30:06 ..... inserting junctions into the genome indices May 11 14:30:37 ..... finished 1st pass mapping May 11 14:30:38 ..... inserting junctions into the genome indices May 11 14:33:22 ..... started mapping May 11 14:33:57 ..... started mapping May 11 14:44:40 ..... finished successfully May 11 14:57:19 ..... finished successfully May 11 14:58:01 ..... finished successfully Mi 11. Mai 14:58:57 CEST 2022 Counting... [1] "2022-05-11 14:59:07 CEST" [1] "4.5e+08 Reads per chunk" [1] "Loading reference annotation from:" [1] "/home/mehdi/Documents/H1_test2_zUMIs/zUMIs_output//H1_test2.final_annot.gtf" [1] "Annotation loaded!" [1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\						Input files : 1 BAM file				S H1_test2.filtered.tagged.Aligned.out.bam
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 18
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\						Load annotation file .Rsubread_UserProvidedAnnotation_pid64310 ...				Features : 150				Meta-features : 54				Chromosomes/contigs : 22
Process BAM file H1_test2.filtered.tagged.Aligned.out.bam...
Single-end reads are included.
Assign alignments to features...
Total alignments : 362418123
Successfully assigned alignments : 42579 (0.0%)
Running time : 1.75 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\						Input files : 1 BAM file				S H1_test2.filtered.tagged.Aligned.out.bam.e ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 18
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\						Load annotation file .Rsubread_UserProvidedAnnotation_pid64310 ...				Features : 87				Meta-features : 12				Chromosomes/contigs : 9
Process BAM file H1_test2.filtered.tagged.Aligned.out.bam.ex.featureCo ...
Single-end reads are included.
Assign alignments to features...
Total alignments : 362418123
Successfully assigned alignments : 7912 (0.0%)
Running time : 1.78 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "2022-05-11 15:04:02 CEST" [1] "Coordinate sorting final bam file..." [bam_sort_core] merging from 18 files and 18 in-memory blocks... [1] "2022-05-11 15:16:43 CEST" [1] "Here are the detected subsampling options:" [1] "Automatic downsampling" [1] "Working on barcode chunk 1 out of 1" [1] "Processing 21681 barcodes in this chunk..." Warning message: In parallel::mclapply(mapList, function(tt) { : all scheduled cores encountered errors in user code Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'strsplit': object 'GE' not found Calls: convert2countM ... .makewide -> unlist -> strsplit -> .handleSimpleError -> h Execution halted Mi 11. Mai 16:10:58 CEST 2022 Loading required package: yaml Loading required package: Matrix [1] "loomR found" Error in gzfile(file, "rb") : cannot open the connection Calls: rds_to_loom -> readRDS -> gzfile In addition: Warning message: In gzfile(file, "rb") : cannot open compressed file '/home/mehdi/Documents/H1_test2_zUMIs/zUMIs_output//zUMIs_output/expression/H1_test2.dgecounts.rds', probable reason 'No such file or directory' Execution halted Mi 11. Mai 16:11:00 CEST 2022 Descriptive statistics... [1] "I am loading useful packages for plotting..." [1] "2022-05-11 16:11:00 CEST" Error in gzfile(file, "rb") : cannot open the connection Calls: readRDS -> gzfile In addition: Warning message: In gzfile(file, "rb") : cannot open compressed file '/home/mehdi/Documents/H1_test2_zUMIs/zUMIs_output//zUMIs_output/expression/H1_test2.dgecounts.rds', probable reason 'No such file or directory' Execution halted Mi 11. Mai 16:11:06 CEST 2022

cziegenhain commented 2 years ago

Hi,

So it seems like it is running fine up until assigning the reads to genes, where virtually no reads match gene features. Hence the downstream error of not finding any gene tags that breaks the pipeline. This can have several reasons depending on your data. First, I would check the alignments in your bam file by samtools flagstat and inspecting a few lines with samtools view.

Best, C

gevro commented 1 year ago

Hi, I am getting the exact same error. Did you figure out the cause?

sdparekh / zUMIs

Error in mclapply #316

Welcome to zUMIs

below, please fill the mandatory inputs

We expect full paths for all files.

define a project name that will be used to name output files

Sequencing File Inputs:

For each input file, make one list object & define path and barcode ranges

base definition vocabulary: BC(n) UMI(n) cDNA(n).

Barcode range definition needs to account for all ranges. You can give several comma-separated ranges for BC & UMI sequences, eg. BC(1-6,20-26)

you can specify between 1 and 4 input files

reference genome setup

output directory

below, you may optionally change default parameters

number of processors to use

barcode & UMI filtering options

number of bases under the base quality cutoff that should be filtered out.

Phred score base-cutoff for quality control.

Options for Barcode handling

You can give either number of top barcodes to use or give an annotation of cell barcodes.

If you leave both barcode_num and barcode_file empty, zUMIs will perform automatic cell barcode selection for you!

Options related to counting of reads towards expression profiles

produce stats files and plots?

Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting, Summarising). Default: Filtering.

define dependencies program paths

below, fqfilter will add a read_layout flag defining SE or PE