JinsyWang commented 1 year ago

Hi, thank you for your awesome software! I ran into a problem when mapping my Smart-Seq3 data. I ran zUMIs.sh -y /home/wangjx/single_cell/abko/zUMIs_yaml_1/abko_1.yaml, but the only BAM file I can find among the output files is *.filtered.tagged.unmapped.bam:

(R_4.0.2) [wangjx@len1 zUMIs_yaml_1]$ ls
abko_1.BCstats.txt abko_1.final_annot.gtf abko_1.zUMIs_runlog.txt abko_1.filtered.tagged.Log.final.out abko_1.run.yaml zUMIs_output abko_1.filtered.tagged.unmapped.bam abko_1.yaml

The directories under zUMIs_output (e.g. expression, stats) are empty.

To Reproduce

###########################################

Welcome to zUMIs

below, please fill the mandatory inputs

We expect full paths for all files.

###########################################

define a project name that will be used to name output files

project: abko_1

Sequencing File Inputs:

For each input file, make one list object & define path and barcode ranges

base definition vocabulary: BC(n) UMI(n) cDNA(n).

Barcode range definition needs to account for all ranges. You can give several comma-separated ranges for BC & UMI sequences, eg. BC(1-6,20-26)

you can specify between 1 and 4 input files

sequence_files:
  file1:
    name: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.R1.fastq.gz
    base_definition:
      - cDNA(23-150)
      - UMI(12-19)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.R2.fastq.gz
    base_definition:
      - cDNA(1-150)
  file3:
    name: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.index.fastq.gz
    base_definition:
      - BC(1-8)

reference genome setup

reference:
  STAR_index: /home/wangjx/single_cell/reference/mouse20230421
  GTF_file: /home/wangjx/single_cell/reference/Mus_musculus.GRCm39.110.gtf
  exon_extension: no #extend exons by a certain width?
  extension_length: 0 #number of bp to extend exons by
  scaffold_length_min: 0 #minimal scaffold/chromosome length to consider (0 = all)

additional_files: #Optional parameter. It is possible to give additional reference sequences here, eg ERCC.fa

additional_STAR_params: #Optional parameter. you may add custom mapping parameters to STAR here

output directory

out_dir: /home/wangjx/single_cell/abko/zUMIs_yaml_1

###########################################

below, you may optionally change default parameters

###########################################

number of processors to use

num_threads: 6
mem_limit: null #Memory limit in Gigabytes, null meaning unlimited RAM usage.

barcode & UMI filtering options

number of bases under the base quality cutoff that should be filtered out.

Phred score base-cutoff for quality control.

filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20

Options for Barcode handling

You can give either number of top barcodes to use or give an annotation of cell barcodes.

If you leave both barcode_num and barcode_file empty, zUMIs will perform automatic cell barcode selection for you!

barcodes:
  barcode_num: null
  barcode_file: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.expected_barcodes.txt
  barcode_sharing: null #Optional for combining several barcode sequences per cell (see github wiki)
  automatic: yes #Give yes/no to this option. If the cell barcodes should be detected automatically. If the barcode file is given in combination with automatic barcode detection, the list of given barcodes will be used as whitelist.
  BarcodeBinning: 1 #Hamming distance binning of close cell barcode sequences.
  nReadsperCell: 100 #Keep only the cell barcodes with atleast n number of reads.
  demultiplex: no #produce per-cell demultiplexed bam files.

Options related to counting of reads towards expression profiles

counting_opts:
  introns: yes #can be set to no for exon-only counting.
  intronProb: no #perform an estimation of how likely intronic reads are to be derived from mRNA by comparing to intergenic counts.
  downsampling: 0 #Number of reads to downsample to. This value can be a fixed number of reads (e.g. 10000) or a desired range (e.g. 10000-20000) Barcodes with less than will not be reported. 0 means adaptive downsampling. Default: 0.
  strand: 0 #Is the library stranded? 0 = unstranded, 1 = positively stranded, 2 = negatively stranded
  Ham_Dist: 0 #Hamming distance collapsing of UMI sequences.
  velocyto: no #Would you like velocyto to do counting of intron-exon spanning reads
  primaryHit: yes #Do you want to count the primary Hits of multimapping reads towards gene expression levels?
  multi_overlap: no #Do you want to assign reads overlapping to multiple features?
  fraction_overlap: 0 #minimum required fraction of the read overlapping with the gene for read assignment to genes
  twoPass: yes #perform basic STAR twoPass mapping

produce stats files and plots?

make_stats: yes

Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting, Summarising). Default: Filtering.

which_Stage: Filtering

define dependencies program paths

samtools_exec: /home/wangjx/anaconda3/envs/R_4.0.2/bin/samtools #samtools executable
Rscript_exec: /home/wangjx/anaconda3/envs/R_4.0.2/bin/Rscript #Rscript executable
STAR_exec: /home/wangjx/anaconda3/envs/R_4.0.2/bin/STAR #STAR executable
pigz_exec: /home/wangjx/anaconda3/envs/R_4.0.2/bin/pigz #pigz executable

below, fqfilter will add a read_layout flag defining SE or PE

Desktop (please complete the following information):

CentOS

[Image: 微信图片_20230819191345 (WeChat image)]

I've also tried the built-in environment, but got the same output.
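The YAML comments above describe the base definition vocabulary (BC(n), UMI(n), cDNA(n), with comma-separated ranges such as BC(1-6,20-26)). As a rough aid for checking such entries before a run, here is a minimal Python sketch that parses them and reports ranges that overlap within one read. The helper names `parse_base_definition` and `find_overlaps` are hypothetical and not part of zUMIs, and whether zUMIs itself rejects overlapping ranges is not something this thread establishes.

```python
import re
from itertools import combinations

# Hypothetical helpers for illustration; they are not part of zUMIs.
_ENTRY = re.compile(r"^(BC|UMI|cDNA)\(([0-9,\-]+)\)$")


def parse_base_definition(entry):
    """Return (kind, [(start, end), ...]) for an entry such as 'BC(1-6,20-26)'."""
    match = _ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognised base definition: {entry!r}")
    kind, spans = match.groups()
    ranges = []
    for span in spans.split(","):
        # Each span must look like 'start-end' with 1-based positions.
        start, end = (int(part) for part in span.split("-"))
        if start < 1 or end < start:
            raise ValueError(f"invalid range {span!r} in {entry!r}")
        ranges.append((start, end))
    return kind, ranges


def find_overlaps(entries):
    """Return pairs of (kind, start, end) segments that share at least one base."""
    segments = []
    for entry in entries:
        kind, ranges = parse_base_definition(entry)
        segments.extend((kind, start, end) for start, end in ranges)
    return [
        (a, b)
        for a, b in combinations(segments, 2)
        if a[1] <= b[2] and b[1] <= a[2]
    ]


if __name__ == "__main__":
    # The file1 definition from the YAML in this comment.
    print(find_overlaps(["cDNA(23-150)", "UMI(12-19)"]))
```

For the file1 entries in this report, the two ranges do not overlap, so the definition itself looks consistent on that point.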

cziegenhain commented 1 year ago

Hi,

Please post the verbose output of the failed run so that we can troubleshoot.

best, C

JinsyWang commented 1 year ago

When I submit the job with sbatch, the run log shows:

You provided these parameters:
YAML file: /home/wangjx/single_cell/abko/zUMIs_yaml_1/abko_1.yaml
zUMIs directory: /home/wangjx/software/zUMIs
STAR executable /home/wangjx/anaconda3/envs/R_4.0.2/bin/STAR
samtools executable /home/wangjx/anaconda3/envs/R_4.0.2/bin/samtools
pigz executable /home/wangjx/anaconda3/envs/R_4.0.2/bin/pigz
Rscript executable /home/wangjx/anaconda3/envs/R_4.0.2/bin/Rscript
RAM limit: null
zUMIs version 2.9.7e

That's all. I have no idea why it looks incomplete.

When I run it directly, the terminal shows:

(R_4.0.2) [wangjx@len1 zUMIs_yaml]$ zUMIs.sh -y /home/wangjx/single_cell/abko/zUMIs_yaml/abko.yaml
You provided these parameters:
YAML file: /home/wangjx/single_cell/abko/zUMIs_yaml/abko.yaml
zUMIs directory: /home/wangjx/software/zUMIs
STAR executable /home/wangjx/anaconda3/envs/R_4.0.2/bin/STAR
samtools executable /home/wangjx/anaconda3/envs/R_4.0.2/bin/samtools
pigz executable /home/wangjx/anaconda3/envs/R_4.0.2/bin/pigz
Rscript executable /home/wangjx/anaconda3/envs/R_4.0.2/bin/Rscript
RAM limit: null
zUMIs version 2.9.7e

Sat Aug 19 15:23:08 CST 2023
WARNING: The STAR version used for mapping is 2.7.10b and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.10b.
Filtering...
pigz: skipping: ./name:.gz does not exist
pigz: skipping: ./name:.gz does not exist
pigz: skipping: ./name:.gz does not exist
pigz: skipping: ./name:.gz does not exist
pigz: skipping: ./name:.gz does not exist
ls: cannot access /home/wangjx/single_cell/abko/zUMIs_yaml/zUMIs_output/.tmpMerge/name:abko: No such file or directory
ls: cannot access /home/wangjx/single_cell/abko/zUMIs_yaml/zUMIs_output/.tmpMerge/name:abko: No such file or directory
ls: cannot access /home/wangjx/single_cell/abko/zUMIs_yaml/zUMIs_output/.tmpMerge/name:abko: No such file or directory
ls: cannot access /home/wangjx/single_cell/abko/zUMIs_yaml/zUMIs_output/.tmpMerge/name:abko: No such file or directory
ls: cannot access /home/wangjx/single_cell/abko/zUMIs_yaml/zUMIs_output/.tmpMerge/name:abko*: No such file or directory
^C(R_4.0.2) [wangjx@len1 zUMIs_yaml]$ split: with FILE=/home/wangjx/single_cell/abko/zUMIs_yaml/zUMIs_output/.tmpMerge/reads_for_zUMIs.R2.fastqabkoaa, exit 4 from command: /home/wangjx/anaconda3/envs/R_4.0.2/bin/pigz -p 6 > $FILE.gz

split: with FILE=/home/wangjx/single_cell/abko/zUMIs_yaml/zUMIs_output/.tmpMerge/reads_for_zUMIs.R1.fastqabkoaa, exit 4 from command: /home/wangjx/anaconda3/envs/R_4.0.2/bin/pigz -p 6 > $FILE.gz
^C

The process hung here for two hours, so I interrupted it.
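The warning in this log compares the STAR binary version with the version recorded in the index. As a rough way to confirm which version an index was built with, the sketch below reads the `versionGenome` entry from the text of `genomeParameters.txt` in the STAR index directory. This is a minimal sketch under the assumption that the index contains that file with a `versionGenome` line; `index_version` and `versions_differ` are hypothetical helper names, and a differing string does not by itself prove the index is unusable (the zUMIs message only says it may lead to an error).

```python
def index_version(genome_parameters_text):
    """Return the versionGenome value from genomeParameters.txt text, or None."""
    for line in genome_parameters_text.splitlines():
        fields = line.split()
        # Assumed layout: 'versionGenome<whitespace>2.7.4a'
        if len(fields) >= 2 and fields[0] == "versionGenome":
            return fields[1]
    return None


def versions_differ(star_version, index_ver):
    """Plain string comparison, mirroring what the warning text describes."""
    return star_version.strip() != index_ver.strip()


if __name__ == "__main__":
    # Illustrative text only; read the real file from the STAR_index directory.
    example = "versionGenome\t2.7.4a\ngenomeChrBinNbits\t18\n"
    print(index_version(example), versions_differ("2.7.10b", index_version(example)))
```

In practice the text would come from something like `open(star_index_dir + "/genomeParameters.txt").read()`, where `star_index_dir` is the `STAR_index` path from the YAML.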

JinsyWang commented 1 year ago

I retried using the YAML below but got another error:

project: abko
sequence_files:
  file1:
    name: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.R1.fastq.gz
    base_definition:
      - cDNA(23-150)
      - UMI(12-19)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.R2.fastq.gz
    base_definition:
      - cDNA(1-150)
  file3:
    name: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.index.fastq.gz
    base_definition:
      - BC(1-8)
reference:
  STAR_index: /home/wangjx/single_cell/reference/mouse20230421
  GTF_file: /home/wangjx/single_cell/reference/Mus_musculus.GRCm39.110.gtf
  exon_extension: no
  extension_length: 0
  scaffold_length_min: 0
  additional_files: ~
  additional_STAR_params: '--clip3pAdapterSeq CTGTCTCTTATACACATCT'
out_dir: /home/wangjx/single_cell/abko/test_2
num_threads: 8
mem_limit: 30
filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 20
  UMI_filter:
    num_bases: 3
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.expected_barcodes.txt
  barcode_sharing: ~
  automatic: yes
  BarcodeBinning: 0
  nReadsperCell: 100
  demultiplex: no
counting_opts:
  introns: yes
  intronProb: no
  downsampling: 0
  strand: 0
  Ham_Dist: 0
  velocyto: no
  primaryHit: yes
  multi_overlap: no
  twoPass: no
make_stats: yes
which_Stage: Filtering

Here's the verbose output:

Starting job at Tue Aug 22 19:25:01 CST 2023
Warning: YAML file doesn't include 'pigz_exec' option; setting to 'pigz'
Warning: YAML file doesn't include 'STAR_exec' option; setting to 'STAR'
Warning: YAML file doesn't include 'Rscript_exec' option; setting to 'Rscript'
Using miniconda environment for zUMIs!
note: internal executables will be used instead of those specified in the YAML file!

You provided these parameters:
YAML file: /home/wangjx/single_cell/abko/test_2/test.yaml
zUMIs directory: /home/wangjx/software/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 30
zUMIs version 2.9.7e

Tue Aug 22 19:30:47 CST 2023
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Tue Aug 22 19:36:01 CST 2023
[1] "Using intersection between automatic and whitelist."
Mapping...
[1] "2023-08-22 19:36:06 CST"
Aug 22 19:36:08 ..... started STAR run
Aug 22 19:36:10 ..... loading genome
Aug 22 19:36:42 ..... processing annotations GTF
Aug 22 19:36:51 ..... inserting junctions into the genome indices
Aug 22 19:38:10 ..... started mapping
Aug 22 19:45:00 ..... finished mapping
Aug 22 19:45:00 ..... finished successfully
Tue Aug 22 19:45:01 CST 2023
Counting...
[1] "2023-08-22 19:45:14 CST"
Tue Aug 22 19:45:14 CST 2023
[1] "loomR found"
Tue Aug 22 19:45:18 CST 2023
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2023-08-22 19:45:18 CST"
Tue Aug 22 19:45:22 CST 2023
Job finished at Tue Aug 22 19:45:23 CST 2023

Error in uik(bccount$cellindex, bccount$cs/1000) : Method is not applicable for such a small vector. Please give at least a 5 numbers vector
Calls: cellBC -> .cellBarcode_expect -> .FindBCcut -> uik
Execution halted
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes.txt")) : File '/home/wangjx/single_cell/abko/test_2/zUMIs_output/abkokept_barcodes.txt' does not exist or is non-readable. getwd()=='/home/wangjx/single_cell/abko/test_2'
Execution halted
Loading required package: yaml
Loading required package: Matrix
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") : cannot open compressed file '/home/wangjx/single_cell/abko/test_2/zUMIs_output/expression/abko.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Error in data.table::fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, : File '/home/wangjx/single_cell/abko/test_2/zUMIs_output/abkokept_barcodes.txt' does not exist or is non-readable. getwd()=='/home/wangjx/single_cell/abko/test_2'
Execution halted

I'll set both the barcode_num and barcode_file parameters to NULL and run it again.
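The `uik` error above states that it needs a vector of at least five numbers, and the log line shortly before it reports that the intersection between automatic detection and the whitelist is being used. One possible reading is that fewer than five barcodes survived that intersection. The sketch below is a way to check this outside zUMIs; it is not how zUMIs selects barcodes internally. It assumes the barcode sits at bases 1-8 of the index read (as in BC(1-8)) and that the whitelist holds one barcode per line; `count_barcodes` and `usable_barcodes` are hypothetical helper names.

```python
from collections import Counter


def count_barcodes(fastq_lines, start=1, end=8):
    """Count barcodes taken from bases start..end (1-based, inclusive) of each read."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence line of each 4-line FASTQ record
            counts[line.strip()[start - 1:end]] += 1
    return counts


def usable_barcodes(counts, whitelist, min_reads=100):
    """Whitelisted barcodes with at least min_reads reads (cf. nReadsperCell)."""
    allowed = {bc.strip() for bc in whitelist if bc.strip()}
    return sorted(bc for bc, n in counts.items() if bc in allowed and n >= min_reads)


if __name__ == "__main__":
    # Tiny in-memory example; real input would be the index FASTQ and whitelist.
    reads = ["@r1", "ACGTACGT", "+", "IIIIIIII"] * 3
    print(usable_barcodes(count_barcodes(reads), ["ACGTACGT"], min_reads=2))
```

For real data, `fastq_lines` could be the handle returned by `gzip.open(path, "rt")` on reads_for_zUMIs.index.fastq.gz, and `whitelist` the open expected_barcodes file. If the resulting list has fewer than five entries, that would be consistent with the error message, though it would not prove the cause.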

JinsyWang commented 1 year ago

I used the modified YAML file: `project: test_1

sequence_files:
  file1:
    name: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.R1.fastq.gz
    base_definition:
      - cDNA(23-150)
      - UMI(12-19)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.R2.fastq.gz
    base_definition:
      - cDNA(1-150)
  file3:
    name: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.index.fastq.gz
    base_definition:
      - BC(1-8)

reference genome setup

reference:
  STAR_index: /home/wangjx/single_cell/reference/mouse20230421
  GTF_file: /home/wangjx/single_cell/reference/Mus_musculus.GRCm39.110.gtf
  exon_extension: no
  scaffold_length_min: 0 #minimal scaffold/chromosome length to consider (0 = all)
  additional_STAR_params: '--limitSjdbInsertNsj 2000000 --clip3pAdapterSeq CTGTCTCTTATACACATCT'

output directory

out_dir: /home/wangjx/single_cell/abko/test_1

number of processors to use

num_threads: 30
mem_limit: 100 #Memory limit in Gigabytes, null meaning unlimited RAM usage.

filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 20
  UMI_filter:
    num_bases: 3
    phred: 20

barcodes:
  barcode_num: NULL
  barcode_file: /home/wangjx/single_cell/abko/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/V3-h2b-p9-12/reads_for_zUMIs.expected_barcodes.txt
  barcode_sharing: null
  automatic: no
  BarcodeBinning: 0 #Hamming distance binning of close cell barcode sequences.
  nReadsperCell: 100 #Keep only the cell barcodes with atleast n number of reads.
  demultiplex: yes #produce per-cell demultiplexed bam files.

Options related to counting of reads towards expression profiles

counting_opts:
  introns: yes #can be set to no for exon-only counting.
  intronProb: no #perform an estimation of how likely intronic reads are to be derived from mRNA by comparing to intergenic counts.
  downsampling: 0 #Number of reads to downsample to. This value can be a fixed number of reads (e.g. 10000) or a desired range (e.g. 10000-20000) Barcodes with less than will not be reported. 0 means adaptive downsampling. Default: 0.
  strand: 1 #Is the library stranded? 0 = unstranded, 1 = positively stranded, 2 = negatively stranded
  Ham_Dist: 0 #Hamming distance collapsing of UMI sequences.
  velocyto: no #Would you like velocyto to do counting of intron-exon spanning reads
  primaryHit: yes #Do you want to count the primary Hits of multimapping reads towards gene expression levels?
  multi_overlap: no #Do you want to assign reads overlapping to multiple features?
  fraction_overlap: 0 #minimum required fraction of the read overlapping with the gene for read assignment to genes
  twoPass: yes #perform basic STAR twoPass mapping

produce stats files and plots?

make_stats: yes

Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting, Summarising). Default: Filtering.

which_Stage: Filtering

define dependencies program paths

below, fqfilter will add a read_layout flag defining SE or PE

samtools_exec: /home/wangjx/anaconda3/envs/R_4.0.2/bin/samtools
pigz_exec: /home/wangjx/anaconda3/envs/R_4.0.2/bin/pigz
STAR_exec: /home/wangjx/anaconda3/envs/R_4.0.2/bin/STAR
Rscript_exec: /home/wangjx/anaconda3/envs/R_4.0.2/bin/Rscript
zUMIs_directory: /home/wangjx/software/zUMIs
read_layout: PE

~ ` and obtained the following output:

`Starting job at Tue Aug 22 20:47:29 CST 2023
Using miniconda environment for zUMIs!
note: internal executables will be used instead of those specified in the YAML file!

You provided these parameters:
YAML file: /home/wangjx/single_cell/abko/test_1/zUMIs.yaml
zUMIs directory: /home/wangjx/software/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 100
zUMIs version 2.9.7e

Tue Aug 22 20:47:37 CST 2023
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Tue Aug 22 20:53:09 CST 2023
[1] " reads were assigned to barcodes that do not correspond to intact cells."
Mapping...
[1] "2023-08-22 20:53:12 CST"
Aug 22 20:53:13 ..... started STAR run
Aug 22 20:53:16 ..... loading genome
Aug 22 20:53:13 ..... started STAR run
Aug 22 20:53:16 ..... loading genome
Aug 22 20:53:13 ..... started STAR run
Aug 22 20:53:16 ..... loading genome
Aug 22 20:53:52 ..... processing annotations GTF
Aug 22 20:53:52 ..... processing annotations GTF
Aug 22 20:53:52 ..... processing annotations GTF
Aug 22 20:54:01 ..... inserting junctions into the genome indices
Aug 22 20:54:01 ..... inserting junctions into the genome indices
Aug 22 20:54:01 ..... inserting junctions into the genome indices
Aug 22 20:55:21 ..... started 1st pass mapping
Aug 22 20:55:22 ..... started 1st pass mapping
Aug 22 20:55:22 ..... started 1st pass mapping
Aug 22 20:59:46 ..... finished 1st pass mapping
Aug 22 20:59:47 ..... inserting junctions into the genome indices
Aug 22 21:00:00 ..... finished 1st pass mapping
Aug 22 21:00:01 ..... inserting junctions into the genome indices
Aug 22 21:00:05 ..... finished 1st pass mapping
Aug 22 21:00:06 ..... inserting junctions into the genome indices
Aug 22 21:01:50 ..... started mapping
Aug 22 21:02:04 ..... started mapping
Aug 22 21:02:10 ..... started mapping
Aug 22 21:06:39 ..... finished mapping
Aug 22 21:06:42 ..... finished successfully
Aug 22 21:07:40 ..... finished mapping
Aug 22 21:07:42 ..... finished successfully
Aug 22 21:07:43 ..... finished mapping
Aug 22 21:07:45 ..... finished successfully
Tue Aug 22 21:07:55 CST 2023
Counting...
[1] "2023-08-22 21:08:06 CST"
[1] "4.5e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/wangjx/single_cell/abko/test_1/test_1.final_annot.gtf"
[1] "Annotation loaded!"
[1] "Preparing Smart-seq3 data for stranded gene assignment..."
[1] "2023-08-22 21:09:21 CST"
[1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\						Input files : 1 BAM file				P test_1.filtered.tagged.Aligned.out.bam.int ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 30
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\						Load annotation file .Rsubread_UserProvidedAnnotation_pid13268 ...				Features : 295108				Meta-features : 56941				Chromosomes/contigs : 39
Process BAM file test_1.filtered.tagged.Aligned.out.bam.internal.bam...
Paired-end reads are included.
Assign alignments (paired-end) to features...

WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).

Pairing up the read pairs.

Total alignments : 13082847
Successfully assigned alignments : 10458718 (79.9%)
Running time : 0.35 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\						Input files : 1 BAM file				P test_1.filtered.tagged.Aligned.out.bam.UMI ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 30
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\						Load annotation file .Rsubread_UserProvidedAnnotation_pid13268 ...				Features : 295108				Meta-features : 56941				Chromosomes/contigs : 39
Process BAM file test_1.filtered.tagged.Aligned.out.bam.UMI.bam...
Paired-end reads are included.
Strand specific : stranded
Assign alignments (paired-end) to features...

WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).

Pairing up the read pairs.

Total alignments : 4079629
Successfully assigned alignments : 3068272 (75.2%)
Running time : 0.11 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\						Input files : 1 BAM file				P test_1.filtered.tagged.Aligned.out.bam.int ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 30
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\						Load annotation file .Rsubread_UserProvidedAnnotation_pid13268 ...				Features : 221006				Meta-features : 28989				Chromosomes/contigs : 32
Process BAM file test_1.filtered.tagged.Aligned.out.bam.internal.bam.e ...
Paired-end reads are included.
Assign alignments (paired-end) to features...

WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).

Pairing up the read pairs.

Total alignments : 13082847
Successfully assigned alignments : 1575238 (12.0%)
Running time : 0.35 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\						Input files : 1 BAM file				P test_1.filtered.tagged.Aligned.out.bam.UMI ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)

Dir for temp files : .
Threads : 30
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1

Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\						Load annotation file .Rsubread_UserProvidedAnnotation_pid13268 ...				Features : 221006				Meta-features : 28989				Chromosomes/contigs : 32
Process BAM file test_1.filtered.tagged.Aligned.out.bam.UMI.bam.ex.fea ...
Paired-end reads are included.
Strand specific : stranded
Assign alignments (paired-end) to features...

WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).

Pairing up the read pairs.

Total alignments : 4079629
Successfully assigned alignments : 234963 (5.8%)
Running time : 0.11 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "2023-08-22 21:24:31 CST"
[1] "Coordinate sorting final bam file..."
[1] "2023-08-22 21:25:16 CST"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 1 barcodes in this chunk..."
[1] "Demultiplexing output bam file by cell barcode..."
[1] "Using python implementation to demultiplex."
[1] "2023-08-22 21:28:44 CST"
[1] "Demultiplexing zUMIs bam file..."
[1] "Demultiplexing complete."
[1] "2023-08-22 21:29:34 CST"
[1] "2023-08-22 21:29:34 CST"
[1] "I am done!! Look what I produced.../home/wangjx/single_cell/abko/test_1/zUMIs_output/"
           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells  7417567 396.2   21098948 1126.9  26373684 1408.6
Vcells 82593600 630.2  199813746 1524.5 199813746 1524.5
Tue Aug 22 21:29:36 CST 2023
[1] "loomR found"

		0%
======================================================================	100%
Tue Aug 22 21:29:45 CST 2023

Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2023-08-22 21:29:45 CST"
[1] "Counting UMI fragments..."
[1] "4.5e+08 Reads per chunk"
[1] "Extracting reads from bam file(s)..."
[1] "Working on chunk 1"
           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells  4419719 236.1    8421158  449.8   8375413  447.3
Vcells 41823431 319.1  202673638 1546.3 253332305 1932.8
Tue Aug 22 21:32:01 CST 2023
Job finished at Tue Aug 22 21:32:02 CST 2023

and the error output:

slurmstepd: error: _is_a_lwp: open() /proc/12596/status failed: No such file or directory
Warning message:
as_quosure() requires an explicit environment as of rlang 0.3.0. Please supply env. This warning is displayed once per session.
[bam_sort_core] merging from 0 files and 30 in-memory blocks...
Loading required package: yaml
Loading required package: Matrix
Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns
This is to maintain compatibility with other loom tools
`

Although I now get the downstream files, the unmapped.bam is still large.

It appears that the mapping step still did not succeed. Are there any other adjustments I need to make?

[Image: 微信图片_20230822214218 (WeChat image)]
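A large unmapped.bam does not by itself say why reads failed to map. STAR writes a per-run summary to *.Log.final.out (abko_1.filtered.tagged.Log.final.out appears in the first directory listing), and its unmapped categories can help narrow this down. Below is a minimal sketch for pulling those numbers out, assuming the usual `key | value` layout of that file; `parse_star_log` and `unmapped_summary` are hypothetical helper names, not part of STAR or zUMIs.

```python
def parse_star_log(text):
    """Map each 'key | value' line of a STAR Log.final.out text to a dict entry."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


def unmapped_summary(stats):
    """Return the '% of reads unmapped: ...' entries as floats."""
    return {
        key: float(value.rstrip("%"))
        for key, value in stats.items()
        if key.startswith("% of reads unmapped")
    }


if __name__ == "__main__":
    import sys

    # Usage: python star_log.py path/to/project.filtered.tagged.Log.final.out
    with open(sys.argv[1]) as handle:
        parsed = parse_star_log(handle.read())
    print(parsed.get("Uniquely mapped reads %"))
    print(unmapped_summary(parsed))
```

If most unmapped reads fall under a "too short" category, that would point in a different direction (read trimming, adapter content, or the wrong reference) than a high share in another category, but the log from this run would be needed to say which applies here.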

sdparekh / zUMIs

Problems in mapping Smart-Seq3 data #368
