the result of example data is different

zjwang6 commented 3 years ago

Hello, I ran the provided example data, but my result is not the same as yours, and no error was reported. Below are the contents of Example.filtered.tagged.Log.final.out and runExample1.yaml:

` Started job on | Aug 31 17:35:07
Started mapping on | Aug 31 17:35:11
Finished on | Aug 31 17:37:11
Mapping speed, Million of reads per hour | 25.60

                      Number of input reads |       853296
                  Average input read length |       50
                                UNIQUE READS:
               Uniquely mapped reads number |       129079
                    Uniquely mapped reads % |       15.13%
                      Average mapped length |       48.44
                   Number of splices: Total |       6212
        Number of splices: Annotated (sjdb) |       688
                   Number of splices: GT/AG |       5670
                   Number of splices: GC/AG |       97
                   Number of splices: AT/AC |       2
           Number of splices: Non-canonical |       443
                  Mismatch rate per base, % |       5.05%
                     Deletion rate per base |       0.01%
                    Deletion average length |       1.41
                    Insertion rate per base |       0.10%
                   Insertion average length |       2.06
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       79679
         % of reads mapped to multiple loci |       9.34%
    Number of reads mapped to too many loci |       130
         % of reads mapped to too many loci |       0.02%
                              UNMAPPED READS:


Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 643952
% of reads unmapped: too short | 75.47%
Number of reads unmapped: other | 456
% of reads unmapped: other | 0.05%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%

project: Example
sequence_files:
  file1:
    name: /public/home/***/biosoft/scrna-seq/zUMIs/zUMIs/sample_20210831/barcoderead_HEK.1mio.fq.gz
    base_definition:
      - BC(1-6)
      - UMI(7-16)
  file2:
    name: /public/home/***/biosoft/scrna-seq/zUMIs/zUMIs/sample_20210831/cDNAread_HEK.1mio.fq.gz
    base_definition:
      - cDNA(1-50)
reference:
  STAR_index: /public/home//biosoft/scrna-seq/zUMIs/zUMIs/sample_20210831/chr22/hg38_chr22_STAR7
  GTF_file: /public/home//biosoft/scrna-seq/zUMIs/zUMIs/sample_20210831/chr22/GRCh38.95.chr22.gtf
  additional_STAR_params: ''
  additional_files: ~
out_dir: /public/home//biosoft/zUMIs/sample_20210831/example_output
num_threads: 20
mem_limit: 20
filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: ~
  automatic: yes
  BarcodeBinning: 0
  nReadsperCell: 100
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 0
  Ham_Dist: 0
  velocyto: no
  primaryHit: yes
  twoPass: no
make_stats: yes
which_Stage: Filtering
samtools_exec: samtools
pigz_exec: pigz
STAR_exec: STAR
Rscript_exec: Rscript
zUMIs_directory: /public/home//biosoft/zUMIs`

Can you give me some advice about why the results differ? Thank you.

cziegenhain commented 3 years ago

This should be OK. I last updated the example run 2 years ago, so it probably used an older version of STAR, and there have been many other changes since then.
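To see which metrics actually differ between two runs, the two Log.final.out files can be compared key by key. The following is a minimal sketch, assuming the `key | value` layout shown in the log above; the function names `parse_star_log` and `diff_metrics` are hypothetical helpers and are not part of zUMIs or STAR.

```python
def parse_star_log(text):
    """Parse the 'key | value' lines of a STAR Log.final.out into a dict of strings."""
    metrics = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        key, value = line.split("|", 1)
        metrics[key.strip()] = value.strip()
    return metrics


def diff_metrics(mine, reference):
    """Return {key: (mine, reference)} for every metric that differs or is missing."""
    keys = sorted(set(mine) | set(reference))
    return {
        key: (mine.get(key), reference.get(key))
        for key in keys
        if mine.get(key) != reference.get(key)
    }
```

Reading both files with `open(path).read()`, passing each through `parse_star_log`, and printing the result of `diff_metrics` lists only the rows that changed. Timestamps and mapping speed will always differ, so those rows can be ignored.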

zjwang6 commented 3 years ago

Thank you for your reply. I took a careful look and found that the gff file only has information for one chromosome.

zjwang6 commented 3 years ago

Hello, my data was generated with the Smart-seq3 method, with one dataset per cell, so no barcode is needed. In this case, how should I set the parameters? I hope to get your guidance. Thank you very much.

cziegenhain commented 3 years ago

It is best to start with data that has not been demultiplexed. If you only have the data as one set of fastq files per cell, follow this guide: https://github.com/sdparekh/zUMIs/wiki/Starting-from-demultiplexed-fastq-files
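The idea of giving each per-cell fastq its own arbitrary barcode can be pictured with the sketch below. It is an illustration only, not the script from the linked guide: the function names, the barcode length of 8, and the placeholder quality character are all assumptions.

```python
import itertools


def arbitrary_barcodes(n, length=8):
    """Return n distinct A/C/G/T barcodes of the given length (hypothetical helper)."""
    if n > 4 ** length:
        raise ValueError("barcode length is too short for the number of cells")
    products = itertools.product("ACGT", repeat=length)
    # Barcodes are enumerated in order, so neighbouring ones differ by a single base.
    return ["".join(p) for p in itertools.islice(products, n)]


def barcode_read_records(cdna_fastq_lines, barcode):
    """Build one synthetic barcode-read record per 4-line cDNA fastq record."""
    quality = "I" * len(barcode)  # placeholder high-quality string, an assumption
    records = []
    for i in range(0, len(cdna_fastq_lines) - 3, 4):
        header = cdna_fastq_lines[i].rstrip("\n")  # reuse the read name so the pairs match
        records.extend([header, barcode, "+", quality])
    return records
```

Each cell's file gets one barcode from `arbitrary_barcodes`, and the records from `barcode_read_records` form the matching index read for that cell's cDNA reads.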

zjwang6 commented 2 years ago

Hello, dear author. I tried to use zUMIs to process my own Smart-seq3 data. I recombined the fastq files and generated an arbitrary index sequence, but I still got an error. Attached are the error report and the YAML file: zumi.err.txt runExample1.yaml.txt

I tried to find the cause of the error and found that the problem occurs in the script barcodeIDFUN.R, but I do not know why. Can you give me some advice? Thank you.

cziegenhain commented 2 years ago

Since the arbitrary barcodes will all perfectly match, simply set BarcodeBinning: 0.
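If the YAML is edited by script rather than by hand, the change could look like the sketch below. It treats the file as plain text and assumes a single `BarcodeBinning:` line under `barcodes:`, as in the YAML posted earlier in this thread; `set_barcode_binning` is a hypothetical helper, not a zUMIs function.

```python
import re


def set_barcode_binning(yaml_text, value=0):
    """Return yaml_text with its single BarcodeBinning entry set to `value`."""
    pattern = re.compile(r"^([ \t]*BarcodeBinning:)[ \t]*.*$", re.MULTILINE)
    # Keep the original indentation and key, replace only the value.
    new_text, count = pattern.subn(rf"\g<1> {value}", yaml_text)
    if count != 1:
        raise ValueError(f"expected exactly one BarcodeBinning entry, found {count}")
    return new_text
```

Editing the line in a text editor is equally fine; the only point is that the entry ends up as `BarcodeBinning: 0`.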

Best, Christoph

zjwang6 commented 2 years ago

Thank you very much for your prompt reply.

Best wishes

sdparekh / zUMIs

the result of example data is different #282