sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

the result of example data is different #282

Closed zjwang6 closed 2 years ago

zjwang6 commented 3 years ago

Hello, I ran the provided example data, but my result is not the same as yours, and no error was reported. Below are the contents of Example.filtered.tagged.Log.final.out and runExample1.yaml.

Example.filtered.tagged.Log.final.out:

                               Started job on |       Aug 31 17:35:07
                           Started mapping on |       Aug 31 17:35:11
                                  Finished on |       Aug 31 17:37:11
     Mapping speed, Million of reads per hour |       25.60

                        Number of input reads |       853296
                    Average input read length |       50
                                  UNIQUE READS:
                 Uniquely mapped reads number |       129079
                      Uniquely mapped reads % |       15.13%
                        Average mapped length |       48.44
                     Number of splices: Total |       6212
          Number of splices: Annotated (sjdb) |       688
                     Number of splices: GT/AG |       5670
                     Number of splices: GC/AG |       97
                     Number of splices: AT/AC |       2
             Number of splices: Non-canonical |       443
                    Mismatch rate per base, % |       5.05%
                       Deletion rate per base |       0.01%
                      Deletion average length |       1.41
                      Insertion rate per base |       0.10%
                     Insertion average length |       2.06
                           MULTI-MAPPING READS:
      Number of reads mapped to multiple loci |       79679
           % of reads mapped to multiple loci |       9.34%
      Number of reads mapped to too many loci |       130
           % of reads mapped to too many loci |       0.02%
                                UNMAPPED READS:
Number of reads unmapped: too many mismatches |       0
     % of reads unmapped: too many mismatches |       0.00%
          Number of reads unmapped: too short |       643952
               % of reads unmapped: too short |       75.47%
              Number of reads unmapped: other |       456
                   % of reads unmapped: other |       0.05%
                                CHIMERIC READS:
                     Number of chimeric reads |       0
                          % of chimeric reads |       0.00%

runExample1.yaml:

project: Example
sequence_files:
  file1:
    name: /public/home/***/biosoft/scrna-seq/zUMIs/zUMIs/sample_20210831/barcoderead_HEK.1mio.fq.gz
    base_definition:

Can you give me some advice about why the results differ? Thank you.

cziegenhain commented 3 years ago

This should be OK. I last updated the example run 2 years ago, so it probably used an older version of STAR, and there have been many other changes since then.
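To see exactly which metrics differ between a local run and a reference run, the two Log.final.out files can be compared field by field. The following is a minimal Python sketch, assuming the `label | value` layout seen in the log posted above. The function names are made up for illustration, and the reference value in the example is a placeholder, not the actual number from the zUMIs example output.

```python
def parse_star_log(text):
    """Parse 'label | value' lines of a STAR Log.final.out into a dict."""
    metrics = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines and section headers such as 'UNIQUE READS:'
        label, value = line.split("|", 1)
        metrics[label.strip()] = value.strip()
    return metrics


def diff_metrics(mine, reference):
    """Return {label: (mine, reference)} for every metric whose value differs."""
    labels = sorted(set(mine) | set(reference))
    return {
        label: (mine.get(label), reference.get(label))
        for label in labels
        if mine.get(label) != reference.get(label)
    }


# Two lines taken from the log posted in this thread.
my_log = """
                      Number of input reads |       853296
                    Uniquely mapped reads % |       15.13%
"""

# PLACEHOLDER reference: the 20.00% is invented purely to show a difference.
reference_log = """
                      Number of input reads |       853296
                    Uniquely mapped reads % |       20.00%
"""

differences = diff_metrics(parse_star_log(my_log), parse_star_log(reference_log))
for label, (mine, ref) in differences.items():
    print(f"{label}: mine={mine} reference={ref}")
```

In practice the two strings would be read from the local Log.final.out and from the reference output shipped with the example, if one is available.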

zjwang6 commented 3 years ago

Thank you for your reply. I took a careful look and found that the gff file only has information for one chromosome.

zjwang6 commented 3 years ago

Hello, my data was generated with the Smart-seq3 method, with one dataset per cell, so no barcode is needed. How should I set the parameters in this case? I hope to get your guidance. Thank you very much.

cziegenhain commented 3 years ago

It is best to start with data that has not been demultiplexed. If you only have the data as one set of fastq files per cell, follow this guide: https://github.com/sdparekh/zUMIs/wiki/Starting-from-demultiplexed-fastq-files
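The general idea of giving each per-cell file its own arbitrary barcode can be sketched as follows. This is only an illustration in Python and not the procedure from the linked guide, which should be followed for real data. The barcode length, the sample names, the read names and both helper functions are assumptions made for the example.

```python
import itertools


def arbitrary_barcodes(n_cells, length=8):
    """Return n_cells distinct barcodes of the given length over A, C, G, T."""
    if n_cells > 4 ** length:
        raise ValueError("barcode length is too short for this many cells")
    combos = itertools.product("ACGT", repeat=length)
    return ["".join(c) for c in itertools.islice(combos, n_cells)]


def barcode_records(read_names, barcode):
    """Yield one 4-line FASTQ record carrying the barcode for each read name."""
    quality = "I" * len(barcode)  # placeholder quality string, an assumption
    for name in read_names:
        yield f"@{name}\n{barcode}\n+\n{quality}\n"


# Hypothetical per-cell samples; real read names would come from each cell's fastq.
cells = ["cellA", "cellB", "cellC"]
barcodes = dict(zip(cells, arbitrary_barcodes(len(cells))))

index_fastq = "".join(
    record
    for cell in cells
    for record in barcode_records([f"{cell}_read1", f"{cell}_read2"], barcodes[cell])
)
print(index_fastq)
```

Each read keeps its original name, so the generated index read stays paired with the cDNA read once the per-cell files are concatenated in the same order.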

zjwang6 commented 2 years ago

Dear author, I tried to use zUMIs to process my own Smart-seq3 data. I recombined the fastq files and generated an arbitrary index sequence, but I still ran into an error. The error log and YAML file are attached: zumi.err.txt runExample1.yaml.txt

I tried to find the cause and found that the problem occurs in the script barcodeIDFUN.R, but I do not know why. Can you give me some advice? Thank you. [image]

cziegenhain commented 2 years ago

Since the arbitrary barcodes will all perfectly match, simply set BarcodeBinning: 0.
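The reasoning can be illustrated with a small Python sketch. It assumes that barcode binning means reassigning observed barcodes that are not in the expected list to an expected barcode within a given Hamming distance. The code below is not zUMIs' implementation, and all names and sequences in it are invented for the example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def binning_candidates(observed, expected, max_dist):
    """Map each observed barcode missing from `expected` to expected barcodes within max_dist."""
    known = set(expected)
    candidates = {}
    for barcode in sorted(set(observed) - known):
        hits = [e for e in expected if hamming(barcode, e) <= max_dist]
        if hits:
            candidates[barcode] = hits
    return candidates


expected = ["AAAAAAAA", "CCCCCCCC", "GGGGGGGG"]

# Arbitrary barcodes written by a script: every read carries an exact expected barcode.
observed_arbitrary = ["AAAAAAAA", "CCCCCCCC", "CCCCCCCC", "GGGGGGGG"]

# Sequenced barcodes: one read has a single-base error.
observed_sequenced = ["AAAAAAAA", "AAAAAAAT", "CCCCCCCC"]

print(binning_candidates(observed_arbitrary, expected, 1))  # {}
print(binning_candidates(observed_sequenced, expected, 1))  # {'AAAAAAAT': ['AAAAAAAA']}
```

With generated barcodes there is nothing left to bin, which is why a binning distance of 0 is sufficient in this situation.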

Best, Christoph

zjwang6 commented 2 years ago

Thank you very much for your prompt reply.

Best wishes