Closed loverlyday closed 3 years ago
@loverlyday Hi ,can you tell me the process of zUMIs? I run it many times,but the software always give me a error information as follow: Error: Execution halted YAML file has an error. Look at the zUMIs_YAMLerror.log or contact developers.
can u give me your YAML file to refer? and I want to know whether you use the sequencing file(Smartseq3.Fibroblasts.NovaSeq.R1.fastq.gz,Smartseq3.Fibroblasts.NovaSeq.R2.fastq.gz,Smartseq3.Fibroblasts.NovaSeq.I1.fastq.gz,Smartseq3.Fibroblasts.NovaSeq.I2.fastq.gz),gtf file, STAR index?
Hope your early reply.
the yaml file is like this: project: Smartseq3_Fibroblasts sequence_files: file1: name: /home/xxx/temp/smart-seq3_mouse/Smartseq3.Fibroblasts.NovaSeq.R1.fastq.gz base_definition:
the squence file were download from the paper link E-MTAB-8735 .the gtf file is gencode.vM18.annotation.sorted.gtf downlaod from gencode the STAR index using STAR_2.7.3a. By the way are you the author of the ss3_isofrom.py?
@PingChen-Angela is the author of the ss3_isoform scripts and will look at your issue shortly.
@Jianye-Chang I have answered your question on the zUMIs github. The YAML posted above lacks the correct indentation, Please check carefully the documentation and examples: https://github.com/sdparekh/zUMIs/wiki/Usage#setup-using-the-yaml-config-file https://github.com/sandberg-lab/Smart-seq3/blob/master/allele_level_expression/mouse_cross.yaml https://www.protocols.io/view/smart-seq3-protocol-bcq4ivyw?step=18
@cziegenhain @loverlyday Thanks!!!
@loverlyday sorry,by the way, Do you use this method "git" to download zUMIs? I download it's tar.gz file.Emmmm,too much error information
@Jianye-Chang yes ,i use git download zUMIs .you should check the zUMIs dependence carefully. then you will install successful.
@Jianye-Chang yes ,i use git download zUMIs .you should check the zUMIs dependence carefully. then you will install successful.
Oh my god,my server download it so slowly,ok ,if i download it by git,no configuration dependencies are required?Just run it?Thx
@Jianye-Chang No,if you download by git,you should prepare the dependence before you install it. if not ,you can't run zUMIs successfully.
Hi,
It is possible to run zUMIs with or without manual installation of the dependencies. The usage and potential installation is described in detail: https://github.com/sdparekh/zUMIs#quickstart https://github.com/sdparekh/zUMIs/wiki/Installation#dependencies
@loverlyday Hi,excuse me.Another problem arises...I run smart-seq3 data,but when the software run,it give me a information as follow:
[1] "2021-01-25 16:53:17 CST"
[1] "Coordinate sorting intermediate bam file..."
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
[1] "2021-01-25 17:00:58 CST"
[1] "Hamming distance collapse in barcode chunk 1 out of 1"
[1] "Splitting data for multicore hamming distance collapse..."
[1] "Setting up multicore cluster & generating molecule mapping tables ..."
[1] "Finished multi-threaded hamming distances"
[1] "Correcting UMI barcode tags..."
Traceback (most recent call last):
File "/md01/changjy/software/zUMIs/correct_UBtag.py", line 123, in
This is very odd.Because “Smart-seq3kept_barcodes_binned.txt” Shouldn't this software be generating this file by himself? But the infomation tell me that it doesn't have this document.....What????
Hope your reply.Thx
Please do not post the same comments in the zUMIs GitHub and here. I will reply to you when time permits.
Hi, I want to repeat the Fiborblasts.novaSeq data ,i use the zUMIs to get the bam file .then i want to use the ss3_isofrom.py to get the isoform. but i get a problem.
Traceback (most recent call last): File "/home/XXX/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 121, in worker result = (True, func(*args, *kwds)) File "/home/XXX/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar return list(map(args)) File "/home/XXX/temp/ERR3835349/pyModule/informative_reads.py", line 479, in _get_reads report_gene = gobj.get_aligned_reads(n_read_limit, passed_cells) File "/home/XXX/temp/ERR3835349/pyModule/informative_reads.py", line 90, in get_aligned_reads nreads = len([r_idx for r_idx, x in enumerate(r_iterator) if x.flag in self.strand_flags[self.strand]]) File "/home/XXX/temp/ERR3835349/pyModule/informative_reads.py", line 90, in nreads = len([r_idx for r_idx, x in enumerate(r_iterator) if x.flag in self.strand_flags[self.strand]]) KeyError: 1
my gtf file like this:
description: evidence-based annotation of the mouse genome (GRCm38), version M18 (Ensembl 93)
provider: GENCODE
contact: gencode-help@ebi.ac.uk
format: gtf
date: 2018-07-02
chr1 HAVANA gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_name "RP23-271O17.1"; level 2; havana_gene "OTTMUSG00000049935.1"; chr1 HAVANA transcript 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; ranscript_name "RP23-271O17.1-001"; level 2; transcript_support_level "NA"; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";
the temp and temp_merged.bed in .tempDir like this : _temp_xxx: chr1 133321874 133321976 . . +
_temp_xxx_merged: chr1 133619870 133621801 chr1 133622116 133622184 chr1 133622983 133623150
and the exon file like this: chr1 133619870 133621801 1 chr1 133622116 133622184 2 chr1 133622983 133623150 3 chr1 133624482 133624810 4 chr1 133625847 133625930 5 chr1 133626766 133627270 6
in you code you select exon.iloc[0,3] as strand,but when you define you strand_flags dict like this: self.strand_flags = {'+': [99, 147], '-': [83, 163]}
so ,what should i do to fix the problem ,or there is something wrong in my gtf file ?
Hi @loverlyday,
I have got the same error flags as what you posted in the first thread. Any chance you can kindly share your experience of fixing so? Thanks!
Hi, I want to repeat the Fiborblasts.novaSeq data ,i use the zUMIs to get the bam file .then i want to use the ss3_isofrom.py to get the isoform. but i get a problem.
Traceback (most recent call last): File "/home/XXX/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 121, in worker result = (True, func(*args, *kwds)) File "/home/XXX/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar return list(map(args)) File "/home/XXX/temp/ERR3835349/pyModule/informative_reads.py", line 479, in _get_reads report_gene = gobj.get_aligned_reads(n_read_limit, passed_cells) File "/home/XXX/temp/ERR3835349/pyModule/informative_reads.py", line 90, in get_aligned_reads nreads = len([r_idx for r_idx, x in enumerate(r_iterator) if x.flag in self.strand_flags[self.strand]]) File "/home/XXX/temp/ERR3835349/pyModule/informative_reads.py", line 90, in nreads = len([r_idx for r_idx, x in enumerate(r_iterator) if x.flag in self.strand_flags[self.strand]]) KeyError: 1
my gtf file like this:
description: evidence-based annotation of the mouse genome (GRCm38), version M18 (Ensembl 93)
provider: GENCODE
contact: gencode-help@ebi.ac.uk
format: gtf
date: 2018-07-02
chr1 HAVANA gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_name "RP23-271O17.1"; level 2; havana_gene "OTTMUSG00000049935.1"; chr1 HAVANA transcript 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; ranscript_name "RP23-271O17.1-001"; level 2; transcript_support_level "NA"; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";
the temp and temp_merged.bed in .tempDir like this : _temp_xxx: chr1 133321874 133321976 . . +
_temp_xxx_merged: chr1 133619870 133621801 chr1 133622116 133622184 chr1 133622983 133623150
and the exon file like this: chr1 133619870 133621801 1 chr1 133622116 133622184 2 chr1 133622983 133623150 3 chr1 133624482 133624810 4 chr1 133625847 133625930 5 chr1 133626766 133627270 6
in you code you select exon.iloc[0,3] as strand,but when you define you strand_flags dict like this: self.strand_flags = {'+': [99, 147], '-': [83, 163]}
so ,what should i do to fix the problem ,or there is something wrong in my gtf file ?
Hi @loverlyday and @kwglam, I found the problem might come from bedtools versions. I used bedtools v2.26.0 and it gives me strand information if I run the command as below.
$ sort -k1,1 -k2,2n _temp_ENSMUSG00000092329.bed | bedtools merge -s -d -1 -i - chr8 119910840 119910939 + chr8 124295357 124295451 + chr8 124305514 124305668 + chr8 124323671 124323770 + chr8 124323991 124324059 + chr8 124324270 124324336 + chr8 124328091 124328213 + chr8 124329712 124329800 + chr8 124332008 124332096 + chr8 124332567 124332671 + chr8 124334274 124334401 + chr8 124336573 124336666 + chr8 124337261 124337345 + chr8 124338442 124338569 + chr8 124340818 124340938 + chr8 124343302 124345722 +
I tested another bedtools version v2.23.0. It does not output strand info… See below: $ sort -k1,1 -k2,2n _temp_ENSMUSG00000092329.bed | bedtools merge -s -d -1 -i - chr8 119910840 119910939 chr8 124295357 124295451 chr8 124305514 124305668 chr8 124323671 124323770 chr8 124323991 124324059 chr8 124324270 124324336 chr8 124328091 124328213 chr8 124329712 124329800 chr8 124332008 124332096 chr8 124332567 124332671 chr8 124334274 124334401 chr8 124336573 124336666 chr8 124337261 124337345 chr8 124338442 124338569 chr8 124340818 124340938 chr8 124343302 124345722
Hope it solves your issue. :=)
Hi, I want to repeat the Fiborblasts.novaSeq data ,i use the zUMIs to get the bam file .then i want to use the ss3_isofrom.py to get the isoform. but i get a problem.
Traceback (most recent call last): File "/home/XXX/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 121, in worker result = (True, func(*args, *kwds)) File "/home/XXX/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar return list(map(args)) File "/home/XXX/temp/ERR3835349/pyModule/informative_reads.py", line 479, in _get_reads report_gene = gobj.get_aligned_reads(n_read_limit, passed_cells) File "/home/XXX/temp/ERR3835349/pyModule/informative_reads.py", line 90, in get_aligned_reads nreads = len([r_idx for r_idx, x in enumerate(r_iterator) if x.flag in self.strand_flags[self.strand]]) File "/home/XXX/temp/ERR3835349/pyModule/informative_reads.py", line 90, in
nreads = len([r_idx for r_idx, x in enumerate(r_iterator) if x.flag in self.strand_flags[self.strand]])
KeyError: 1
my gtf file like this:
description: evidence-based annotation of the mouse genome (GRCm38), version M18 (Ensembl 93)
provider: GENCODE
contact: gencode-help@ebi.ac.uk
format: gtf
date: 2018-07-02
chr1 HAVANA gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_name "RP23-271O17.1"; level 2; havana_gene "OTTMUSG00000049935.1"; chr1 HAVANA transcript 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; ranscript_name "RP23-271O17.1-001"; level 2; transcript_support_level "NA"; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";
the temp and temp_merged.bed in .tempDir like this : _temp_xxx: chr1 133321874 133321976 . . +
_temp_xxx_merged: chr1 133619870 133621801 chr1 133622116 133622184 chr1 133622983 133623150
and the exon file like this: chr1 133619870 133621801 1 chr1 133622116 133622184 2 chr1 133622983 133623150 3 chr1 133624482 133624810 4 chr1 133625847 133625930 5 chr1 133626766 133627270 6
in you code you select exon.iloc[0,3] as strand,but when you define you strand_flags dict like this: self.strand_flags = {'+': [99, 147], '-': [83, 163]}
so ,what should i do to fix the problem ,or there is something wrong in my gtf file ?