ratschlab / spladder

Tool for the detection and quantification of alternative splicing events from RNA-Seq data.

errors occurred in gff3 files #34

Closed (jiahsinhuang closed this issue 8 years ago)

jiahsinhuang commented 8 years ago

Hello, I tried SplAdder and got the following error messages:

gff3 file from ensembl (Drosophila_melanogaster.BDGP6.84.chr.gff3)

$ python spladder.py -a ../../Drosophila_melanogaster/ensembl/Drosophila_melanogaster.BDGP6.84.chr.gff3 -b ../../HISAT2_Stringtie_Ballgown/SRR1055723.hisat2.sorted.bam -o ../../spladder -T n
Traceback (most recent call last):
  File "spladder.py", line 322, in <module>
    spladder()
  File "spladder.py", line 144, in spladder
    (genes, CFG) = init.init_genes_gff3(CFG['anno_fname'], CFG, CFG['anno_fname'] + '.pickle')
  File "/Users/jhhuang/IIS_jhh/TEAS_Dmel/spladder/python/modules/init.py", line 256, in init_genes_gff3
    gene_id = trans2gene[trans_id]
KeyError: 'transcript:FBtr0309810'

gff3 file from ensembl (Drosophila_melanogaster.BDGP6.84.gff3)

$ python spladder.py -a ../../Drosophila_melanogaster/ensembl/Drosophila_melanogaster.BDGP6.84.gff3 -b ../../HISAT2_Stringtie_Ballgown/SRR1055723.hisat2.sorted.bam -o ../../spladder -T n
Traceback (most recent call last):
  File "spladder.py", line 322, in <module>
    spladder()
  File "spladder.py", line 144, in spladder
    (genes, CFG) = init.init_genes_gff3(CFG['anno_fname'], CFG, CFG['anno_fname'] + '.pickle')
  File "/Users/jhhuang/IIS_jhh/TEAS_Dmel/spladder/python/modules/init.py", line 258, in init_genes_gff3
    t_idx = genes[gene_id].transcripts.index(trans_id)
KeyError: 'gene:FBgn0085737'

Please advise how to resolve these errors. Thank you.

akahles commented 8 years ago

Hi, Thanks for trying out SplAdder. I made annotation parsing a bit more robust and pushed a fix to the development branch. Let me know whether this works for you. Cheers, Andre

jiahsinhuang commented 8 years ago

Hi, Andre. Thanks for your response. The annotation parsing problem is solved, but I now get another error message, shown below.

$ python /Users/jhhuang/IIS_jhh/GitHub/spladder/python/spladder.py -a Drosophila_melanogaster/ensembl/Drosophila_melanogaster.BDGP6.84.chr.gff3 -b HISAT2_Stringtie_Ballgown/SRR1055723.hisat2.sorted.bam -o spladder_AS/ -T n

Augmenting splice graphs.

Generating splice graph ... ...done.

Loading introns from file ...
Traceback (most recent call last):
  File "/Users/jhhuang/IIS_jhh/GitHub/spladder/python/spladder.py", line 323, in <module>
    spladder()
  File "/Users/jhhuang/IIS_jhh/GitHub/spladder/python/spladder.py", line 224, in spladder
    spladder_core(CFG)
  File "/Users/jhhuang/IIS_jhh/GitHub/spladder/python/modules/core/spladdercore.py", line 21, in spladder_core
    genes = gen_graphs(genes, CFG)
  File "/Users/jhhuang/IIS_jhh/GitHub/spladder/python/modules/core/gen_graphs.py", line 83, in gen_graphs
    introns = get_intron_list(genes, CFG)
  File "/Users/jhhuang/IIS_jhh/GitHub/spladder/python/modules/reads.py", line 431, in get_intron_list
    [intron_list_tmp] = add_reads_from_bam(gg, CFG['bam_fnames'], ['intron_list'], CFG['read_filter'], CFG['var_aware'], CFG['primary_only'], CFG['ignore_mismatch_tag'])
  File "/Users/jhhuang/IIS_jhh/GitHub/spladder/python/modules/reads.py", line 155, in add_reads_from_bam
    (introns, spliced_coverage) = get_all_data(blocks[b], filenames, mapped=False, filter=filter, var_aware=var_aware, primary_only=primary_only, no_mm=no_mm)
  File "/Users/jhhuang/IIS_jhh/GitHub/spladder/python/modules/reads.py", line 314, in get_all_data
    (coverage_tmp, introns_tmp) = get_reads(fname, contig_name, block.start, block.stop, strand, filter, mapped, spliced, var_aware, collapse, primary_only, no_mm)
  File "/Users/jhhuang/IIS_jhh/GitHub/spladder/python/modules/reads.py", line 40, in get_reads
    for read in infile.fetch(chr_name, start, stop, until_eof=True):
  File "pysam/calignmentfile.pyx", line 878, in pysam.calignmentfile.AlignmentFile.fetch (pysam/calignmentfile.c:11079)
  File "pysam/calignmentfile.pyx", line 1660, in pysam.calignmentfile.IteratorRowRegion.__init__ (pysam/calignmentfile.c:18725)
ValueError: no index available for iteration

akahles commented 8 years ago

Are your BAM files indexed?

jiahsinhuang commented 8 years ago

Cool! I generated the index file with samtools and SplAdder works! Thank you very much.

akahles commented 8 years ago

The convention is to call samtools index with the bam filename as the only argument:

samtools index SRR1055723.hisat2.sorted.bam 

This will create a file SRR1055723.hisat2.sorted.bam.bai. SplAdder will automatically use this index when called with SRR1055723.hisat2.sorted.bam.
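For anyone who wants to check for the index before starting a run, here is a minimal sketch using only the Python standard library. The function names `find_bam_index` and `ensure_bam_index` are made up for this example and are not part of SplAdder; the sketch assumes the index sits next to the BAM file as either `<name>.bam.bai` or `<name>.bai`, and that `samtools` is on the PATH if an index has to be created.

```python
import os
import shutil
import subprocess


def find_bam_index(bam_path):
    """Return the path of an existing .bai index for bam_path, or None."""
    # `samtools index file.bam` writes file.bam.bai; some tools write file.bai.
    candidates = [bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai"]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def ensure_bam_index(bam_path):
    """Return an index path, running `samtools index` first if none exists."""
    index_path = find_bam_index(bam_path)
    if index_path is not None:
        return index_path
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH; cannot index " + bam_path)
    # Same call as above: the BAM filename is the only argument.
    subprocess.run(["samtools", "index", bam_path], check=True)
    return find_bam_index(bam_path)
```

Calling `ensure_bam_index("SRR1055723.hisat2.sorted.bam")` before SplAdder should avoid the "no index available for iteration" error, provided the BAM file is coordinate-sorted.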

CarinaCornejo commented 5 years ago

> Hi, Thanks for trying out SplAdder. I made annotation parsing a bit more robust and pushed a fix to the development branch. Let me know whether this works for you. Cheers, Andre

Hello,

I am trying to use SplAdder and I have the same error. I tried replacing the files gene.py, spladder.py, init.py and settings.py as suggested above but it didn't solve it.

Traceback (most recent call last):
  File "./bin/spladder/python/spladder.py", line 323, in <module>
    spladder()
  File "./bin/spladder/python/spladder.py", line 145, in spladder
    (genes, CFG) = init.init_genes_gff3(CFG['anno_fname'], CFG, CFG['anno_fname'] + '.pickle')
  File "/data1/paola/bin/spladder/python/modules/init.py", line 262, in init_genes_gff3
    t_idx = genes[gene_id].transcripts.index(trans_id)
KeyError: 'rna13'

I am using the annotation file GCF_000001735.4_TAIR10.1_genomic.gff from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1

Any help will be appreciated.

akahles commented 5 years ago

Dear CarinaCornejo,

It looks like SplAdder has difficulties with the category primary_transcript in your file. I will have a look, but will only be able to fix it early next week. A possible workaround for you might be to just remove all the lines containing primary_transcript, e.g. like so:

grep -v primary_transcript GCF_000001735.4_TAIR10.1_genomic.gff > GCF_000001735.4_TAIR10.1_genomic.filtered.gff
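Note that removing lines by feature type can leave child features whose Parent attribute refers to an ID that is no longer in the file. To inspect the parent/child hierarchy of a GFF3 file, a rough diagnostic sketch such as the following may help. It is not part of SplAdder and makes no claim about how SplAdder parses annotations; the helper names `parse_attributes` and `parent_type_pairs` are invented for this example, and it assumes standard tab-separated GFF3 with `ID=` and `Parent=` attributes in column 9.

```python
from collections import Counter


def parse_attributes(field):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def parent_type_pairs(lines):
    """Count (child type, parent type) pairs over an iterable of GFF3 lines.

    The parent type is None when a Parent ID is not defined in the input.
    """
    records = []
    id_to_type = {}
    for line in lines:
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            id_to_type[attrs["ID"]] = cols[2]
        records.append((cols[2], attrs.get("Parent")))

    pairs = Counter()
    for feature_type, parents in records:
        if parents is None:
            continue
        # A feature may list several parents separated by commas.
        for parent in parents.split(","):
            pairs[(feature_type, id_to_type.get(parent))] += 1
    return pairs
```

Passing an open file handle for the annotation to `parent_type_pairs` gives a count for each (child type, parent type) combination. Pairs whose parent type is `None` point to IDs that are missing from the file, and pairs whose parent type is not a gene show where transcripts hang off intermediate features.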

Cheers, Andre

CarinaCornejo commented 5 years ago

Dear Andre,

Thanks for the suggestion, but I am getting the same error with the filtered file.