tderrien / FEELnc

FEELnc : FlExible Extraction of LncRNA
GNU General Public License v3.0

Parser::parseGTF => Data Structure returns an empty hash #56

Closed joelnitta closed 2 years ago

joelnitta commented 2 years ago

Hello,

I am trying to run FEELnc_filter.pl and I encounter the Parser::parseGTF => Data Structure returns an empty hash error as follows:

bash-4.2# FEELnc_filter.pl -i d_magna.filtered.gtf -a daphnia_genome.gtf -b transcript_biotype=protein_coding > candidate_lncRNA.gtf
Possible precedence issue with control flow operator at /usr/local/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Filtered transcripts will be available in file: 'd_magna.filtered.feelncfilter.log'
Parsing file 'd_magna.filtered.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
> Filter size (200): 0
> Filter monoexonic (0): 170
> Filter biexonicsize (25): 0
>> Transcripts left after fitler(s): 36582
Parsing file 'daphnia_genome.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
Parser::parseGTF => Data Structure returns an empty hash
Possible reasons:
        *Feature level 'exon' is not present in 3rd field of 'daphnia_genome.gtf'
        *chromosome/seqname (chr) or patch chromosome...
        *Filtering tag/Attributes (--filter|-f) option returns no results
Try --help for help

daphnia_genome.gtf does contain exon annotations in the 3rd field.

I am not sure what the other two possible reasons refer to, or how to check those.
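
The first listed reason can be checked directly by tabulating the third column of the annotation file. The following is a minimal sketch, assuming a tab-separated GTF whose comment lines start with `#`; the function name `count_features` is hypothetical and not part of FEELnc.

```shell
# Tabulate the feature types (3rd column) of a GTF file, skipping comment lines.
# Usage: count_features FILE
# NOTE: count_features is a hypothetical helper, not a FEELnc command.
count_features() {
    awk -F '\t' '
        !/^#/ && NF >= 3 { count[$3]++ }                      # one tally per feature type
        END { for (f in count) print f "\t" count[f] }        # "feature<TAB>count"
    ' "$1" | LC_ALL=C sort
}
```

Running `count_features daphnia_genome.gtf` would print one line per feature type; if `exon` appears in the list, the first reason can be ruled out.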

The input files can be downloaded from these dropbox links:

FEELnc v0.2-0 run in docker image quay.io/biocontainers/feelnc:0.2--pl526_0

I would greatly appreciate it if you can help me troubleshoot this.

Thanks!

joelnitta commented 2 years ago

PS: the Possible precedence issue with control flow operator warning shows up even with the test data, so I don't think that has anything to do with the above error.

bash-4.2# FEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf -b transcript_biotype=protein_coding > test.gtf
Possible precedence issue with control flow operator at /usr/local/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Filtered transcripts will be available in file: 'transcript_chr38.feelncfilter.log'
Parsing file 'transcript_chr38.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
> Filter size (200): 36
> Filter monoexonic (0): 1265
> Filter biexonicsize (25): 15
>> Transcripts left after fitler(s): 2146
Parsing file 'annotation_chr38.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
38
Intersect fileA:              [----------------------------------------------------------------------------------------------------]

vwucher commented 2 years ago

Hi @joelnitta ,

First, thanks for all the info and the files (they help with debugging)! In fact, the "error" is in the file daphnia_genome.gtf: the biotypes are not as expected, and instead of protein_coding, the value is mRNA. So you just need to replace protein_coding with mRNA in the command line and it will work (at least it does for me).

Tell us if you have other issues! Bye, Valentin

joelnitta commented 2 years ago

Thanks @vwucher for the prompt reply! Can you please let me know the code you used to fix daphnia_genome.gtf? I have tried changing protein_coding to mRNA but I am still getting the same error.

vwucher commented 2 years ago

Hi,

I didn't fix the file. I just changed your command line by replacing protein_coding with mRNA. Did you try that?

Bye

joelnitta commented 2 years ago

Ah, now I see what you mean! Yes that fixes it, thanks!

(for anybody else who comes across this, it means using -b transcript_biotype=mRNA in the FEELnc_filter.pl command)
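
To find out which value to pass to -b for a given annotation, it can help to list the values that an attribute actually takes on exon lines. Below is a minimal sketch, assuming the usual GTF attribute layout (`key "value";` pairs in the tab-separated 9th column); the function name `count_attr_values` is hypothetical and not part of FEELnc.

```shell
# Count the values of one GTF attribute (9th column) on exon lines.
# Usage: count_attr_values FILE KEY
# NOTE: count_attr_values is a hypothetical helper, not a FEELnc command.
count_attr_values() {
    awk -F '\t' -v key="$2" '
        $3 == "exon" {
            n = split($9, fields, ";")
            for (i = 1; i <= n; i++) {
                f = fields[i]
                gsub(/^[ ]+|[ ]+$/, "", f)       # trim surrounding spaces
                if (index(f, key " ") == 1) {     # attribute starts with "key "
                    v = substr(f, length(key) + 2)
                    gsub(/"/, "", v)              # drop the quotes around the value
                    count[v]++
                }
            }
        }
        END { for (v in count) print v "\t" count[v] }   # "value<TAB>count"
    ' "$1" | LC_ALL=C sort
}
```

For example, `count_attr_values daphnia_genome.gtf transcript_biotype` would list each biotype with the number of exon lines carrying it. If protein_coding is absent from that list, a filter on it cannot match anything, which is consistent with the empty hash reported above.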