ORFmine is an open-source tool for identifying and analyzing all Open Reading Frames (ORFs) in genomic data, focusing on their sequences, structures, evolution and translation activities.
Some GFF files contain features that span most or all of a chromosome.
For example: GCF_000247795.1
In the related GFF file (enclosed), there is a feature of type "match" that fully overlaps the first chromosome:
NC_032650.1 RefSeq region 1 161108492 . + . ID=NC_032650.1:1..161108492;Dbxref=taxon:9915;Name=1;breed=Nelore;chromosome=1;country=Brazil;gb-synonym=Bos taurus indicus;gbkey=Src;genome=chromosome;isolate=QUIL7308;mol_type=genomic DNA;note=animal owned by Agropecuaria Quilombo Inc.;sex=male;tissue-type=peripheral blood mononuclear cells
Line 37235:
NC_032650.1 RefSeq match 1 161108492 . + . ID=aln0;Target=NC_032650.1 1 161108492 +;gap_count=0;num_mismatch=0;pct_coverage=100;pct_identity_gap=100
As a consequence, orfget cannot define any purely intergenic ORF:
NC_032650.1
ORF type                   Quantity   Average length (aa)
c_CDS                          7649   100.45
nc_ovp_opp-CDS                19987    58.68
nc_ovp_opp-cDNA_match           201    39.65
nc_ovp_opp-match            1983772    46.8
nc_ovp_same-CDS               11740    52.03
nc_ovp_same-cDNA_match          713    39.64
nc_ovp_same-lnc_RNA           15831    42.05
nc_ovp_same-mRNA             439133    44.33
nc_ovp_same-match           2449854    46.35
nc_ovp_same-pseudogene        10750    48.33
nc_ovp_same-tRNA                 16    68.0
nc_ovp_same-transcript          281    65.47
Would it be possible, as a preliminary step in orftrack, to exclude features that cover more than, say, 90% of their region, to avoid this behavior?
Meanwhile, since the only 6 genomes in which I have identified this error so far all contain a 'match' feature, I suggest simply adding 'match' to the list at line 597 of gff_parser.py:
if element_type not in ['chromosome', 'region', 'match']:
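To make the coverage-based exclusion more concrete, here is a minimal sketch in Python. It is not orftrack code: the function names (`parse_gff_line`, `sequence_lengths`, `filter_spanning_features`) and the `max_coverage` parameter are hypothetical. The sketch also assumes that the length of each sequence can be approximated by the largest end coordinate found for its seqid, whereas orftrack could presumably take the real length from the FASTA file.

```python
def parse_gff_line(line):
    """Return (seqid, type, start, end) for a tab-separated GFF3 feature line.

    Comment lines, blank lines and malformed lines yield None.
    """
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return None
    return cols[0], cols[2], int(cols[3]), int(cols[4])


def sequence_lengths(lines):
    """Approximate each sequence length by the largest end coordinate seen.

    This is an assumption of the sketch, not how orftrack gets the length.
    """
    lengths = {}
    for line in lines:
        parsed = parse_gff_line(line)
        if parsed is not None:
            seqid, _, _, end = parsed
            lengths[seqid] = max(lengths.get(seqid, 0), end)
    return lengths


def filter_spanning_features(lines, max_coverage=0.9):
    """Keep only features covering at most `max_coverage` of their sequence."""
    lines = list(lines)  # the input is read twice
    lengths = sequence_lengths(lines)
    kept = []
    for line in lines:
        parsed = parse_gff_line(line)
        if parsed is None:
            continue
        seqid, _, start, end = parsed
        coverage = (end - start + 1) / lengths[seqid]
        if coverage <= max_coverage:
            kept.append(parsed)
    return kept


if __name__ == "__main__":
    # Toy input: 'region' and 'match' both span the whole 1000 bp sequence.
    demo = [
        "##gff-version 3",
        "chr1\tRefSeq\tregion\t1\t1000\t.\t+\t.\tID=r",
        "chr1\tRefSeq\tmatch\t1\t1000\t.\t+\t.\tID=aln0",
        "chr1\tRefSeq\tgene\t100\t300\t.\t+\t.\tID=g1",
    ]
    for feature in filter_spanning_features(demo):
        print(feature)
```

One caveat of such a threshold: on a short contig that carries a single gene, the gene itself could exceed 90% coverage and be dropped, so the filter might need to be restricted to a list of feature types or combined with the explicit type exclusion suggested above.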