jorvis / Attributor

Generate gene annotation from a wide variety of evidence sources
Apache License 2.0

Assigning different transcripts to the same gene #4

Closed christopher-holt closed 2 years ago

christopher-holt commented 2 years ago

When I generate the Attributor GFF3/FASTA files, genes that have multiple transcripts are split into separate genes (e.g., g10005 and g10005_2). Is there a way to have Attributor assign the transcripts to the original parent gene, or would I have to edit the attributor.gff3 file manually?
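
For reference, the manual edit described above could be scripted roughly as follows. This is only a sketch and not part of Attributor or biocode: the function names are hypothetical, and it assumes tab-delimited GFF3 in which a split gene is always named `<original>_<n>` (as with g10005_2) and the original gene is present in the same file.

```python
import re

# Assumption: a split gene is named "<original>_<n>", e.g. g10005_2.
SPLIT_ID = re.compile(r"^(.+)_(\d+)$")


def parse_attrs(col9):
    """Turn 'ID=x;Parent=y' into a dict of key -> value (order preserved)."""
    pairs = (item.split("=", 1) for item in col9.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def format_attrs(attrs):
    """Serialize an attribute dict back into GFF3 column 9."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


def merge_split_genes(lines):
    """Drop '<gene>_<n>' gene features and re-parent their mRNAs onto '<gene>'."""
    rows = [line.rstrip("\n").split("\t") for line in lines]

    # Pass 1: record the coordinates of every gene feature.
    spans = {}
    for cols in rows:
        if len(cols) == 9 and cols[2] == "gene":
            gene_id = parse_attrs(cols[8]).get("ID")
            if gene_id:
                spans[gene_id] = (int(cols[3]), int(cols[4]))

    # Map each split gene to its original, but only if the original exists.
    alias = {}
    for gene_id in spans:
        match = SPLIT_ID.match(gene_id)
        if match and match.group(1) in spans:
            alias[gene_id] = match.group(1)

    # Widen each original gene so it covers the genes merged into it.
    for split_id, base_id in alias.items():
        (s1, e1), (s2, e2) = spans[base_id], spans[split_id]
        spans[base_id] = (min(s1, s2), max(e1, e2))

    # Pass 2: skip split gene lines and rewrite Parent on their transcripts.
    out = []
    for cols in rows:
        if len(cols) != 9:
            out.append("\t".join(cols))  # comments and blank lines pass through
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "gene":
            gene_id = attrs.get("ID")
            if gene_id in alias:
                continue
            if gene_id in spans:
                cols[3], cols[4] = map(str, spans[gene_id])
        elif attrs.get("Parent") in alias:
            attrs["Parent"] = alias[attrs["Parent"]]
            cols[8] = format_attrs(attrs)
        out.append("\t".join(cols))
    return out
```

Passing the lines of attributor.gff3 to `merge_split_genes` returns the edited lines. Line order is preserved, so a re-parented transcript can end up listed before its gene; a file consumed by a tool that expects parents first would need sorting afterwards.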

augustus.hints.aa.zip augustus.hints.gff3.zip

jorvis commented 2 years ago

Just letting you know I'm out this week but will look into this more when I get back on Monday.

christopher-holt commented 2 years ago

Hi Mr Orvis, I am following up to see whether there is any other information I can provide.

jorvis commented 2 years ago

Christopher - Not at the moment. I'm afraid right now I'm just taking priorities in turn and hoping to get to this soon.

jorvis commented 2 years ago

Christopher - Can you please attach the genome here or e-mail me another link? I want to test my updates on that one, and the SharePoint link you sent before doesn't seem to be valid now.

jorvis commented 2 years ago

If you're up for a Zoom call soon, it might also be helpful just to get your command and the source evidence files you are using. This was never built or tested on eukaryotic files, so I want to run it the same way you have.

christopher-holt commented 2 years ago

Hi Mr Orvis, I have sent you an email with a link to the B. pahangi genome. I am available all day tomorrow and any day next week for a Zoom call. I will also be on campus, if that would be easier.

jorvis commented 2 years ago

Sure, do you happen to be free at 10am your time? And I live in another state, so I'm afraid I can't meet on campus!

christopher-holt commented 2 years ago

10:30am my time would work better for me, if possible! I can email you the Zoom details!

jorvis commented 2 years ago

See you then

jorvis commented 2 years ago

Christopher -

I have modified the underlying biocode library to no longer split genes unless specifically requested. Please let me know how you'd like me to send you the file, as I didn't want to attach it here in the public forum unless you were comfortable with that. Here is an example with the demo gene you gave:

# source GFF
tig00087759_pilon   AUGUSTUS    gene    25827   27697   0.78    -   .   ID=g10005;
tig00087759_pilon   AUGUSTUS    mRNA    25827   27445   0.43    -   .   ID=g10005.t1;Parent=g10005;
tig00087759_pilon   AUGUSTUS    stop_codon  25827   25829   .   -   0   ID=g10005.t1.stop1;Parent=g10005.t1;
tig00087759_pilon   AUGUSTUS    CDS 25827   26495   0.47    -   0   ID=g10005.t1.CDS1;Parent=g10005.t1;
tig00087759_pilon   AUGUSTUS    exon    25827   26495   .   -   .   ID=g10005.t1.exon1;Parent=g10005.t1;
tig00087759_pilon   AUGUSTUS    intron  26496   26563   0.47    -   .   ID=g10005.t1.intron1;Parent=g10005.t1;
tig00087759_pilon   AUGUSTUS    CDS 26564   27445   0.86    -   0   ID=g10005.t1.CDS2;Parent=g10005.t1;
tig00087759_pilon   AUGUSTUS    exon    26564   27445   .   -   .   ID=g10005.t1.exon2;Parent=g10005.t1;
tig00087759_pilon   AUGUSTUS    start_codon 27443   27445   .   -   0   ID=g10005.t1.start1;Parent=g10005.t1;
tig00087759_pilon   AUGUSTUS    mRNA    25827   27697   0.35    -   .   ID=g10005.t2;Parent=g10005;
tig00087759_pilon   AUGUSTUS    stop_codon  25827   25829   .   -   0   ID=g10005.t2.stop1;Parent=g10005.t2;
tig00087759_pilon   AUGUSTUS    CDS 25827   26495   0.46    -   0   ID=g10005.t2.CDS1;Parent=g10005.t2;
tig00087759_pilon   AUGUSTUS    exon    25827   26495   .   -   .   ID=g10005.t2.exon1;Parent=g10005.t2;
tig00087759_pilon   AUGUSTUS    intron  26496   26563   0.44    -   .   ID=g10005.t2.intron1;Parent=g10005.t2;
tig00087759_pilon   AUGUSTUS    CDS 26564   27697   0.69    -   0   ID=g10005.t2.CDS2;Parent=g10005.t2;
tig00087759_pilon   AUGUSTUS    exon    26564   27697   .   -   .   ID=g10005.t2.exon2;Parent=g10005.t2;
tig00087759_pilon   AUGUSTUS    start_codon 27695   27697   .   -   0   ID=g10005.t2.start1;Parent=g10005.t2;

# pre-edit GFF
tig00087759_pilon       .       gene    25827   27697   .       -       .       ID=g10005_2
tig00087759_pilon       .       mRNA    25827   27697   .       -       .       ID=g10005.t2;Parent=g10005_2;product_name=hypothetical protein
tig00087759_pilon       .       CDS     25827   26495   .       -       0       ID=g10005.t2.CDS1;Parent=g10005.t2
tig00087759_pilon       .       CDS     26564   27697   .       -       0       ID=g10005.t2.CDS2;Parent=g10005.t2
tig00087759_pilon       .       exon    25827   26495   .       -       .       ID=g10005.t2.exon1;Parent=g10005.t2
tig00087759_pilon       .       exon    26564   27697   .       -       .       ID=g10005.t2.exon2;Parent=g10005.t2

tig00087759_pilon       .       gene    25827   27697   .       -       .       ID=g10005
tig00087759_pilon       .       mRNA    25827   27445   .       -       .       ID=g10005.t1;Parent=g10005;product_name=hypothetical protein
tig00087759_pilon       .       CDS     25827   26495   .       -       0       ID=g10005.t1.CDS1;Parent=g10005.t1
tig00087759_pilon       .       CDS     26564   27445   .       -       0       ID=g10005.t1.CDS2;Parent=g10005.t1
tig00087759_pilon       .       exon    25827   26495   .       -       .       ID=g10005.t1.exon1;Parent=g10005.t1
tig00087759_pilon       .       exon    26564   27445   .       -       .       ID=g10005.t1.exon2;Parent=g10005.t1

# post-update GFF
tig00087759_pilon       .       gene    25827   27697   .       -       .       ID=g10005
tig00087759_pilon       .       mRNA    25827   27445   .       -       .       ID=g10005.t1;Parent=g10005;product_name=hypothetical protein
tig00087759_pilon       .       CDS     25827   26495   .       -       0       ID=g10005.t1.CDS1;Parent=g10005.t1
tig00087759_pilon       .       CDS     26564   27445   .       -       0       ID=g10005.t1.CDS2;Parent=g10005.t1
tig00087759_pilon       .       exon    25827   26495   .       -       .       ID=g10005.t1.exon1;Parent=g10005.t1
tig00087759_pilon       .       exon    26564   27445   .       -       .       ID=g10005.t1.exon2;Parent=g10005.t1
tig00087759_pilon       .       mRNA    25827   27697   .       -       .       ID=g10005.t2;Parent=g10005;product_name=hypothetical protein
tig00087759_pilon       .       CDS     25827   26495   .       -       0       ID=g10005.t2.CDS1;Parent=g10005.t2
tig00087759_pilon       .       CDS     26564   27697   .       -       0       ID=g10005.t2.CDS2;Parent=g10005.t2
tig00087759_pilon       .       exon    25827   26495   .       -       .       ID=g10005.t2.exon1;Parent=g10005.t2
tig00087759_pilon       .       exon    26564   27697   .       -       .       ID=g10005.t2.exon2;Parent=g10005.t2

christopher-holt commented 2 years ago

Mr Orvis, would it be possible to email me the file?

jorvis commented 2 years ago

E-mail sent, closing for now.