photocyte / PPYR_OGS

Official Geneset (OGS) for Photinus pyralis, at least until I figure out a long-term alternative
MIT License

Input UTRs for genes that have them #18

Open photocyte opened 5 years ago

photocyte commented 5 years ago

The GFF sort-and-regenerate script is now fixed, so mRNAs should be produced properly. However, the vast majority of our gene models are CDS-only: the mRNA equals the CDS and there are no UTRs. This is because EVM only produces gene models with CDSs.
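To get a rough count of how many models are affected, something like the following could flag CDS-only mRNAs. This is only a sketch: the function name `cds_only_mrnas` is hypothetical and is not the sort-and-regenerate script from this repo. It assumes standard 9-column GFF3 with `ID=`/`Parent=` attributes, and that the CDS features include the stop codon (if they do not, every model will look like it has a 3-bp UTR).

```python
def cds_only_mrnas(gff3_text):
    """Return IDs of mRNAs whose span equals the span of their CDS features.

    Hypothetical helper for illustration; assumes 9-column GFF3 with
    ID= and Parent= attributes.
    """
    mrna_span = {}   # mRNA ID -> (start, end)
    cds_span = {}    # parent mRNA ID -> (min CDS start, max CDS end)

    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows rather than guess
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].strip().split(";") if "=" in kv)

        if ftype == "mRNA" and "ID" in attrs:
            mrna_span[attrs["ID"]] = (start, end)
        elif ftype == "CDS":
            # A CDS row may list several parents, separated by commas.
            for parent in attrs.get("Parent", "").split(","):
                if not parent:
                    continue
                lo, hi = cds_span.get(parent, (start, end))
                cds_span[parent] = (min(lo, start), max(hi, end))

    # CDS-only means the mRNA starts and ends exactly where its CDS does.
    return sorted(m for m, span in mrna_span.items() if cds_span.get(m) == span)
```

Running this over the OGS GFF3 text would list the models that still need UTRs; models that have already been swapped out for Trinity/PASA transcripts should drop off the list.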

But we have the data to manually input the UTRs in the DCGMs, so I will use the following criteria to select the gene models to change manually:

  1. Use the Trinity gene models if possible (just seems better most of the time)
  2. 5' UTR extends off first exon (this seems to be the normal case, but there are exceptions like PYR_09240-PA)
  3. 3' UTR ends with a Trinity/PASA predicted polyadenylation site (e.g. Luc1)
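The three criteria above could be checked per candidate roughly as follows. This is a sketch under stated assumptions, not the procedure actually used here (the swaps in this issue were done by hand): the `Transcript` class, its fields, and the function names are all hypothetical, coordinates are assumed to be 1-based inclusive, and "5' UTR extends off first exon" is read as "the first exon in transcript order contains both the CDS start and some upstream sequence". The checks are reported separately rather than combined into a pass/fail, since a missing polyA site is not necessarily disqualifying.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Hypothetical candidate model; coordinates are 1-based inclusive."""
    strand: str                      # "+" or "-"
    exons: list                      # [(start, end), ...]
    cds_start: int                   # lowest genomic CDS coordinate
    cds_end: int                     # highest genomic CDS coordinate
    source: str = "Trinity"          # assumed label for the evidence source
    polya_sites: list = field(default_factory=list)  # predicted polyA positions


def utr5_on_first_exon(tx):
    """Criterion 2: first exon (in transcript order) holds the CDS start plus upstream bases."""
    exons = sorted(tx.exons)
    if tx.strand == "+":
        first_start, first_end = exons[0]
        return first_start < tx.cds_start <= first_end
    # On the minus strand the first exon is the one with the highest coordinates.
    first_start, first_end = exons[-1]
    return first_start <= tx.cds_end < first_end


def ends_at_polya(tx, tolerance=0):
    """Criterion 3: the transcript's 3' end lies within `tolerance` bp of a polyA site."""
    exons = sorted(tx.exons)
    tx_end = exons[-1][1] if tx.strand == "+" else exons[0][0]
    return any(abs(site - tx_end) <= tolerance for site in tx.polya_sites)


def check_criteria(tx):
    """Report each criterion separately so borderline models can still be reviewed by eye."""
    return {
        "trinity": tx.source == "Trinity",      # criterion 1
        "utr5_on_first_exon": utr5_on_first_exon(tx),
        "ends_at_polya": ends_at_polya(tx),
    }
```

A model that fails only `ends_at_polya` would still be worth a manual look if its UTRs seem reasonable.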

Edit: This seems like quite a promising approach: https://github.com/PASApipeline/PASApipeline/wiki/PASA_genome_annotation

photocyte commented 5 years ago

Swapping out luciferase gene for:

Type: mRNA ID: Ppyr1.3_Trinity-PASA_stranded-DCGM_transdecoder_asmbl_4802.p1 Parent: GENE.Ppyr1.3_Trinity-PASA_stranded-DCGM_transdecoder_asmbl_4802~~Ppyr1.3_Trinity-PASA_stranded-DCGM_transdecoder_asmbl_4802.p1 Name: ORF

Edit: finished in d4c2564a195e6174beacc6bcce826540c8f8f389

photocyte commented 5 years ago

Swapping out Luc2 for: mRNA: Ppyr1.3_Trinity-PASA_stranded-DCGM_transdecoder_asmbl_61608.p1 (Doesn't have a PolyA site annotated, but seems to be a quite reasonable improvement)

Edit: finished in b23c01014f2753da82307e32041097fdbf07e657

photocyte commented 5 years ago

Swapping out PPYR_04899-PA for:

mRNA: Ppyr1.3_Trinity-PASA_stranded-DCGM_transdecoder_asmbl_20717.p1 Has legit PolyA, and 5' UTR seems reasonable

Edit: finished in ac7c65e6e5697e2a0e842127296c4e3393881902

photocyte commented 5 years ago

Swapping out PPYR_11147-PA for:

mRNA: Ppyr1.3_Trinity-PASA_stranded-DCGM_transdecoder_asmbl_50264.p1 Has no PolyA annotated, but quite reasonable UTRs

photocyte commented 5 years ago

Swapping out PPYR_09320-PA for:

Type: mRNA ID: Ppyr1.3_Trinity-PASA_stranded-DCGM_transdecoder_asmbl_41500.p1 No polyA annotated, but quite reasonable UTRs

photocyte commented 5 years ago

Swapping out PPYR_06194 for:

mRNA: Ppyr1.3_Trinity-PASA_stranded-DCGM_transdecoder_asmbl_26853.p1 No PolyA annotated, but quite reasonable UTRs