proper gff loading plan and protocol (with or without scaffolds)

mestato commented 6 years ago

Current loading procedure:

load mRNA from fasta
load peptide from fasta, link to mRNA via regexp
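For illustration only, the regexp linking step could look something like the sketch below. The naming scheme (a peptide ID being the mRNA ID plus a ".p<N>" suffix) and the function names are assumptions made up for this example, not the pattern actually used on the site.

```python
import re

# ASSUMPTION (hypothetical naming scheme, not taken from the thread):
# a peptide ID is its mRNA ID followed by ".p<N>", e.g. "m1.1.p1" -> "m1.1".
PEPTIDE_TO_MRNA = re.compile(r"^(?P<mrna>.+)\.p\d+$")


def fasta_ids(lines):
    """Yield the first whitespace-delimited token of each FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def link_peptides(peptide_ids, mrna_ids):
    """Return ({peptide_id: mrna_id}, [peptide ids that could not be linked])."""
    known = set(mrna_ids)
    linked, unlinked = {}, []
    for pid in peptide_ids:
        match = PEPTIDE_TO_MRNA.match(pid)
        if match and match.group("mrna") in known:
            linked[pid] = match.group("mrna")
        else:
            # Either the ID does not fit the pattern or the mRNA is not loaded.
            unlinked.append(pid)
    return linked, unlinked
```

Reporting the unlinked peptides separately makes it easy to spot IDs that silently fail the pattern before anything is loaded.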

Problems with this strategy:

we don't know locations of genes on scaffolds
we can't properly associate alternative splicing isoforms with a single gene
we don't actually have scaffolds in chado for access

Advantages to this strategy:

we avoid the "onion" problem, where users have to drill down four features deep to get to results, such as IPS (InterProScan) results, which are run on proteins
simple

The more common Tripal strategy, and probably what we want to do in the future, is to load from GFF: gene -> mRNA -> peptide features only (not introns, exons, and UTRs yet). How will the display for genes change? Can we make the gene page straightforward enough for users to find isoforms and protein functional homology results? Is now the time to reload our 5 genomes and make everything consistent?
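To make the isoform point concrete, here is a rough sketch of how a GFF3 file ties alternative isoforms to one gene through the Parent attribute. It assumes standard tab-delimited GFF3 with ID and Parent attributes in column 9; the function names are invented for this example and it is not how the Tripal loader itself is implemented.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column such as "ID=x;Parent=y" into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def isoforms_by_gene(gff_lines):
    """Map each gene ID to the IDs of the mRNA features naming it as Parent."""
    genes = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene" and "ID" in attrs:
            genes.setdefault(attrs["ID"], [])  # keep genes with no mRNA too
        elif cols[2] == "mRNA" and "ID" in attrs and "Parent" in attrs:
            # GFF3 allows several comma-separated parents
            for parent in attrs["Parent"].split(","):
                genes.setdefault(parent, []).append(attrs["ID"])
    return genes
```

A gene with more than one entry in its list has alternative isoforms, which is exactly the relationship the FASTA-only load cannot represent.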

bc edit: see also: #226

bradfordcondon commented 6 years ago

peptide features only (not introns and exons and UTRs yet)

There's no option for excluding feature types. I used grep or similar to remove the lines describing unwanted feature types. I'll be sure to find an example and link it.
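Until the real example is linked, here is a minimal sketch of that kind of filtering, written in Python rather than grep. The default set of types to keep (gene, mRNA, polypeptide) is an assumption; check which type names your GFF actually uses in column 3 before relying on it.

```python
def filter_gff(lines, keep_types=("gene", "mRNA", "polypeptide")):
    """Yield GFF lines whose feature type (column 3) is in keep_types.

    Lines starting with "#" and blank lines are passed through unchanged.
    ASSUMPTION: the file has no embedded ##FASTA section; sequence lines
    would be dropped because they are not tab-delimited feature lines.
    """
    keep = set(keep_types)
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        cols = line.split("\t")
        if len(cols) >= 3 and cols[2] in keep:
            yield line


# Example use (file names are placeholders):
#   with open("in.gff3") as src, open("filtered.gff3", "w") as dst:
#       dst.writelines(filter_gff(src))
```

Keeping a whitelist of wanted types, rather than a blacklist of unwanted ones, means an unexpected feature type in a new genome is dropped instead of being loaded by accident.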

Can we make the gene page straightforward enough for users to find isoforms and protein functional homology results?

I think the only way to know for sure is to load it in dev. We can reload a particular organism and have a look. In theory, nothing will change: annotations will go with the gene instead of the mRNA, and sequences for the specific mRNAs and proteins will still be displayed on the gene page.

bradfordcondon commented 6 years ago

I did in fact write up a guide on how to load the GFF file. It includes handy awk commands for removing feature types you don't want in your db.

https://github.com/mestato/statonlabprivate/wiki/Adding-genomic-locations-to-features-with-the-GFF-loader-in-Tripal-3

almasaeed2010 commented 5 years ago

Linking this to #428

statonlab / hardwoods_site

proper gff loading plan and protocol (with or without scaffolds) #377