statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

proper gff loading plan and protocol (with or without scaffolds) #377

Open mestato opened 6 years ago

mestato commented 6 years ago

Current loading procedure:

Problems with this strategy:

Advantages to this strategy:

The more common Tripal strategy, and probably what we want to do in the future, is to load from GFF: gene -> mRNA -> peptide features only (not introns and exons and UTRs yet). How will the display for genes change? Can we make the gene page straightforward enough for users to find isoforms and protein functional homology results? Is now the time to reload our 5 genomes and make everything consistent?

bc edit: see also: #226

bradfordcondon commented 6 years ago

peptide features only (not introns and exons and UTRs yet)

There's no option in the loader for excluding features. I used grep or similar to remove the lines describing unwanted feature types before loading. I'll be sure to find an example and link it.
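For illustration, here is a minimal Python sketch of that kind of pre-filtering. It is not the actual command used here. It assumes a tab-delimited GFF3 file with the feature type in column 3, and the type names in `KEEP_TYPES` (`gene`, `mRNA`, `polypeptide`) are assumptions that should be checked against the file being loaded. The function name `filter_gff` is hypothetical, and the sketch does not handle an embedded `##FASTA` section.

```python
# Assumed feature types to keep; check column 3 of your own GFF for the exact names.
KEEP_TYPES = {"gene", "mRNA", "polypeptide"}


def filter_gff(lines, keep_types=KEEP_TYPES):
    """Yield directive/comment lines, blank lines, and features whose type is in keep_types."""
    for line in lines:
        # Pass through directives (##gff-version), comments, and blank lines untouched.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        # Column 3 (index 2) holds the feature type in GFF3.
        if len(columns) >= 3 and columns[2] in keep_types:
            yield line


# Small made-up sample to show the effect; the IDs and coordinates are illustrative only.
sample = [
    "##gff-version 3\n",
    "scaffold_1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "scaffold_1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n",
    "scaffold_1\tsrc\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=mrna1\n",
    "scaffold_1\tsrc\tfive_prime_UTR\t100\t150\t.\t+\t.\tParent=mrna1\n",
]
filtered = list(filter_gff(sample))
```

In the sample, the exon and UTR lines are dropped while the directive, gene, and mRNA lines pass through. An allow-list of types is used instead of a deny-list so that any unexpected feature type in the file is excluded by default.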

Can we make the gene page straightforward enough for users to find isoforms and protein functional homology results?

I think the only way to know for sure is to load it in dev. We can reload a particular organism and have a look. In theory, nothing will change. Annotations will go with the gene instead of the mRNA. Sequences for the specific mRNAs and proteins will still be displayed on the gene page.

bradfordcondon commented 6 years ago

I did in fact write up a guide on how to load the GFF file. It includes handy awk commands for removing feature types you don't want in your database.

https://github.com/mestato/statonlabprivate/wiki/Adding-genomic-locations-to-features-with-the-GFF-loader-in-Tripal-3

almasaeed2010 commented 5 years ago

Linking this to #428