statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

master entity switch from mrna to gene #226

Open bradfordcondon opened 6 years ago

bradfordcondon commented 6 years ago

As discussed in #210:

Genes as master entity and mRNAs as splice variants

Unpublishing mRNA, publishing gene, and migrating fields that were hard-coded for gene to mRNA will take care of it.

bradfordcondon commented 6 years ago

A possible wrinkle to this: I think we associate annotations with mRNA, not with genes (or polypeptides, for that matter). Would we have to transfer the annotations in feature_cvterm? Or should I instead propose that core look in feature_relationship for features with annotations for that field? Or is this already taken care of and I don't need to worry? (I don't think so, since polypeptide annotations don't show up unless associated with mRNA.)

bradfordcondon commented 6 years ago

After talking with Stephen: fields should generally only display information for the entity they are attached to. "Leap frogging" via a relationship (protein annotations on an mRNA page) requires a specific field to display that information, a la transcripts.
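To make the "leap frogging" point concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for Chado. The table layouts are deliberately simplified (real Chado uses type_id/cvterm_id keys into cvterm instead of text names), and the feature names, GO terms, and both helper functions are hypothetical.

```python
import sqlite3

# Simplified stand-ins for the Chado tables discussed above (assumption:
# text columns replace Chado's type_id / cvterm_id foreign keys).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER, type TEXT);
CREATE TABLE feature_cvterm (feature_id INTEGER, cvterm TEXT);

-- Hypothetical example data: one gene, one mRNA, one polypeptide.
INSERT INTO feature VALUES
    (1, 'gene1', 'gene'), (2, 'gene1.m1', 'mRNA'), (3, 'gene1.p1', 'polypeptide');
INSERT INTO feature_relationship VALUES (2, 1, 'part_of'), (3, 2, 'derives_from');
INSERT INTO feature_cvterm VALUES (2, 'GO:0005524'), (3, 'GO:0016301');
""")


def direct_annotations(conn, feature_id):
    """Terms attached to the feature itself (what a plain field would show)."""
    rows = conn.execute(
        "SELECT cvterm FROM feature_cvterm WHERE feature_id = ? ORDER BY cvterm",
        (feature_id,),
    )
    return [r[0] for r in rows]


def child_annotations(conn, feature_id):
    """Terms attached to first-degree children, found via feature_relationship."""
    rows = conn.execute(
        """
        SELECT child.uniquename, fc.cvterm
        FROM feature_relationship fr
        JOIN feature child ON child.feature_id = fr.subject_id
        JOIN feature_cvterm fc ON fc.feature_id = child.feature_id
        WHERE fr.object_id = ?
        ORDER BY child.uniquename, fc.cvterm
        """,
        (feature_id,),
    )
    return rows.fetchall()
```

In this toy data the gene has no direct annotations, so a field that only reads the gene's own feature_cvterm rows shows nothing; the mRNA's term only appears once the lookup goes through the relationship.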

This means that doing this transfer would involve being sure to re-map all featureprop, feature_cvterm, and blast_hit_data records from the mRNA to the gene.

That makes this a much less enticing task.

bradfordcondon commented 6 years ago

Revisiting this in the context of Chinese chestnut, which has alternative isoforms, so we are interested in having gene but not mRNA entities.

Here's the problem: if we publish gene entities for chestnut but don't want to publish its mRNA entities, the next time we publish mRNA entities it'll autopublish the chestnut ones.

I don't know how we would proceed: either by autodeleting the chestnut mRNA entities (crazy town), or by changing their type to something else (which is probably going to cause all sorts of issues, i.e. mess up the transcript field).

Assuming we do go ahead:

So we either a) write a custom migration to transfer annotations from the mrna to the gene, or b) reload everything.

Why? Because fields are only assigned to display information linked to that record, with a few exceptions. So BLAST annotations, for example, would need to be linked to gene (instead of mrna) in order to show up.
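As a rough illustration of what option a) might look like for feature_cvterm, here is a sketch against an in-memory SQLite stand-in for Chado. The schema is simplified (text names instead of type_id/cvterm_id keys, and a reduced unique constraint), the data is made up, and `copy_mrna_terms_to_genes` is a hypothetical helper. It copies terms rather than moving them, which is an assumption about what we would want; featureprop and blast_hit_data would presumably need a similar pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER, type TEXT);
-- Assumption: a reduced unique constraint standing in for Chado's real one.
CREATE TABLE feature_cvterm (
    feature_id INTEGER,
    cvterm TEXT,
    UNIQUE (feature_id, cvterm)
);

-- Hypothetical data: one gene with two isoforms sharing one term.
INSERT INTO feature VALUES
    (1, 'gene1', 'gene'), (2, 'gene1.m1', 'mRNA'), (3, 'gene1.m2', 'mRNA');
INSERT INTO feature_relationship VALUES (2, 1, 'part_of'), (3, 1, 'part_of');
INSERT INTO feature_cvterm VALUES
    (2, 'GO:0005524'), (3, 'GO:0005524'), (3, 'GO:0016301');
""")


def copy_mrna_terms_to_genes(conn):
    """Copy every mRNA-level term onto the parent gene, skipping duplicates."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            """
            INSERT OR IGNORE INTO feature_cvterm (feature_id, cvterm)
            SELECT gene.feature_id, fc.cvterm
            FROM feature_cvterm fc
            JOIN feature mrna
              ON mrna.feature_id = fc.feature_id AND mrna.type = 'mRNA'
            JOIN feature_relationship fr
              ON fr.subject_id = mrna.feature_id AND fr.type = 'part_of'
            JOIN feature gene
              ON gene.feature_id = fr.object_id AND gene.type = 'gene'
            """
        )


def terms_for(conn, feature_id):
    """Sorted list of terms attached directly to one feature."""
    rows = conn.execute(
        "SELECT cvterm FROM feature_cvterm WHERE feature_id = ? ORDER BY cvterm",
        (feature_id,),
    )
    return [r[0] for r in rows]
```

Because two isoforms can carry the same term, the copy has to deduplicate; here the unique constraint plus `INSERT OR IGNORE` handles that, and it also makes the migration safe to re-run.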

b) is much easier but requires finding the data. I'll look and see what's easily available.

data

still needed:

almasaeed2010 commented 5 years ago

"because the next time we publish mRNA entities, it'll autopublish the chestnut ones."

We normally select/filter by an organism and never publish all mRNAs. Does this make things easier? We can alter the form and make sure organism is required too, if we want peace of mind.

almasaeed2010 commented 5 years ago

Ok I see the rest of the issues are way too big to address

bradfordcondon commented 5 years ago

Using this issue to demonstrate gene fields on HWG dev. But first, an update (see https://github.com/tripal/tripal/issues/732 for the full issue).

In general, info is organized by mRNA/transcript (not hardcoded: whatever the first-degree child is), in collapsible fieldsets. For HWG, you probably still want two entity sets, one for gene and one for mRNA.

Here are the current fields on my dev site:

[Screenshot, 2019-01-22 10:18:23 AM]

As you can see we're on a gene page, but the mRNA properties, as well as the protein (child of the mRNA) are listed.

Next we try adding this to HWG for chestnut to ensure it's feasible on a live site.

https://hardwoods.ag.utk.edu/admin/structure/bio_data/manage/bio_data_26/display

(Not a proposal for live. Right now the props and annotations fields are jumbled together, so I'd want to change that and think about the actual structure before getting too hung up on it.)

bradfordcondon commented 5 years ago

Here are some examples on dev

I only found annotations on the mRNA, nothing with annotations on the protein (which again makes sense, since we load annotations onto the mRNA; they would only exist on the protein if they were in the GFF file).

Issues

The "map" field, which right now lists all subfeatures, doesn't display coordinates for the polypeptide:

[Screenshot, 2019-01-22 10:57:14 AM]

I believe it's because it doesn't have coordinates; let's check in featureloc.
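One way that featureloc check could look, sketched against an in-memory SQLite stand-in rather than the real Chado database. The schema is cut down to a few columns, type is a text column instead of a type_id, and the data and the `features_without_location` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE featureloc (
    feature_id INTEGER, srcfeature_id INTEGER,
    fmin INTEGER, fmax INTEGER, strand INTEGER
);

-- Hypothetical data: the gene and mRNA are located on a scaffold,
-- the polypeptide has no featureloc row at all.
INSERT INTO feature VALUES
    (10, 'scaffold1', 'supercontig'),
    (1, 'gene1', 'gene'), (2, 'gene1.m1', 'mRNA'), (3, 'gene1.p1', 'polypeptide');
INSERT INTO featureloc VALUES (1, 10, 100, 900, 1), (2, 10, 100, 900, 1);
""")


def features_without_location(conn, feature_type):
    """Names of features of the given type that have no featureloc row."""
    rows = conn.execute(
        """
        SELECT f.uniquename
        FROM feature f
        LEFT JOIN featureloc fl ON fl.feature_id = f.feature_id
        WHERE f.type = ? AND fl.feature_id IS NULL
        ORDER BY f.uniquename
        """,
        (feature_type,),
    )
    return [r[0] for r in rows]
```

If the polypeptides come back from a query like this on the real database, the empty coordinates in the map field would be a loading issue rather than a display bug.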

In general, I think the core problems are due to the loading pipeline (pre-trimming the GFF, always loading information with the mRNA instead of, say, the protein...).