legumeinfo / chickpeamine

A mine containing chickpea (desi) and chickpea (kabuli) genomes.
GNU Lesser General Public License v3.0

GFF children always reference "gene" rather than correct parent class #6

Closed: sammyjava closed 4 years ago

sammyjava commented 4 years ago

Placed in ChickpeaMine because one of those GFFs has exons, which are getting their mRNA parent reference stored as "gene". This is a core IM GFF source update.

sammyjava commented 4 years ago

Nope, this is not a core update. It's in refsAndCollections in LISGFF3RecordHandler.

adf-ncgr commented 4 years ago

@sammyjava before you put too much heroic effort into this, I think we probably should decide more officially: a) what we want to have consistently from GFFs in our mines b) whether the way to handle this is by making GFFs that conform or writing code that can deal with GFF madness

the GMOD perl chado loader takes an approach that is probably in the direction of b), which is why you never had so much trouble before. We may be getting close to a point where we can say more definitely what a GFF "best practice" might look like so a) may not be as far out of reach as it used to seem.

nevertheless, I'm beginning to lose stomach for talking more about the datastore this week, so such a decision may not be imminent. and of course, every time I think we've made a decision, we haven't...

sammyjava commented 4 years ago

It's not heroic, turns out it's just a couple of lines in the local record handler. And I have to since the GFFs have mRNA parents of exons, so I need to store those as Exon.transcript (already in the model) or Exon.mRNA, not Exon.gene. If the GFFs all change in the future, I'll change the parent.

I'm just making the GFF loader load the parent relationship that is given in the GFF.
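To make "the parent relationship that is given in the GFF" concrete, here is a minimal sketch of reading the `Parent` attribute from a GFF3 record and looking up that parent's feature type. It is illustrative only and is not the code in LISGFF3RecordHandler: the class `GffParentSketch`, its methods, and the `typeById` map are hypothetical names, and it assumes a single `Parent` value per record (GFF3 also allows a comma-separated list).

```java
import java.util.HashMap;
import java.util.Map;

public class GffParentSketch {
    /** Parses the ninth GFF3 column ("ID=x;Parent=y") into a key/value map. */
    static Map<String, String> parseAttributes(String column9) {
        Map<String, String> attrs = new HashMap<>();
        for (String pair : column9.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                attrs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return attrs;
    }

    /**
     * Returns the feature type (GFF3 column 3) of the record's Parent,
     * or null if the record has no Parent or the parent ID is unknown.
     * typeById is a hypothetical index of previously seen feature IDs.
     */
    static String parentType(String gffLine, Map<String, String> typeById) {
        String[] cols = gffLine.split("\t");
        if (cols.length < 9) {
            return null; // not a complete nine-column GFF3 record
        }
        String parentId = parseAttributes(cols[8]).get("Parent");
        return parentId == null ? null : typeById.get(parentId);
    }

    public static void main(String[] args) {
        Map<String, String> typeById = new HashMap<>();
        typeById.put("gene1", "gene");
        typeById.put("mrna1", "mRNA");
        String exon = "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=mrna1";
        // The exon's parent is an mRNA, so the reference belongs on the
        // transcript side of the model rather than on Exon.gene.
        System.out.println(parentType(exon, typeById)); // prints "mRNA"
    }
}
```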

sammyjava commented 4 years ago

My general approach is "make the mine mirror what's in the datastore." I will stay away from any discussions of actual GFF content. If there's a GFF, I can load it. Right now I'm just fixing a loader bug.

adf-ncgr commented 4 years ago

sounds like that means if we want consistency in the mines we'll need to impose it on the datastore. fair enough.

sammyjava commented 4 years ago

This is "fixed." Meaning my brain is. I just had this wrong. It also turns out that the GFFConverter chains parents up the line if references exist! No post-processor needed. So with Exon.transcripts set in refsAndCollections (the map that provides parent associations), and MRNA.gene as well, the loader drills up, finds Exon has a gene reference and populates that when MRNA.gene gets populated (because MRNA is a subclass of Transcript). For the record, here are the current refsAndCollections in LISGFF3RecordHandler, which requires no extension of the core model other than the addition of mRNA and the other oddballs.

refsAndCollections.put("Exon", "transcripts");            // SO: exon->transcript_region->transcript
refsAndCollections.put("CDS", "transcript");              // SO: CDS->mRNA_region->transcript
refsAndCollections.put("FivePrimeUTR", "transcripts");    // SO: five_prime_UTR->UTR->mRNA_region
refsAndCollections.put("ThreePrimeUTR", "transcripts");   // SO: three_prime_UTR->UTR->mRNA_region
refsAndCollections.put("MRNA", "gene");                   // SO: mRNA->mature_transcript->transcript->gene_member_region
refsAndCollections.put("RRNAPrimaryTranscript", "gene");  // SO: rRNA->ncRNA->mature_transcript->transcript->gene_member_region
refsAndCollections.put("TRNAPrimaryTranscript", "gene");  // SO: tRNA->ncRNA->mature_transcript->transcript->gene_member_region
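
The chaining described above can be pictured with a toy model. This is only a sketch of the idea, not the GFFConverter's actual implementation: `ParentChainSketch`, `Feature`, and `resolve` are hypothetical names, and only the two map entries needed for the Exon -> MRNA -> Gene case are included.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParentChainSketch {
    /** Toy feature: a class name, an ID, and a link to its GFF parent (null at the top). */
    static final class Feature {
        final String className;
        final String id;
        final Feature parent;

        Feature(String className, String id, Feature parent) {
            this.className = className;
            this.id = id;
            this.parent = parent;
        }
    }

    /** Child class -> field holding the direct GFF parent (a subset of the map above). */
    static Map<String, String> refsAndCollections() {
        Map<String, String> m = new HashMap<>();
        m.put("Exon", "transcripts");
        m.put("MRNA", "gene");
        return m;
    }

    /**
     * Walks up the parent chain from f, recording field name -> ancestor ID
     * at each level. Stops when a class has no configured reference.
     */
    static Map<String, String> resolve(Feature f, Map<String, String> refs) {
        Map<String, String> out = new LinkedHashMap<>();
        Feature child = f;
        while (child != null && child.parent != null) {
            String field = refs.get(child.className);
            if (field == null) {
                break; // no parent reference configured: chaining ends here
            }
            out.putIfAbsent(field, child.parent.id);
            child = child.parent;
        }
        return out;
    }

    public static void main(String[] args) {
        Feature gene = new Feature("Gene", "gene1", null);
        Feature mrna = new Feature("MRNA", "mrna1", gene);
        Feature exon = new Feature("Exon", "exon1", mrna);
        // The exon picks up its direct parent and, via MRNA.gene, the gene too.
        System.out.println(resolve(exon, refsAndCollections()));
        // prints {transcripts=mrna1, gene=gene1}
    }
}
```

In this toy version, dropping the "MRNA" entry from the map would leave the exon with only its transcript link, which is the sense in which the chain depends on each reference existing.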