Remove links between gene builds and chromosomes/scaffolds

adf-ncgr commented 7 years ago

There are analysisfeature records linking chromosomes and scaffolds to gene builds. Since gene builds don't define chromosomes, these should be removed from Chado.

[LEGUME-728] created by ecannon
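
A minimal sketch of how the records in question might be located and removed. It runs against an in-memory SQLite stand-in rather than the real Chado database, and it models only the columns used by the queries in this thread. The `demo.X.*` analysis names and feature names are hypothetical. Treating any analysis whose name contains `.ann` as a gene build is an assumption based on the `gnmN` / `gnmN.annN` naming seen in this thread.

```python
import sqlite3

# In-memory stand-in for the relevant Chado tables (assumption: only the
# columns needed for this illustration are modeled).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT, type_id INTEGER);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id INTEGER,
    analysis_id INTEGER,
    UNIQUE (feature_id, analysis_id)
);
INSERT INTO cvterm VALUES (1, 'chromosome'), (2, 'supercontig'), (3, 'gene');
-- Hypothetical analysis names: one assembly, one gene build.
INSERT INTO analysis VALUES (1, 'demo.X.gnm1'), (2, 'demo.X.gnm1.ann1');
INSERT INTO feature VALUES (1, 'Chr01', 1), (2, 'scaffold_1', 2), (3, 'gene_1', 3);
INSERT INTO analysisfeature (feature_id, analysis_id) VALUES
    (1, 1), (2, 1),   -- assembly links: keep
    (1, 2), (2, 2),   -- gene build linked to chromosome/scaffold: remove
    (3, 2);           -- gene build linked to gene: keep
""")

# Links from chromosome/supercontig features to gene build analyses.
UNWANTED = """
SELECT af.analysisfeature_id
FROM analysisfeature af
JOIN feature f ON f.feature_id = af.feature_id
JOIN cvterm t ON t.cvterm_id = f.type_id
JOIN analysis a ON a.analysis_id = af.analysis_id
WHERE t.name IN ('chromosome', 'supercontig')
  AND a.name LIKE '%.ann%'
"""

# Review the candidates first, then delete exactly those rows.
to_remove = sorted(row[0] for row in conn.execute(UNWANTED))
conn.execute(
    f"DELETE FROM analysisfeature WHERE analysisfeature_id IN ({UNWANTED})"
)
remaining = conn.execute(
    "SELECT feature_id, analysis_id FROM analysisfeature "
    "ORDER BY feature_id, analysis_id"
).fetchall()
print(to_remove, remaining)
```

On a real database it would probably be safer to run the SELECT on its own and inspect the counts per analysis before issuing the DELETE inside a transaction.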

adf-ncgr commented 7 years ago

Note that this may actually be a task for PeanutBase rather than LegumeInfo. Still, it is probably worth verifying that all species are similar to what I described for vigun and lupan (see GH-727), since these two are among the most recently loaded species, and my very dim memory of cleaning up everything else while I was at it may have been a case of good intentions unfulfilled in reality...

by adf_ncgr

adf-ncgr commented 7 years ago

Looks like a task for both PeanutBase and LegumeInfo, and that it should perhaps be expanded a bit. If one objective is to remove unnecessary analysisfeature records, then gene build analysisfeature records should be removed from features of type exon, mRNA, and polypeptide. For example:

by ecannon

adf-ncgr commented 7 years ago

I'm not sure that those should be removed. One objective in having the links is to enable easy identification of the features added from a particular analysis. This would facilitate easy deletion of feature sets deemed no longer of interest, for example, and probably also some overview reports of content along the lines of the things that Connor has been working on. I guess it depends on what we consider necessary, but I am viewing the links as being more than support for the gene page.

by adf_ncgr

adf-ncgr commented 7 years ago

I'm okay with leaving the exon, mRNA, and polypeptide links, but note that they are inconsistent:

drupal=> select count(*), a.name, t.name from feature f, analysisfeature af, analysis a, cvterm t where f.type_id=t.cvterm_id and f.feature_id=af.feature_id and af.analysis_id=a.analysis_id and a.name like 'phavu.%' group by a.name, t.name;
 count |          name          |    name
-------+------------------------+-------------
    11 | phavu.G19833.gnm1      | chromosome
   697 | phavu.G19833.gnm1      | supercontig
 27197 | phavu.G19833.gnm1.ann1 | gene

drupal=> select count(*), a.name, t.name from feature f, analysisfeature af, analysis a, cvterm t where f.type_id=t.cvterm_id and f.feature_id=af.feature_id and af.analysis_id=a.analysis_id and a.name like 'glyma.%' group by a.name, t.name;
 count |         name         |    name
-------+----------------------+-------------
    20 | glyma.Wm82.gnm2      | chromosome
  1170 | glyma.Wm82.gnm2      | supercontig
 56044 | glyma.Wm82.gnm2.ann1 | gene

by ecannon

adf-ncgr commented 7 years ago

Good point; I suspect that is due to differences in how the earlier genomes were loaded. The newer ones use the loader itself to get these analysisfeature linkages established, whereas I think most of the older ones got them added after the fact through a manual process. I'll take on the task of consistification for legumeinfo unless you want to...

by adf_ncgr
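
One way the "consistification" could be approached, as a sketch only: propagate each existing analysisfeature link down to child features through `feature_relationship`, so that mRNA, exon, and polypeptide features pick up the same analysis as their gene. This is not necessarily how the loader or the earlier manual process does it. The example uses an in-memory SQLite stand-in with hypothetical feature names and a hypothetical analysis id of 10. It assumes that `subject_id` is the child and `object_id` the parent, and it ignores relationship and feature types for brevity.

```python
import sqlite3

# In-memory stand-in for the relevant Chado tables (assumption: only the
# columns needed for this illustration are modeled).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER);
CREATE TABLE analysisfeature (
    feature_id INTEGER,
    analysis_id INTEGER,
    UNIQUE (feature_id, analysis_id)
);
INSERT INTO feature VALUES
    (1, 'gene_1'), (2, 'mRNA_1'), (3, 'exon_1'), (4, 'pep_1');
-- Child (subject) to parent (object): mRNA -> gene, exon -> mRNA, pep -> mRNA.
INSERT INTO feature_relationship VALUES (2, 1), (3, 2), (4, 2);
-- Only the gene is linked to the (hypothetical) gene build analysis 10.
INSERT INTO analysisfeature VALUES (1, 10);
""")

# Walk down the parent/child graph from every linked feature and add any
# missing (feature, analysis) pairs; existing pairs are left untouched.
conn.execute("""
WITH RECURSIVE linked(feature_id, analysis_id) AS (
    SELECT feature_id, analysis_id FROM analysisfeature
    UNION
    SELECT fr.subject_id, l.analysis_id
    FROM linked l
    JOIN feature_relationship fr ON fr.object_id = l.feature_id
)
INSERT OR IGNORE INTO analysisfeature (feature_id, analysis_id)
SELECT feature_id, analysis_id FROM linked
""")

links = conn.execute(
    "SELECT feature_id, analysis_id FROM analysisfeature ORDER BY feature_id"
).fetchall()
print(links)
```

On PostgreSQL the `INSERT OR IGNORE` would need a different form, for example `ON CONFLICT DO NOTHING`, and the seed query would likely be restricted to gene features of the analyses being fixed.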

adf-ncgr commented 7 years ago

It's all yours! I'll tackle peanutbase.

by ecannon

legumeinfo / jira-issues

Remove links between gene builds and chromosomes/scaffolds #694