Open adf-ncgr opened 7 years ago
Note that this may actually be a task for PeanutBase rather than LegumeInfo. Still, it is probably worth verifying that all species look like what I described for vigun and lupan (see GH-727), since these two are among the most recently loaded species, and my very dim memory of cleaning up everything else while I was at it may have been a case of good intentions unfulfilled in reality...
by adf_ncgr
Looks like a task for both PeanutBase and LegumeInfo, and that it should perhaps be expanded a bit. If one objective is to remove unnecessary analysisfeature records, then gene build analysisfeature records should be removed from features of type exon, mRNA, and polypeptide. For example:
drupal=> select count(*), a.name, t.name from feature f, analysisfeature af, analysis a, cvterm t where f.type_id=t.cvterm_id and f.feature_id=af.feature_id and af.analysis_id=a.analysis_id and a.name like 'vigan.%' group by a.name, t.name;
count | name | name
--------------------------------------------
11 | vigan.Gyeongwon.gnm3 | chromosome
3376 | vigan.Gyeongwon.gnm3 | supercontig
160281 | vigan.Gyeongwon.gnm3.ann1 | exon
26857 | vigan.Gyeongwon.gnm3.ann1 | gene
36689 | vigan.Gyeongwon.gnm3.ann1 | mRNA
36689 | vigan.Gyeongwon.gnm3.ann1 | polypeptide
by ecannon
I'm not sure that those should be removed. One objective in having the links is to enable easy identification of the features added by a particular analysis. That would make it easy to delete feature sets deemed no longer of interest, for example, and would probably also support overview reports of content along the lines of what Connor has been working on. I guess it depends on what we consider necessary, but I view the links as doing more than supporting the gene page.
by adf_ncgr
I'm okay with leaving the exon, mRNA, and polypeptide links, but note that they are inconsistent:
drupal=> select count(*), a.name, t.name from feature f, analysisfeature af, analysis a, cvterm t where f.type_id=t.cvterm_id and f.feature_id=af.feature_id and af.analysis_id=a.analysis_id and a.name like 'phavu.%' group by a.name, t.name;
count | name | name
----------------------------------------
11 | phavu.G19833.gnm1 | chromosome
697 | phavu.G19833.gnm1 | supercontig
27197 | phavu.G19833.gnm1.ann1 | gene
drupal=> select count(*), a.name, t.name from feature f, analysisfeature af, analysis a, cvterm t where f.type_id=t.cvterm_id and f.feature_id=af.feature_id and af.analysis_id=a.analysis_id and a.name like 'glyma.%' group by a.name, t.name;
count | name | name
--------------------------------------
20 | glyma.Wm82.gnm2 | chromosome
1170 | glyma.Wm82.gnm2 | supercontig
56044 | glyma.Wm82.gnm2.ann1 | gene
drupal=> select count(*), a.name, t.name from feature f, analysisfeature af, analysis a, cvterm t where f.type_id=t.cvterm_id and f.feature_id=af.feature_id and af.analysis_id=a.analysis_id and a.name like 'araip.%' group by a.name, t.name;
count | name | name
----------------------------------------
10 | araip.K30076.gnm1 | chromosome
1183 | araip.K30076.gnm1 | supercontig
10 | araip.K30076.gnm1.ann1 | chromosome
42533 | araip.K30076.gnm1.ann1 | gene
1183 | araip.K30076.gnm1.ann1 | supercontig
drupal=> select count(*), a.name, t.name from feature f, analysisfeature af, analysis a, cvterm t where f.type_id=t.cvterm_id and f.feature_id=af.feature_id and af.analysis_id=a.analysis_id and a.name like 'lupan.%' group by a.name, t.name;
count | name | name
--------------------------------------------------
114724 | lupan.Tanjil.a1.0.iprscan | protein_hmm_match
151126 | lupan.Tanjil.a1.0.iprscan | protein_match
20 | lupan.Tanjil.gnm1 | chromosome
13554 | lupan.Tanjil.gnm1 | supercontig
182583 | lupan.Tanjil.gnm1.ann1 | exon
33072 | lupan.Tanjil.gnm1.ann1 | gene
33072 | lupan.Tanjil.gnm1.ann1 | mRNA
33072 | lupan.Tanjil.gnm1.ann1 | polypeptide
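Rather than eyeballing one species at a time, the same grouping could be run once over all analyses and compared against an expected set of feature types. The following is only a rough sketch: it uses an in-memory SQLite stand-in for the four Chado tables named in the queries above, the toy analysis names and rows are made up, and the expected set (gene, mRNA, exon, polypeptide) is an assumption, since which types a gene build should link to has not been settled here.

```python
import re
import sqlite3

# Minimal stand-in for the four Chado tables used in the queries above.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, type_id INTEGER);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE analysisfeature (analysis_id INTEGER, feature_id INTEGER);
"""

# Same grouping as the psql queries, but without the per-species LIKE filter.
TYPES_PER_ANALYSIS = """
SELECT a.name, t.name, count(*)
FROM feature f
JOIN analysisfeature af ON af.feature_id = f.feature_id
JOIN analysis a ON a.analysis_id = af.analysis_id
JOIN cvterm t ON t.cvterm_id = f.type_id
GROUP BY a.name, t.name
"""

# Assumption: gene builds are the analyses whose names end in .annN.
GENE_BUILD = re.compile(r"\.ann\d+$")


def linked_types(conn):
    """Map each analysis name to the set of feature types linked to it."""
    result = {}
    for analysis_name, type_name, _count in conn.execute(TYPES_PER_ANALYSIS):
        result.setdefault(analysis_name, set()).add(type_name)
    return result


def inconsistent_gene_builds(conn, expected):
    """Return {analysis: (missing, unexpected)} for gene builds that differ."""
    report = {}
    for name, types in linked_types(conn).items():
        if not GENE_BUILD.search(name):
            continue
        missing, unexpected = expected - types, types - expected
        if missing or unexpected:
            report[name] = (missing, unexpected)
    return report


def demo():
    """Build hypothetical toy data and report the inconsistent gene builds."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [
        (1, "gene"), (2, "mRNA"), (3, "exon"),
        (4, "polypeptide"), (5, "chromosome")])
    conn.executemany("INSERT INTO feature VALUES (?, ?)", [
        (1, 1), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5)])
    conn.executemany("INSERT INTO analysis VALUES (?, ?)", [
        (1, "toyA.gnm1"), (2, "toyA.gnm1.ann1"), (3, "toyB.gnm1.ann1")])
    conn.executemany("INSERT INTO analysisfeature VALUES (?, ?)", [
        (1, 6),                          # genome assembly -> chromosome
        (2, 1),                          # toyA gene build -> gene only
        (3, 2), (3, 3), (3, 4), (3, 5),  # toyB gene build -> all four types
        (3, 6)])                         # toyB gene build -> chromosome too
    expected = {"gene", "mRNA", "exon", "polypeptide"}  # assumed target set
    return inconsistent_gene_builds(conn, expected)


if __name__ == "__main__":
    for name, (missing, unexpected) in sorted(demo().items()):
        print(name, "missing:", sorted(missing), "unexpected:", sorted(unexpected))
```

Against the real database the SELECT should carry over as is, though that has not been tried here; only the connection and the expected set would need changing.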
by ecannon
Good point; I suspect that is due to differences in how the earlier genomes were loaded. The newer ones use the loader itself to establish these analysisfeature linkages, while I think most of the older ones got them added after the fact through a manual process. I'll take on the task of consistification for legumeinfo unless you want to...
by adf_ncgr
It's all yours! I'll tackle peanutbase.
by ecannon
There are analysisfeature records linking chromosomes and scaffolds to gene builds. Since gene builds don't define chromosomes, these should be removed from Chado.
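A possible shape for the cleanup, shown against an in-memory SQLite stand-in for the Chado tables so it can be run safely. This is a sketch under assumptions, not a tested migration: gene builds are taken to be the analyses whose names contain `.ann`, the offending feature types are taken to be `chromosome` and `supercontig` (the two seen in the araip output), and the toy rows are invented.

```python
import sqlite3

# Minimal stand-in for the four Chado tables involved.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, type_id INTEGER);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE analysisfeature (analysis_id INTEGER, feature_id INTEGER);
"""

# Remove links from gene-build analyses to chromosome/supercontig features.
# Assumption: gene builds are the analyses whose names contain '.ann'.
DELETE_LINKS = """
DELETE FROM analysisfeature
WHERE analysis_id IN (
        SELECT analysis_id FROM analysis WHERE name LIKE '%.ann%')
  AND feature_id IN (
        SELECT f.feature_id
        FROM feature f
        JOIN cvterm t ON t.cvterm_id = f.type_id
        WHERE t.name IN ('chromosome', 'supercontig'))
"""


def remove_assembly_links(conn):
    """Delete the unwanted links in one transaction; return rows removed."""
    with conn:  # commits on success, rolls back if the DELETE raises
        return conn.execute(DELETE_LINKS).rowcount


def demo():
    """Run the cleanup on hypothetical toy data; return (removed, remaining)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [
        (1, "gene"), (2, "chromosome"), (3, "supercontig")])
    conn.executemany("INSERT INTO feature VALUES (?, ?)", [
        (1, 2), (2, 3), (3, 1)])  # one chromosome, one supercontig, one gene
    conn.executemany("INSERT INTO analysis VALUES (?, ?)", [
        (1, "toy.gnm1"), (2, "toy.gnm1.ann1")])
    conn.executemany("INSERT INTO analysisfeature VALUES (?, ?)", [
        (1, 1), (1, 2),           # assembly links: should survive
        (2, 1), (2, 2),           # gene build -> assembly features: remove
        (2, 3)])                  # gene build -> gene: should survive
    removed = remove_assembly_links(conn)
    remaining = conn.execute(
        "SELECT analysis_id, feature_id FROM analysisfeature "
        "ORDER BY analysis_id, feature_id").fetchall()
    return removed, remaining


if __name__ == "__main__":
    print(demo())
```

Running the inner SELECTs on their own first, and checking the counts against the grouped query output, would be a sensible precaution before deleting anything in the production database.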
[LEGUME-728] created by ecannon