kimrutherford closed this issue 2 years ago
fixed except human CETN2 ortholog
Looking into this anyway because:
chromosome1.contig:FT CETN2 ortholog; date=20170902"
chromosome3.contig:FT /controlled_curation="term=human CETN1 and CETN2 and CETN3
Could you re-run this query so that I can check that all are fixed? The ticket can stay as low priority for now.
This is from the 2019-01-23 nightly load:
                  name                  | count
----------------------------------------+-------
 conserved in archaea                   |   300
 conserved in bacteria                  |  1121
 conserved in eukaryotes                |  4546
 conserved in eukaryotes only           |  2588
 conserved in fungi                     |  4640
 conserved in fungi only                |   547
 conserved in metazoa                   |  3554
 conserved in vertebrates               |  3533
 faster evolving duplicate              |    22
 metazoa                                |     1
 no apparent S. cerevisiae ortholog     |   604
 orthologs cannot be distinguished      |   109
 predominantly single copy (one to one) |  3117
 Schizosaccharomyces pombe specific     |   152
 Schizosaccharomyces specific           |   224
 vertebrates                            |     1
(16 rows)
I fixed the odd ones, will check again in a few months if not implemented
Could you re-run this query for me so I can check that no errors crept in...
Looking good:
                  name                  | count
----------------------------------------+-------
 conserved in archaea                   |   300
 conserved in bacteria                  |  1119
 conserved in eukaryotes                |  4545
 conserved in eukaryotes only           |  2590
 conserved in fungi                     |  4639
 conserved in fungi only                |   545
 conserved in metazoa                   |  3557
 conserved in vertebrates               |  3537
 faster evolving duplicate              |    22
 no apparent S. cerevisiae ortholog     |   610
 orthologs cannot be distinguished      |   104
 predominantly single copy (one to one) |  3118
 Schizosaccharomyces pombe specific     |   152
 Schizosaccharomyces specific           |   224
(14 rows)
Probably because I hardly changed anything....
ok, this can remain on back-burner
@kimrutherford could you rerun this for me to see if I need to do any fixes?
                  name                  | count
----------------------------------------+-------
 conserved in archaea                   |   299
 conserved in bacteria                  |  1119
 conserved in eukaryotes                |  4544
 conserved in eukaryotes only           |  2592
 conserved in fungi                     |  4640
 conserved in fungi only                |   542
 conserved in metazoa                   |  3560
 conserved in vertebrates               |  3540
 faster evolving duplicate              |    22
 no apparent S. cerevisiae ortholog     |   608
 orthologs cannot be distinguished      |   104
 predominantly single copy (one to one) |  3122
 Schizosaccharomyces pombe specific     |   153
 Schizosaccharomyces specific           |   224
(14 rows)
Note to self:
SELECT t.name,
count(fc.feature_cvterm_id)
FROM cvterm t
JOIN feature_cvterm fc ON fc.cvterm_id = t.cvterm_id
JOIN cv ON t.cv_id = cv.cv_id
WHERE cv.name = 'species_dist'
GROUP BY t.name ORDER BY t.name;
Great, I'll put this off to the future. I should usually spot anomalies, and these hardly change.
I've added a check for this in the logs. Look out for a log file ending in .species_dist_term_name_typos.
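For anyone who wants to eyeball the query output by hand in the meantime, here is a minimal Python sketch of this kind of check. It is not the code that writes the log file: the function name `find_term_name_typos` and the constant `VALID_SPECIES_DIST_TERMS` are made up for illustration, and using the 14 names from the later query results as the list of valid names is an assumption.

```python
# Assumed allow-list: the 14 term names that remained in the later query
# results in this thread. The real configured list may differ.
VALID_SPECIES_DIST_TERMS = {
    "conserved in archaea",
    "conserved in bacteria",
    "conserved in eukaryotes",
    "conserved in eukaryotes only",
    "conserved in fungi",
    "conserved in fungi only",
    "conserved in metazoa",
    "conserved in vertebrates",
    "faster evolving duplicate",
    "no apparent S. cerevisiae ortholog",
    "orthologs cannot be distinguished",
    "predominantly single copy (one to one)",
    "Schizosaccharomyces pombe specific",
    "Schizosaccharomyces specific",
}


def find_term_name_typos(counts, valid_names=VALID_SPECIES_DIST_TERMS):
    """Return sorted (name, count) pairs whose name is not in the allow-list."""
    return sorted(
        (name, n) for name, n in counts.items() if name not in valid_names
    )


# A few rows copied from the 2019-01-23 results, including the two odd ones.
counts_2019_01_23 = {
    "conserved in metazoa": 3554,
    "conserved in vertebrates": 3533,
    "metazoa": 1,
    "vertebrates": 1,
}

for name, n in find_term_name_typos(counts_2019_01_23):
    print(f"unexpected species_dist term name: {name!r} ({n} annotations)")
```

The input is just the name/count pairs from the species_dist query, so the same function could be fed rows fetched from the database instead of a hand-typed dict.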
Typos in the species distribution CV annotations need to be logged.
We'll need to configure a list of the valid term names. Here's the current number of annotations that use each term name: