legumeinfo / mine-issues

Report ALL issues on LIS mines here, regardless of which mine you found them on!

Orphaned ontology terms #138

Closed: sammyjava closed this 9 months ago

sammyjava commented 9 months ago

Some ontology loaders are loading terms without an associated ontology, i.e. ontologyterm rows whose ontologyid is null. Fix this.

legumemine-5103=> select split_part(identifier, ':', 1),count(*) from ontologyterm where ontologyid is null group by split_part order by split_part;
  split_part   | count 
---------------+-------
 CDD           |  2027
 Coils         |     1
 EC            |   109
 FBbt          |     1
 FMA           |     5
 GO            |   172
 Gene3D        |  1510
 HAMAP         |   588
 JCVI_TIGRFAMS |  1062
 LIS           |  1519
 MIRBASE       |   522
 MIRMED        |   224
 MtGEA         | 13151
 NCBI_GP       | 79854
 PANTHER       |  4458
 PFAM          |  4124
 PIR           |   630
 PMID          |   363
 PRINTS        |   570
 Pfam          |    36
 Prosite       |  1498
 SMART         |   596
 Superfamily   |  1195
 TIGRFAM       |   753
 UniProt       | 36715
 WBbt          |     2
 locus         | 66871
 protein       | 37031
(28 rows)
sammyjava commented 9 months ago

Pretty simple: almost all of them are terms from ontologies we don't support, loaded from the annotation GFFs. The one exception was Hwangkeum, whose GFF used the prefix PFAM: instead of Pfam:. I fixed that in the GFF.
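The Hwangkeum fix was made by hand in the GFF. If the same prefix problem turns up in other annotation files, it could be normalized programmatically. Below is a minimal sketch, assuming the prefixes sit in comma-separated Dbxref or Ontology_term values in GFF3 column 9 (the issue doesn't say which attribute was involved). `PREFIX_FIXES`, `normalize_attributes` and `normalize_gff_line` are hypothetical names, and the only mapping taken from the issue is PFAM to Pfam.

```python
# Hypothetical table of wrong prefix -> prefix the mine's loaders expect.
# Only the PFAM -> Pfam case comes from the issue.
PREFIX_FIXES = {"PFAM": "Pfam"}

# Assumed attribute keys that carry prefixed terms; adjust to the real files.
TERM_KEYS = ("Dbxref", "Ontology_term")


def normalize_attributes(attributes: str, keys=TERM_KEYS) -> str:
    """Rewrite term prefixes inside the given keys of a GFF3 column-9 string."""
    fields = []
    for field in attributes.split(";"):
        if "=" not in field:
            # Keep empty fields (e.g. from a trailing ';') untouched.
            fields.append(field)
            continue
        key, value = field.split("=", 1)
        if key in keys:
            terms = []
            for term in value.split(","):
                prefix, sep, local = term.partition(":")
                terms.append(PREFIX_FIXES.get(prefix, prefix) + sep + local)
            value = ",".join(terms)
        fields.append(f"{key}={value}")
    return ";".join(fields)


def normalize_gff_line(line: str) -> str:
    """Normalize one GFF3 line; comments and malformed lines pass through."""
    if line.startswith("#") or not line.strip():
        return line
    ending = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line
    cols[8] = normalize_attributes(cols[8])
    return "\t".join(cols) + ending
```

For example, `normalize_attributes("ID=g1;Dbxref=PFAM:PF00069,GO:0005524")` yields `"ID=g1;Dbxref=Pfam:PF00069,GO:0005524"`. Only prefixes listed in the table change, so terms from unsupported ontologies are left for the orphan cleanup to handle.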

I've added removal of orphaned ontology terms to lis-remove-orphans.
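The issue doesn't show how lis-remove-orphans performs the removal. The following is only a hedged sketch of the idea. It assumes a toy SQLite schema that mirrors just the `ontologyterm` table and its `identifier`/`ontologyid` columns from the psql query in the first comment. The real InterMine PostgreSQL schema has more columns and other tables that reference terms, and this sketch ignores them. The function names and the sample rows are made up for illustration.

```python
import sqlite3


def count_orphans_by_prefix(conn: sqlite3.Connection) -> list:
    """Same grouping as the psql query; SQLite has no split_part(), so use instr/substr."""
    sql = """
        SELECT CASE WHEN instr(identifier, ':') > 0
                    THEN substr(identifier, 1, instr(identifier, ':') - 1)
                    ELSE identifier END AS prefix,
               count(*)
        FROM ontologyterm
        WHERE ontologyid IS NULL
        GROUP BY prefix
        ORDER BY prefix
    """
    return conn.execute(sql).fetchall()


def remove_orphan_terms(conn: sqlite3.Connection) -> int:
    """Delete terms with no parent ontology; return the number of rows removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM ontologyterm WHERE ontologyid IS NULL")
    return cur.rowcount


def demo() -> tuple:
    """Build a toy in-memory database, count orphans, then remove them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ontology (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE ontologyterm (
            id INTEGER PRIMARY KEY,
            identifier TEXT,
            ontologyid INTEGER REFERENCES ontology(id)
        );
        INSERT INTO ontology (id, name) VALUES (1, 'GO');
        -- Made-up rows: one attached term and four orphans.
        INSERT INTO ontologyterm (identifier, ontologyid) VALUES
            ('GO:0005524', 1),
            ('PFAM:PF00069', NULL),
            ('Pfam:PF00069', NULL),
            ('UniProt:EXAMPLE1', NULL),
            ('locus:EXAMPLE2', NULL);
    """)
    counts = count_orphans_by_prefix(conn)
    removed = remove_orphan_terms(conn)
    remaining = conn.execute("SELECT count(*) FROM ontologyterm").fetchone()[0]
    conn.close()
    return counts, removed, remaining
```

On the toy data, `demo()` reports one orphan each for PFAM, Pfam, UniProt and locus, deletes those four rows, and leaves the GO term in place. In a real mine, any rows referencing the orphaned terms would need to be handled before or alongside the delete.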