Closed rasmus-storjohann-PG closed 11 months ago
Hello @rasmus-storjohann-PG - thanks for picking up on this issue.
You are right: the terminology in HSDS is not terribly consistent with wider usage. This is something that was inherited from earlier versions, but I'm marking this issue for consideration as part of the next (2.0) cycle of updates (although we might be able to clean up some of the descriptions before that).
For the time being, hopefully the mapping below helps make clear the intended semantics of the current fields.
Changes we might consider for the next upgrade would be:
Table: service_taxonomy -> service_taxonomy_terms
Field: service_taxonomy.taxonomy_id -> service_taxonomy_term.term_id
Field: service_taxonomy.taxonomy_detail -> service_taxonomy_term.term_detail
Table: taxonomy -> taxonomy_terms
Field: taxonomy.vocabulary -> taxonomy_terms.taxonomy
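To make the proposed renames concrete, here is a rough Python sketch that applies them to a single exported row. The `migrate_row` helper, the dict-per-row representation and the sample values are all hypothetical. It follows the proposal literally, including the plural table names.

```python
# Hypothetical sketch: apply the proposed 2.0 renames to one row exported
# under the current schema. Rows are plain dicts; helper name is made up.
FIELD_RENAMES = {
    "service_taxonomy": {
        "taxonomy_id": "term_id",
        "taxonomy_detail": "term_detail",
    },
    "taxonomy": {
        "vocabulary": "taxonomy",
    },
}

TABLE_RENAMES = {
    "service_taxonomy": "service_taxonomy_terms",
    "taxonomy": "taxonomy_terms",
}


def migrate_row(table: str, row: dict) -> tuple[str, dict]:
    """Return (new_table_name, row_with_renamed_fields); unknown keys pass through."""
    renames = FIELD_RENAMES.get(table, {})
    new_row = {renames.get(key, key): value for key, value in row.items()}
    return TABLE_RENAMES.get(table, table), new_row


# Illustrative values only, not real taxonomy codes.
table, row = migrate_row("taxonomy", {"id": "T1", "name": "Food", "vocabulary": "example-taxonomy"})
print(table, row)
```

Because unknown keys pass through unchanged, the same helper can be run over every row without per-table special cases.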
It's not easy to make those changes in a backwards compatible way, hence tagging this for 2.0.
Other issues
I've suggested above that we don't have a separate taxonomy table and foreign key, but would instead aim for publishers to converge on a codelist of 'taxonomy' names to aid interoperability.
The top terms (terms without parents) could be used to identify the different taxonomies present in a system.
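A minimal sketch of that idea, assuming terms carry `id`, `name` and `parent_id` as in the current taxonomy table: each top term is treated as the root of one taxonomy, and every term is assigned to a taxonomy by walking its `parent_id` links upward. The sample data is invented.

```python
# Sketch: each top term (parent_id is None) is the root of one taxonomy;
# every term is grouped under the root reached by following parent_id.
terms = [
    {"id": "A", "name": "Taxonomy A root", "parent_id": None},
    {"id": "A1", "name": "A child", "parent_id": "A"},
    {"id": "A1a", "name": "A grandchild", "parent_id": "A1"},
    {"id": "B", "name": "Taxonomy B root", "parent_id": None},
    {"id": "B1", "name": "B child", "parent_id": "B"},
]

by_id = {t["id"]: t for t in terms}


def root_of(term_id: str) -> str:
    """Follow parent_id links up to the top term, guarding against cycles."""
    seen = set()
    current = by_id[term_id]
    while current["parent_id"] is not None:
        if current["id"] in seen:
            raise ValueError(f"cycle detected at {current['id']}")
        seen.add(current["id"])
        current = by_id[current["parent_id"]]
    return current["id"]


top_terms = [t["id"] for t in terms if t["parent_id"] is None]
members = {root: [] for root in top_terms}
for t in terms:
    members[root_of(t["id"])].append(t["id"])

print(top_terms)  # ['A', 'B']
print(members)    # {'A': ['A', 'A1', 'A1a'], 'B': ['B', 'B1']}
```

Note that this only works when the full tree is available: with a partial set of terms, a chain can end at a parent that isn't present, and `by_id[...]` raises a `KeyError`.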
The redundancy of parent_name is a good question. Data exchange standards often include some redundancy, in recognition that users are often working with incomplete sets (e.g. if you get back the classification of a term from an API, you might not get the full taxonomy tree, and having easy access to the parent name can be useful). However, I'm not sure that justifies its inclusion here, so I would welcome views on whether it can be safely dropped from the next version.
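A small sketch of the incomplete-set case described above, assuming terms carry `parent_id` and a redundant `parent_name`: if the parent row was received, its name is used; otherwise only the denormalised `parent_name` can supply a label. The `parent_label` helper and the sample data are hypothetical.

```python
# Sketch: resolve a human-readable parent label from a partial set of terms,
# falling back to the redundant parent_name when the parent row is missing.
from typing import Optional


def parent_label(term: dict, known_terms: dict) -> Optional[str]:
    parent_id = term.get("parent_id")
    if parent_id is None:
        return None  # top term: no parent
    parent = known_terms.get(parent_id)
    if parent is not None:
        return parent["name"]
    # Parent row not in this partial set: only parent_name can help.
    return term.get("parent_name")


# Only the child term was received, e.g. from a single API response.
partial = {
    "T2": {"id": "T2", "name": "Child term", "parent_id": "T1", "parent_name": "Parent term"},
}
print(parent_label(partial["T2"], partial))  # Parent term
```

Without `parent_name`, the same call would return `None` here even though the term clearly has a parent.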
There are some wider open issues about taxonomy terms and linked terms that may also need to be addressed here.
First I'd like to concur that:
Hence the name changes proposed by @timgdavies make sense.
Regarding using terms without parents as the identifiers for vocabularies/taxonomies, this implies, I think, giving what SKOS calls top concepts parent ids that point at what SKOS calls concept schemes. It would work, but it combines two types of entity (a concept scheme and a concept) in one field.
At present, in our early implementation, we're just selecting distinct vocabulary references to get a unique list of vocabularies, but we have no links for those vocabularies. Hence a resolution using @timgdavies's approach, or a separate table of vocabularies, would really help.
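For illustration, the workaround described above amounts to something like the following Python sketch (rows and vocabulary names are invented). The result is just a list of strings, with nothing linking each name to a description, URI or version.

```python
# Sketch of the current workaround: derive the list of vocabularies by
# taking the distinct values of the vocabulary column across all terms.
rows = [
    {"id": "1", "name": "Food", "vocabulary": "vocab-x"},
    {"id": "2", "name": "Housing", "vocabulary": "vocab-x"},
    {"id": "3", "name": "Legal aid", "vocabulary": "vocab-y"},
]

# Set comprehension removes duplicates; sorted() gives a stable order.
vocabularies = sorted({row["vocabulary"] for row in rows})
print(vocabularies)  # ['vocab-x', 'vocab-y']
```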
I like the precision these changes introduce, in general.
I do agree that retaining parent_name, and maybe adding parent_id, is useful. The current spec calls for including the entire taxonomy system in any export so that references to its terms can be "looked up" by the receiving system. But this introduces two problems: 1) not all taxonomy systems are freely licensed to be distributed, and 2) if you are only sending a small dataset (say, a single record), the reference taxonomy could be far larger than the actual payload of the record itself.
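A rough sketch of the suggestion above: each term reference in an exported record carries `parent_id` and `parent_name`, so the receiver has some context without being sent the whole taxonomy. Apart from `parent_id` and `parent_name`, the record shape, field names and the `term_reference` helper are hypothetical.

```python
# Sketch: build self-describing term references for export, so the
# full (possibly unlicensed, possibly large) taxonomy need not be shipped.
import json

# The publisher's local copy of the taxonomy (invented sample data).
taxonomy = {
    "T1": {"id": "T1", "name": "Parent term", "parent_id": None},
    "T2": {"id": "T2", "name": "Child term", "parent_id": "T1"},
}


def term_reference(term_id: str) -> dict:
    """Copy the term plus its parent's id and name into one reference."""
    term = taxonomy[term_id]
    parent = taxonomy.get(term["parent_id"]) if term["parent_id"] else None
    return {
        "id": term["id"],
        "name": term["name"],
        "parent_id": term["parent_id"],
        "parent_name": parent["name"] if parent else None,
    }


# Only the referenced term travels with the record, not the whole taxonomy.
record = {"service": "Example service", "terms": [term_reference("T2")]}
print(json.dumps(record))
```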
I am closing this as I believe we have already introduced these changes. In 3.0 at least, there are separate taxonomy and taxonomy_term schemas with an appropriate relationship. The vocabulary field is no longer present and is replaced by other fields that provide either a relationship to the taxonomy object or a free-text description of the existing taxonomy.
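As a rough illustration of that 3.0 structure, here is a Python sketch with two record types: a term either links to a taxonomy object via `taxonomy_id` or describes its taxonomy in a free-text `taxonomy` field. The field names are an approximation of the 3.0 schema, not a faithful copy, and the `describe_taxonomy` helper is hypothetical; check the published 3.0 schemas for the exact fields.

```python
# Sketch (approximate field names): separate taxonomy and taxonomy_term
# records, where a term either links to a taxonomy or describes it in text.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taxonomy:
    id: str
    name: str
    uri: Optional[str] = None


@dataclass
class TaxonomyTerm:
    id: str
    name: str
    code: Optional[str] = None
    parent_id: Optional[str] = None
    taxonomy_id: Optional[str] = None  # relationship to a Taxonomy object
    taxonomy: Optional[str] = None     # free-text description when no Taxonomy object exists


def describe_taxonomy(term: TaxonomyTerm, taxonomies: dict[str, Taxonomy]) -> Optional[str]:
    """Prefer the linked Taxonomy's name; fall back to the free-text field."""
    if term.taxonomy_id is not None and term.taxonomy_id in taxonomies:
        return taxonomies[term.taxonomy_id].name
    return term.taxonomy


taxonomies = {"tx1": Taxonomy(id="tx1", name="Example taxonomy")}
linked = TaxonomyTerm(id="t1", name="Food", taxonomy_id="tx1")
free_text = TaxonomyTerm(id="t2", name="Shelter", taxonomy="Local custom list")
print(describe_taxonomy(linked, taxonomies))     # Example taxonomy
print(describe_taxonomy(free_text, taxonomies))  # Local custom list
```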
As far as I can tell, in common usage, the AIRS taxonomy is a single taxonomy (much less frequently called a vocabulary) that contains a large number of entities that I believe are usually called taxonomy terms. In AIRS, each such term has an id (sometimes called a code?) and a name. I'm new to this field, so I may not have gotten all of that right.
This naming doesn't align well with the naming used in the standard.
The name of the taxonomy table implies that it contains one entry for each taxonomy, when it actually contains one entry for each taxonomy term. The vocabulary column identifies the taxonomy, so it would be much more consistent with common usage if it were called taxonomy_id, anticipating that this will become a foreign key once there is a table of data about the different taxonomies in use.
The description of the id column is unclear: what use cases are satisfied by prefixing the id that cannot be handled by the taxonomy_id (i.e. vocabulary) field?
The parent_name field seems unnecessary: the spec states that the id field is unique, so parent_id should be sufficient. However, if the id prefixing is removed, the id may no longer be unique. In that case, though, (id, taxonomy_id) together would be unique. It seems to me that we can reasonably assume that no taxonomy term is the child of a term from a different taxonomy, so the parent of any term can be looked up using the (parent_id, taxonomy_id) of the child.
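A minimal sketch of that lookup, assuming the id prefix is dropped and terms are keyed by the composite `(taxonomy_id, id)`: a parent is resolved with the child's own `taxonomy_id` plus its `parent_id`, relying on the same-taxonomy assumption above. The sample ids are invented and deliberately collide across taxonomies.

```python
# Sketch: composite-key index of terms and same-taxonomy parent lookup.
from typing import Optional

terms = [
    {"taxonomy_id": "tx-a", "id": "10", "name": "A: parent", "parent_id": None},
    {"taxonomy_id": "tx-a", "id": "11", "name": "A: child", "parent_id": "10"},
    {"taxonomy_id": "tx-b", "id": "10", "name": "B: parent", "parent_id": None},
    {"taxonomy_id": "tx-b", "id": "11", "name": "B: child", "parent_id": "10"},
]

# Index by (taxonomy_id, id); ids alone are not unique, the pair must be.
index = {}
for term in terms:
    key = (term["taxonomy_id"], term["id"])
    if key in index:
        raise ValueError(f"duplicate key {key}")
    index[key] = term


def parent_of(term: dict) -> Optional[dict]:
    """Look up the parent in the child's own taxonomy; None for top terms."""
    if term["parent_id"] is None:
        return None
    return index[(term["taxonomy_id"], term["parent_id"])]


print(parent_of(index[("tx-b", "11")])["name"])  # B: parent
```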