hiscom / hispid

HISPID Terms

Can we make HISPID a bit more useful? #132

Open acvaughan opened 7 years ago

acvaughan commented 7 years ago

I hesitate to write this, because I'm a bit late to the party, and I don't want this to come across as disrespectful to the work that everyone has put in so far.

I'm about two-thirds of the way through HISPID 6.0, and am making lots of notes and queries (in a printed copy) about how definitions could be tweaked or how usage notes might be clarified to make them more useful for your average (or perhaps your less-experienced-than-average) user. Somewhere in the Media Item class (which I understand is mostly from Audubon Core, and not brilliantly formulated, in my view), I started to wonder whether it's a good idea to be messing around with definitions of terms that are taken from other standards.

I think what we should aim for with HISPID is to provide enough interpretive information and usage notes so that a more naive user (e.g. MAHC members, or people setting up their collections management system and hoping to deliver to AVH) can get past the somewhat intimidating terminology and actually understand what the terms mean in relation to their collections, and gain confidence in using them.

To achieve this, I think that (for non-HISPID-specific terms) it's better to leave the definitions as they are in Darwin Core and Audubon Core, and focus on explaining how these terms are implemented in our HISPID world. So HISPID would represent more of a well-considered, experienced use case scenario, rather than a redefinition of terms that exist in other standards.

I really think this would be more useful for audiences like university herbaria and regional herbaria. Otherwise I think we risk producing a document that is still seen as a bit too technical and intimidating for some of our key users.

ben3000 commented 7 years ago

Alison, from my point of view, that is pretty much what I hope to do before we publish the final result. I can't remember to what degree we agreed on this, but it makes a lot of sense. I think it is OK, though, to correct a definition in Darwin Core, if necessary.

To what degree do you feel that we're redefining terms?

I definitely favour everyday language over technical language in the definitions. It is vital that people understand what gets delivered for a term. That said, there needs to be some technical stuff, we're a technical committee after all, and the IT types need to be able to see RDF to do their bit.

acvaughan commented 7 years ago

Hi Ben.

To be honest, I've no idea how much terms have been redefined so far. I guess in my reading through the document I feel very tempted to reword a lot of the definitions, and I had gotten the impression (perhaps erroneously) that the HISPID working group had been modifying definitions already. If they are all pretty much the same as their source standard, then please ignore that part of my suggestion above, but I do stick by the bit about making HISPID as easy as possible for people to understand and implement.

I'm glad that you agree on that point, and that it's in scope for HISPID. I think we should put as much energy into the "use case" aspect of HISPID as possible. I'm not suggesting that any of the use case stuff comes at the expense of the technical side of it, just as an addendum to it. Perhaps we could think of HISPID as a document that MAHC and HISCOM could apply to their respective tasks with equal confidence.

p.s. To elaborate on that last point ... I'm currently looking into 19 records in our database that are flagged as both Cultivated and Native. I didn't think this was a problem, but Niels tells me that's impossible and wrong in terms of the standards. But I don't quite get it, and looking at HISPID, I don't get any clarification. I would like to think that I should be able to:

These are examples of MAHC-ish tasks that I think that HISPID should be useful for. But to me, cultivatedPlantProvenance reads as if it's talking about living collections, and I can't find the equivalent of posnat, but I think I've just given up looking. Maybe this is one of the trickier concepts, but I've had a pretty good night's sleep, and ten years on HISCOM, so I think it should be easier for me to figure out.

nielsklazenga commented 7 years ago

Even if that is so, there is absolutely no need for the RDF to include all the Dublin Core, Darwin Core and Audubon Core terms that we have a use for. Neither is there a need for HISPID to include all those terms, and trying to write better definitions for every Darwin Core and Audubon Core term is a colossal waste of time. It would be much better to have something like a HISPID primer that explains how we use all the terms from the different standards and how HISPID extends those standards. HISPID should only define HISPID terms, and the list of those definitions should not be more than an appendix, just like all W3C documents are written.

As for the RDF, I guess I am an IT type – I even like RDF – but I don't need RDF for absolutely everything. ALA is full of IT types and they don't use RDF. When it was suggested that the normative document for Darwin Core would be in RDF, it was the people who know most about RDF who protested (for HISPID, I wouldn't want anything normative at all). RDF is great, but it is not for everything.

I enjoyed the RDF exercise, and I think we should continue it, but it shouldn't be confounded with HISPID. It should not really even be part of HISPID, but be considered fluff around it.

For me, the HISPID review was over once we had the list of terms and the vocabularies. After that there are two more things that need to be done, namely implement it, e.g. getting the XML Schema and the CMF files that I have been working on; and explain to people how to use it. The RDF can be part of the implementation, but trying to shoehorn everything into the RDF document is not going to work. It will make the RDF less useful because of all the extra ballast and it will make for poorer documentation, as it is often better to explain terms in combination rather than in isolation, as can be seen from some of the issues Alison reported earlier.

I am still very uncomfortable with the all-encompassing scope that people ascribe to HISPID. I think we should get rid of the idea that Australian herbaria use HISPID and get used to the idea that Australian herbaria use Darwin Core and Audubon Core, just like every other herbarium and museum in the world, and that all that is left of HISPID is the terms that are still in the hispid namespace and that more widely and internationally used standards do not yet accommodate (basically the herbarium-specific terms). Our purpose should be to provide the highest possible quality standard data, not to have our own standard (which sort of defeats the purpose of having standards). I would like the document Alison wants and I would like to contribute to it, but let's not consider it part of HISPID.

acvaughan commented 7 years ago

Separate document is fine by me. I guess it would just have classes, terms, definitions, and extensive usage notes.

nielsklazenga commented 7 years ago

Alison, that is actually in the document. If you look at establishmentMeans, you see that it replaces posnat and poscul. If you then link to the GBIF vocabulary we use, you'll see that 'cultivated' is a synonym (or 'alternate term' as they call it) of 'managed', which is a subclass of 'introduced', which is disjoint with 'native'. 'disjoint' is just ontological lingo that means that the two terms exclude each other.

This is how linked data in general, and RDF in particular, works. It is meant to be interpretable by a computer, which makes it less suitable as an explanation for human consumption.
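The chain of reasoning above (synonym, then subclass, then disjointness) can be sketched in a few lines of Python. This is only a rough illustration: the three dictionaries are a hand-written toy copy of the relations described in this comment, not the actual GBIF vocabulary file, and the function names `ancestors` and `conflict` are hypothetical, not part of any HISPID or GBIF tooling.

```python
# Toy model of the relations described above (assumption: hand-copied,
# NOT loaded from the real GBIF establishmentMeans vocabulary).
SYNONYMS = {"cultivated": "managed"}              # alternate term -> preferred term
PARENT = {"managed": "introduced"}                # subclass -> superclass
DISJOINT = {frozenset({"introduced", "native"})}  # mutually exclusive classes


def ancestors(term):
    """Return the preferred term followed by all of its superclasses."""
    term = term.lower()
    term = SYNONYMS.get(term, term)
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def conflict(a, b):
    """True if any class of `a` is declared disjoint with any class of `b`."""
    return any(
        frozenset({x, y}) in DISJOINT
        for x in ancestors(a)
        for y in ancestors(b)
    )


print(ancestors("Cultivated"))            # ['managed', 'introduced']
print(conflict("Cultivated", "Native"))   # True
print(conflict("managed", "introduced"))  # False
```

A record flagged as both Cultivated and Native trips the check because 'cultivated' resolves to 'managed', which inherits the disjointness that 'introduced' has with 'native'.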

acvaughan commented 7 years ago

Thanks Niels. I got really confused by the cultivatedPlantProvenance section and forgot about establishmentMeans. It all makes sense now. But, yes, not the easiest presentation of information for human consumption.

nielsklazenga commented 7 years ago

The cultivatedPlantProvenance is for herbarium sheets that are made from cultivated plants, as in PreservedSpecimens based on LivingSpecimens. It will contain the locality details etc. from when and where the cultivated plant was collected in the wild. I remember this was extremely hard to formulate, so there is probably an improvement that can be made.

As for the 'not the easiest presentation of information for human consumption' bit and coming back to your earlier comment, I think that the HISPID terms document should have just the definitions and short usage notes. While I think it is not necessary to have all the terms from other standards in the RDF, there is nobody who says you can't (in fact in RDF anyone is supposed to be able to say anything about everything) and, as we have everything already in there, we might as well keep it. But let's not waste a lot of time on trying to refine existing definitions and descriptions, and let's not overload the RDF with things that it is not made for. So the extensive usage notes should be in another document. Separating this from the terms has the advantage that you let go of the restrictions that the RDF – or even the structure of the HTML document – imposes, so you can discuss terms in combination rather than in isolation and can for example discuss all the elevation terms together, which saves a lot of repetition or cross-referencing.

When I say a different document, that is because I focus on the HTML, so I just mean another web page. In the PDF everything can be combined.

Also, I think we need a third document dealing with the disposition of all the old HISPID terms. This has always been the plan and we already got the RDF, but then we sort of forgot about it.

AaronWilton commented 7 years ago

Wow! A lot of stuff packed into here...

1) How many terms haven't we tweaked the wording of (for better or worse)? I guess it is a mix: many (most?) were just adopted, but some were 'improved', from memory. Agree with Ben that it may be necessary to refine a definition in a standard for our use (this doesn't mean break it!), as a number of them are slightly vague or allow different implementations, and we seem to have been trying to encourage a more precise or consistent use.

2) We need to consider maintainability. I am not convinced that holding things separately is going to help us maintain consistency. But agree that doing this in RDF may be difficult, and so we may be forced to make that separation. But I think we need to be clear about a ?master file(s) used to maintain HISPID vs the documents/products that are generated. -> We need to sit down, work on this and agree on it properly.

3) That includes the history terms... (which I haven't forgotten, but they haven't been the highest on the priority list and I'm not 100% sure I like the structure).

I think I agree with Niels - I see HISPID as 1) a "community profile" for how we recommend use of other standards and 2) defining terms to fill the gaps where there are additional concepts we need to pass around that are not covered (yet) in other standards.

this is going to require a focussed discussion i think.