tdwg / material-sample

A Task Group of the Observations and Specimen Records (OSR) Interest Group

Other Deliverable - preparations review #7

Closed Jegelewicz closed 9 months ago

Jegelewicz commented 3 years ago

Task Group will make a recommendation [...] as to which class in the Darwin Core standard these properties belong which may also include recommendations for terms being revised, added, disambiguated, or deprecated. Depends upon definitions provided [in primary deliverable].

Current Darwin Core Placement/Definition

http://rs.tdwg.org/dwc/terms/preparations

This term is a property of Occurrence.

Definition

A list (concatenated and separated) of preparations and preservation methods for a specimen.

Examples

fossil, cast, photograph, DNA extract, skin | skull | skeleton, whole animal (ETOH) | tissue (EDTA)

Comments

Recommended best practice is to separate the values in a list with space vertical bar space ( | ).
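
As a minimal sketch of how a data consumer might apply this separator convention (this is not part of the term definition, and the function name `split_preparations` is hypothetical):

```python
def split_preparations(value: str) -> list[str]:
    """Split a dwc:preparations string into its individual values."""
    # Splitting on the bare bar and stripping whitespace also tolerates
    # values that omit the recommended spaces around the separator.
    return [part.strip() for part in value.split("|") if part.strip()]


print(split_preparations("skin | skull | skeleton"))
# → ['skin', 'skull', 'skeleton']
```

A value without any separator, such as `fossil`, comes back as a one-item list.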

See also

Change term - preparations: https://github.com/tdwg/dwc/issues/346

smrgeoinfo commented 1 year ago

This only applies to material sample. Material entity is much broader.

Jegelewicz commented 1 year ago

This only applies to material sample.

I'm not sure that I agree. It seems like things can be prepared without being "a physical result of a sampling (or subsampling) event." A lot of stuff in collections was not "sampled" in that way.

Again, I think that preparations could actually apply to more than one class. In my opinion, all of these could have preparations:

MaterialEntity
MaterialSample
PreservedSpecimen
FossilSpecimen
LivingSpecimen (do cuttings grown into new things or clones count as preparation?)
MachineObservation (.jpg, printed photograph, tape recording, etc.)

I would actually recommend not using the term specimen, but that's for after we decide if MaterialEntity could use this property or not. I would also prefer to see this not be plural and instead add as many preparation properties to whatever is "prepared" as necessary.

tucotuco commented 1 year ago

More unattributed dirty laundry...

The top 20 values for preparations in HUMAN_OBSERVATION records in the July 2022 snapshot of GBIF, in descending order of frequency, were:

1. Frozen
2. 95% ethanol
3. NA
4. photograph
5. Data
6. NO APLICA
7. Fresh
8. Colecta definitiva: Ejemplar completo en Etanol
9. 90% ethanol
10. ETOH
11. media
12. NO DISPONIBLE
13. Animal completo (ETOH)
14. RNA stabilization buffer
15. Animal entero en sobre
16. Fotografía
17. Alfiler
18. Lugol preserved samples
19. Técnicas estándar de muestras botánicas para un herbario
20. DNA extract

The top 20 values for preparations in MACHINE_OBSERVATION records in the July 2022 snapshot of GBIF, in descending order of frequency, were:

1. Fotografía
2. digitised .wav
3. Photograph
4. Corte de audio
5. Polyester
6. photograph
7. fotografía
8. Film
9. SEM micrograph
10. Acetate
11. Photograph:b&w
12. Tarjeta SD
13. photograph-digital - 1
14. Google Drive
15. pen and ink
16. Image only; Biorepository
17. Ink
18. Microslide
19. digitised .aif
20. default - 1

Jegelewicz commented 1 year ago

preparations and preservation methods

whole animal (ETOH)

it seems like there should be two terms?

preparation - what you have - whole animal
preservation - how it is preserved - ETOH

Should we take the time now to split these up?
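
A rough sketch of what such a split could look like for values written in the style of the current examples. It assumes the preservation method appears in trailing parentheses, which is only a pattern seen in the examples and not a rule of the standard; the function name `split_preparation` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: "<preparation> (<preservation>)", as in "whole animal (ETOH)".
_PATTERN = re.compile(r"^(?P<preparation>.*?)\s*\((?P<preservation>[^()]*)\)\s*$")


def split_preparation(value: str) -> Tuple[str, Optional[str]]:
    """Return (preparation, preservation); preservation is None if absent."""
    text = value.strip()
    match = _PATTERN.match(text)
    if match is None:
        # No trailing parenthetical, so treat the whole value as the preparation.
        return text, None
    return match["preparation"], match["preservation"]


print(split_preparation("whole animal (ETOH)"))
# → ('whole animal', 'ETOH')
```

Values that do not follow the assumed pattern, such as many of the free-text values found in real data, pass through unchanged with no preservation method.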

dagendresen commented 1 year ago

I think yes, that we would want to move the term(s) to MaterialEntity, and split into two terms preparation and preservation.

Jegelewicz commented 1 year ago

Jutta - Organize in MaterialEntity, but note that this needs to be clarified and all of the dimensions related to Material need to be detailed and classified. But this is a task for another group.

Jegelewicz commented 1 year ago

@cboelling - meant and used to describe material in the context of an occurrence, but it is vague and the definition is more of an instruction than a definition. Needs to be divided and given proper definitions for actual material properties - a task for another group. Deprecate and get a better set of terms.

Jegelewicz commented 1 year ago

@tucotuco - Dividing occurrence from the stuff that came from it and for now we will have a term that is overloaded, but we need someone else to work on that.

Jegelewicz commented 1 year ago

Andy Bentley tried to do this separation (Arctos has some of this too). But nothing has ever been formalized. Again, a project for another day!

stanblum commented 1 year ago

Everyone on the call today (Teresa, Stan, Jutta, Christian, John) agreed (no objections) that dwc:preparations should be organized under MaterialEntity rather than Occurrence, and that developing a materialEntity extension -- to rigorously address the things (parts), treatment (preparations), and storage regime -- would be a timely (overdue) project, but it's out of scope for the MaterialSample project. In the short-term, some clarification of the syntax and examples would be useful.

deepreef commented 1 year ago

Many thanks for sharing this! I apologize for not being able to attend the last couple meetings (I was/am travelling during both).

One thing I may have missed, but am hoping to get clarity on: Does the term MaterialEntity replace MaterialSample? Or will they both be maintained as related but distinct terms (e.g., class/subclass; whatever)? The practical implication of this has to do with the LARGE number of materialSampleID values I have already generated. Can I present these as materialEntityID values going forward? Or do I need to mint another set of ID values to present as materialEntityID, while maintaining my existing values as materialSampleID?

I know what the purist answer is; but keep in mind that when I assign identifiers to things, those identifiers are explicitly intended to represent the actual "thing" (physical or abstract/conceptual); not the "data record for the thing".

P.S. I understand that this is not the right issue to ask this question in (preparations), but as I said, I am travelling, with tenuous internet access, and it was easy to reply to Stan's post that showed up in my email inbox, and I don't really have the time to hunt down the correct issue to post this to...

Jegelewicz commented 1 year ago

Does the term MaterialEntity replace MaterialSample?

No - see https://github.com/tdwg/material-sample/blob/main/review%20package/MaterialEntity.md

I don't feel that I can confidently answer your other questions, but @tucotuco said he will weigh in here.

deepreef commented 1 year ago

Excellent! Thank you, @Jegelewicz! This is extremely helpful. I knew this New Term Request existed, but haven't had the time to read it, and I should have done that before posting here, above. This New Term Request is extremely well written, and answers most of my questions! I will articulate my question about IDs over there.

Again, thanks for the redirection, and apologies for cluttering this thread.

*Edit: Where would I post commentary about this New Term Request - specifically about how to handle existing materialSampleID values?

cboelling commented 1 year ago

@deepreef:

Can I present these as materialEntityID values going forward? Or do I need to mint another set of ID values to present as materialEntityID, while maintaining my existing values as materialSampleID?

That individual xyzID properties are present in DwC has puzzled me for some time. I think it might be, like other features of DwC, a consequence of the design (the serialization, actually) of simple DwC archives, but maybe I am missing an important aspect concerning the expressivity. I think that any value that has been minted and used as a value in a statement including dwc:materialSampleID can equally well be used with dwc:materialEntityID because, even if it is not presented in DwC, dwc:MaterialSample is a subclass of dwc:MaterialEntity. An instance of dwc:MaterialSample is also an instance of dwc:MaterialEntity (but the opposite isn't true).
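
The one-way nature of the subclass relation can be illustrated with a loose analogy in Python. Python classes are only a stand-in for the class semantics intended here, and the identifier values are made up for illustration.

```python
class MaterialEntity:
    """Stand-in for dwc:MaterialEntity (analogy only)."""

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier


class MaterialSample(MaterialEntity):
    """Stand-in for dwc:MaterialSample, a narrower kind of MaterialEntity."""


sample = MaterialSample("urn:example:sample-1")  # hypothetical identifier
entity = MaterialEntity("urn:example:entity-1")  # hypothetical identifier

# Every MaterialSample is also a MaterialEntity, so its identifier can serve
# wherever a MaterialEntity identifier is expected.
print(isinstance(sample, MaterialEntity))  # True
# The reverse does not hold.
print(isinstance(entity, MaterialSample))  # False
```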

tucotuco commented 1 year ago

Christian was able to get to this before I could. I agree entirely that a materialSampleID already represents a materialEntityID by virtue of MaterialSample being a narrower term (a subtype) for a MaterialEntity.

deepreef commented 1 year ago

Thank you @cboelling and @tucotuco! That is what I was hoping (and expecting, given the subclass assertion).

However, I'm a little confused by the statement:

That individual xyzID properties are present in DwC has puzzled me for some time.

I am torn about examples of xyzID terms within class Pdq -- I see value for them functioning as direct "properties" of instances of class Pdq as a sort of "foreign key" link to instances in different classes; however I also see how these kinds of properties might best be shared via separate instances of ResourceRelationship. I could go either way on that.

However, without terms like xyzID within class Xyz, how would we provide unique identifiers ("primary key") for instances of class Xyz? That seems like a pretty fundamental property for all DwC classes. Or do I misunderstand your point?
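
To make the two options concrete, here is a small sketch with made-up rows. The identifier values and the relationship label "has material sample" are hypothetical; the keys resourceID, relationshipOfResource, and relatedResourceID follow the ResourceRelationship terms.

```python
# Hypothetical flat rows; all identifier values are made up for illustration.
occurrences = [
    {"occurrenceID": "occ-1", "materialSampleID": "ms-1"},
    {"occurrenceID": "occ-2", "materialSampleID": "ms-2"},
]
material_samples = [
    {"materialSampleID": "ms-1", "preparations": "tissue (EDTA)"},
    {"materialSampleID": "ms-2", "preparations": "skin | skull"},
]

# "Primary key" use: index each MaterialSample row by its own materialSampleID.
samples_by_id = {row["materialSampleID"]: row for row in material_samples}

# "Foreign key" use: follow materialSampleID from an Occurrence row to the sample.
linked = samples_by_id[occurrences[0]["materialSampleID"]]


def to_resource_relationship(row: dict, relationship: str) -> dict:
    """Express the same link as a separate ResourceRelationship-style record."""
    return {
        "resourceID": row["occurrenceID"],
        "relationshipOfResource": relationship,
        "relatedResourceID": row["materialSampleID"],
    }


relationships = [
    to_resource_relationship(row, "has material sample") for row in occurrences
]
print(linked["preparations"])
print(relationships[0])
```

Either way, the materialSampleID on the MaterialSample row itself is still needed as that row's own identifier.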

Jegelewicz commented 1 year ago

Perhaps there should be an identifier class in Darwin Core as the Latimer Core people have pioneered?

https://github.com/tdwg/rs.tdwg.org/blob/latimer/process/page_build_scripts/index.md#ltc_Identifier

We tried to discuss this at the meeting in March - but it was deemed out of scope (and it is) - but this is something that goes right along with the question of what are we doing? Are we creating a bunch of terms or a structure in which they can/should be used?

baskaufs commented 1 year ago

For background on the xID terms, I'd recommend taking a look at section 2.6 of the Darwin Core RDF Guide. In flat tables, the ID terms are used both as a primary key for the entities represented by the row in the table and also as a sort of foreign key linking to related resources that may be described in other tables. This design pattern is useful in this context, but has certain problems when people want to say that the ID terms are always "about" the subject resource. That's why they aren't recommended for use in RDF where their semantics are unclear. In RDF, it makes the most sense to use dcterms:identifier to state the identifier of the subject resource and dwciri: terms to link out to related resources. The semantics of those terms aren't ambiguous. This is actually the normative guidance within Darwin Core and departing from it would require a coordinated change to the standard.
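
A minimal sketch of the RDF pattern described above, written with plain string formatting so it needs no RDF library. The subject and agent IRIs are hypothetical, and dwciri:recordedBy is used only as one example of a dwciri: term; this is an illustration, not text from the RDF Guide.

```python
DCTERMS = "http://purl.org/dc/terms/"
DWCIRI = "http://rs.tdwg.org/dwc/iri/"

# Hypothetical IRIs, made up for illustration.
subject = "http://example.org/specimen/123"
agent = "http://example.org/agent/abc"


def iri(value: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


triples = [
    # The identifier of the subject resource itself, as a literal.
    (iri(subject), iri(DCTERMS + "identifier"), literal(subject)),
    # A link out to a related resource, using a dwciri: term and an IRI object.
    (iri(subject), iri(DWCIRI + "recordedBy"), iri(agent)),
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

Here the first statement plays the "primary key" role and the second plays the "foreign key" role, each with its own unambiguous property.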

Jegelewicz commented 1 year ago

In RDF, it makes the most sense to use dcterms:identifier to state the identifier of the subject resource and dwciri: terms to link out to related resources.

I can't decide if it is funny or sad that I (and I am guessing most collection managers) have absolutely no idea what this means and I wonder what implications that has for sharing the rich data available in natural history collections.

baskaufs commented 1 year ago

Well, I don't think collection managers necessarily need to understand these technical details. Presumably they would have either software to guide them, or examples/recipes to follow. However, in this space we are discussing the technical reasons why ID terms are or are not usable in particular situations and why alternatives might be required. So the discussion is apt to get technical.

I guess this kind of reminds me of a possibly apocryphal story about an incident that occurred soon after the comedian Sonny Bono got elected to Congress. He apparently complained that this stuff was so complicated that you had to be a lawyer to understand it. One of his colleagues pointed out "well, we are actually writing laws...".

Jegelewicz commented 1 year ago

Presumably they would have either software to guide them, or examples/recipes to follow.

OK - you didn't say assume - but "presumably" isn't that different and you know what happens when we assume....

I get it and maybe I shouldn't be here.

baskaufs commented 1 year ago

I apologize for my tone in the previous comment, Teresa. I was not trying to insinuate that anyone doesn't belong in the conversation.

The point I was trying to make was that we are modifying a technical specification and sometimes that's going to get complicated. Maintaining Darwin Core is particularly difficult because it is used in so many ways and therefore has to satisfy so many different kinds of users. Some people are only concerned with using it in spreadsheets, while others want to use it to create linked data or ontologies. So it's a balancing act to design it in such a way that it works for all use cases. A solution that seems obvious for a spreadsheet user may have unintended consequences for someone who wants to build a linked data system based on RDF. A solution that may seem obvious to the RDF/Linked data user may make things too complicated or unintelligible for spreadsheet users. We have to try to strike a balance between the two.

The reason we have a Technical Architecture Group is to be on the lookout for potential problems that might make what one group proposes cause problems for another. As the TAG chair, I have to be on the lookout for those kinds of problems and bring them up when I notice them. We have technical experts on the TAG to work out solutions so the rank and file don't have to struggle with them. That doesn't mean that the contributions of non-TAG people aren't important -- they are critically important to make sure that we are building something that's really useful to non-technical users.

I greatly value your contributions, both in terms of the ideas that you suggest as well as the heavy lifting that you do to keep the Material Sample Task Group together and moving. So I'm sorry if what I said made you feel excluded.

Jegelewicz commented 1 year ago

@baskaufs thank you for the apology, but this has happened quite frequently to me or others I am working with - not just in the TDWG space, but biodiversity data spaces in general. The span of technical knowledge seems too far to cross at this time. Maybe we should be working on closing that gap instead of worrying about what to call matter and how to describe it. I do wonder if my time wouldn't be better spent gaining the "law" knowledge, but I cannot get my head above water long enough to do so.

baskaufs commented 1 year ago

I understand your feelings on this. When I first started working in TDWG, all of this was foreign to me and it's taken me many years to get to the level of understanding I now have (painfully, with frustration and mostly by teaching myself).

I don't think every user should need to have a detailed knowledge of the technical details to USE the standards. I think it is recognized that we still have a way to go in helping bring people up to speed on what they actually need to know, although there has been work put into things like Darwin Core Hour presentations and user guides. But I don't think there is any getting around the fact that people involved in DEVELOPING the standards will have to face up to dealing with some of the technical details. Hopefully those who do have the technical background will be patient and helpful with those who don't, just as those with the biological domain knowledge will have to be patient and helpful with the tech people who don't have that background.

I don't want to minimize the problem that you raise, but I think there are some places where there's been a really great collaboration between technical and biological folks who've figured out how to work together to get things done. I've experienced this in the Humboldt Extension TG and I think it's similar in the Latimer Core TG. So I do have hope that if we keep trying and are patient, we can succeed in building useful vocabularies and tools.

Jegelewicz commented 1 year ago

added to review package - https://github.com/tdwg/material-sample/blob/main/review%20package/preparations.md

Jegelewicz commented 1 year ago

submitted to dwc repo - https://github.com/tdwg/dwc/issues/452

Jegelewicz commented 9 months ago

change complete - https://github.com/tdwg/dwc/issues/452