Open kitchenprinzessin3880 opened 4 years ago
Assumption: A data provider applies the harvested ontological terms as part of full-text search on the data portal.
Requirements:
R1. Term name, alternative name, vernacular name (species)
R2. Term acronym/abbreviation (chemical formula)
R3. Term description
R4. Term relations (e.g., we use SKOS-based relations), synonym
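To make the assumption above concrete, here is a minimal sketch of how harvested terms could feed full-text search through query expansion. Everything in it is illustrative: the `Term` structure, the `expand_query` helper, the `example.org` URIs and the sample terms are assumptions, not part of any existing portal or harvester.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical harvested term carrying the fields from R1-R4."""
    uri: str
    name: str                                           # R1: term name
    alt_names: list = field(default_factory=list)       # R1: alternative/vernacular names
    abbreviations: list = field(default_factory=list)   # R2: acronyms, chemical formulas
    description: str = ""                               # R3: term description
    related: dict = field(default_factory=dict)         # R4: relation name -> list of URIs


def labels(term):
    """All strings under which a term may appear in free text."""
    return [term.name, *term.alt_names, *term.abbreviations]


def expand_query(keyword, terms):
    """Expand a keyword with the labels of matching terms and their narrower terms."""
    key = keyword.strip().lower()
    by_uri = {t.uri: t for t in terms}
    expanded = {key}
    for term in terms:
        if key in (label.lower() for label in labels(term)):
            expanded.update(label.lower() for label in labels(term))
            # Follow one level of skos:narrower so a broad keyword finds specific data.
            for uri in term.related.get("skos:narrower", []):
                if uri in by_uri:
                    expanded.update(label.lower() for label in labels(by_uri[uri]))
    return sorted(expanded)


# Made-up example terms (URIs are placeholders).
TERMS = [
    Term(
        uri="https://example.org/term/nutrient",
        name="nutrient",
        related={"skos:narrower": ["https://example.org/term/nitrate"]},
    ),
    Term(
        uri="https://example.org/term/nitrate",
        name="nitrate",
        alt_names=["nitrate ion"],
        abbreviations=["NO3"],
        description="Anion of nitric acid.",
    ),
]

print(expand_query("NO3", TERMS))
print(expand_query("nutrient", TERMS))
```

The expanded strings would then be passed to whatever full-text engine the portal already uses; the term description (R3) is not used for matching here and would mainly help users disambiguate.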
(my additional requirements, as usual, a little more into the details)
entities for ...
@SirkoS These are good additions. Perhaps clarify as follows: the object and compound line items make up the system being observed, while the kind of quantity is the quantitative property being measured/modeled/computed. A requirement should be to identify all elements of the observed system minimally necessary to understand the quantity recorded. In this case the concentration is measured, which is defined as the amount of A in B, so both A and B must be specified. Perhaps also add that a valid quantity should be accompanied by dimensions and an unambiguous type (more of a requirement for resource alignment) ... so, for example, specify mass concentration, M/L^3, vs. volume concentration, L^3/L^3.
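A rough sketch of what this could look like as a data structure, purely for illustration. The class and field names (`QuantityKind`, `ObservedQuantity`, `medium`, `requires_medium`) are assumptions made up for this example and do not come from any existing vocabulary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QuantityKind:
    """Hypothetical kind of quantity with explicit dimensions."""
    name: str
    numerator: dict = field(default_factory=dict)    # e.g. {"M": 1}
    denominator: dict = field(default_factory=dict)  # e.g. {"L": 3}
    requires_medium: bool = False                    # True for "amount of A in B"


@dataclass
class ObservedQuantity:
    kind: QuantityKind
    observed_object: Optional[str] = None  # A, e.g. the compound
    medium: Optional[str] = None           # B, e.g. the water body


def net_dimensions(kind):
    """Combine numerator and denominator exponents, dropping cancelled dimensions."""
    net = dict(kind.numerator)
    for dim, exponent in kind.denominator.items():
        net[dim] = net.get(dim, 0) - exponent
    return {dim: exp for dim, exp in net.items() if exp != 0}


def missing_elements(quantity):
    """List the parts of the observed system that are still unspecified."""
    missing = []
    if not quantity.observed_object:
        missing.append("observed_object")
    if quantity.kind.requires_medium and not quantity.medium:
        missing.append("medium")
    return missing


MASS_CONCENTRATION = QuantityKind(
    "mass concentration", {"M": 1}, {"L": 3}, requires_medium=True
)
VOLUME_CONCENTRATION = QuantityKind(
    "volume concentration", {"L": 3}, {"L": 3}, requires_medium=True
)

print(net_dimensions(MASS_CONCENTRATION))    # {'M': 1, 'L': -3}
print(net_dimensions(VOLUME_CONCENTRATION))  # {}
print(missing_elements(ObservedQuantity(MASS_CONCENTRATION, "nitrate")))  # ['medium']
```

Note that the volume concentration reduces to a dimensionless quantity, so the dimensions alone cannot tell it apart from other ratios; that is where the unambiguous type is needed.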
I agree that those are important, but I feel we can easily get lost in details. So there should be extension points where we do not care about the details and just acknowledge that there is something important. From my perspective, the dimensions are such a thing. I don't think anything beyond the "kind of quantity", like dimensions or dimensional vectors, is important to the observation itself. So for me, this would be something that is covered elsewhere, but not within our activities.
Here are the ontological terms for observation 'types' used in the PANGAEA data search
I have transferred the requirements listed here into the list of requirements tab of the spreadsheet so that we include them for the discussion tomorrow. I have condensed Sirko's suggestions into 4 requirements:
Not sure if "trait" deserves its own requirement either. I think it may be part of the description of the system/object observed, in this case a biological object.
@gwemon I am not 100% sure, but I think trait came from US 16 to refer to ‘morpho-functional’ properties such as ‘shape’. If so, I think this would fall under the property category.
thanks @mariutzica. There are also cases when the observation is about an organism of a particular trait (this happens a lot in plankton sample analysis where the organism is not fully identified but it is described using characteristics of shape, colour, size categories). Also, organisms can be counted according to their biological sex category or life stage. In these cases, the trait would be needed as part of the observed object description. So it might be safer to keep traits separate indeed at least for the time being?
@gwemon I see, thanks for explaining! So 'traits' in this sense means the terms that indicate fixed values of properties (whether intrinsic or not) for the duration of a measurement/observation. In SVO we have a special class for this purpose called Attribute, which is used to represent Property-Value pairs (i.e., every Attribute is associated with a Property) for values that are not presented in the data. We found this helpful because sometimes data is condensed or unpacked by these attributes (e.g., in one dataset particle size may be part of the terminology, while in another it may be given its own column in the dataset). Trait in this sense definitely deserves its own bullet point.
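To illustrate the condensed vs. unpacked case, a small sketch follows. It is not SVO's actual model or API; the `Attribute` and `ObservedObject` classes, the `unpack` helper and the sample values are hypothetical and only mirror the Property-Value idea described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical Property-Value pair that stays fixed during an observation."""
    property_name: str  # e.g. "life_stage"
    value: str          # e.g. "juvenile"


@dataclass(frozen=True)
class ObservedObject:
    """Observed object whose traits are carried by the terminology (condensed form)."""
    name: str
    attributes: tuple = ()


def unpack(obj, count):
    """Turn the condensed form into a row where each attribute has its own column."""
    row = {"object": obj.name, "count": count}
    for attribute in obj.attributes:
        row[attribute.property_name] = attribute.value
    return row


# Made-up plankton example: the organism is described by traits, not full identification.
juvenile_female_copepod = ObservedObject(
    name="copepod",
    attributes=(Attribute("life_stage", "juvenile"), Attribute("sex", "female")),
)

print(unpack(juvenile_female_copepod, 12))
# {'object': 'copepod', 'count': 12, 'life_stage': 'juvenile', 'sex': 'female'}
```

Either representation carries the same information; the point is that a search over the terminology needs to know about the attributes to match datasets that use the other form.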
Notes transferred from Maria's document for UC9 keyword semantic data search (data discovery based on keywords that come from a controlled vocabulary):
Notes:
Requires concepts that have persistent resolvable URIs
Requires terminologies based on simple atomic terms
ADDED - Requires agreement of top-level, domain-independent categorization scheme/ontology
Requires that the relationships be trusted
Requires terminologies with coarse/fine granularity
Requires a long-term commitment governance setup
Requires an active community supporting the terminology
Requires reliable technical infrastructure
Requires input from domain experts
Requires that the terminology be part of federated community specific and/or cross-domain portals
Requires that the terminology supports multilingual terms
Requires multilingual editorial team or multilingual community effort
Requires terminologies published as linked data capabilities
Requires the terminology to use a common minimum metadata schema to describe semantic artefacts and their content
Requires mappings between terminologies
Summary of Maria's analysis for this UC:
Requirements identified in the spreadsheet (and open for discussion)
[automated message] Updated top entry of this issue on 2020-07-11
- Requires a long-term commitment governance setup
- Requires input from domain experts
Why are those needed?
- Requires that the terminology supports multilingual terms
- Requires alternative syntax (like alternative notation e.g. chemical formula, abbreviations)
- Requires synonym management
Not mandatory, but helpful.
Requires definitions of the concepts within a terminology
Maybe helpful for disambiguating search queries.
Agree with @SirkoS that the following two, "Requires a long-term commitment governance setup" and "Requires input from domain experts", may not be needed in order to support keyword semantic data search. Could somebody provide explanations supporting these?
Also agree with @SirkoS's selection of other useful requirements for this use case. But because they are not essential to the use case, I would prefer to leave them out. Does everybody still agree that at this stage we need to limit ourselves to only recording "key" requirements, i.e. the requirements without which a use case cannot be addressed?
@gwemon I agree; these are not strictly needed for the search step, they are needed for the terminology maintenance/update step
[automated message] Updated top entry of this issue on 2020-09-21
Keyword semantic data search
Data discovery based on keywords that come from a controlled vocabulary
Corresponding user stories
Requirements identified in the spreadsheet (and open for discussion)
last updated: 2020-09-21