i-adopt / requirements

Repository used for Task 3 of the I-ADOPT workplan to define terminology requirements for given use cases

UC9 - Keyword semantic data search #6

Open kitchenprinzessin3880 opened 4 years ago

kitchenprinzessin3880 commented 4 years ago

Keyword semantic data search

Data discovery based on keywords that come from a controlled vocabulary

Corresponding user stories

Requirements identified in the spreadsheet (and open for discussion)

last updated: 2020-09-21

kitchenprinzessin3880 commented 4 years ago

Assumption: A data provider applies the harvested ontological terms as part of full-text search on the data portal.

Requirements:

  • R1. Term name, alternative name, vernacular name (species)
  • R2. Term acronym/abbreviation (chemical formula)
  • R3. Term description
  • R4. Term relations (e.g., we use SKOS-based relations), synonyms
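To make R1 to R4 concrete, here is a minimal sketch of how a harvested term could be held and matched against a keyword. The `Term` record, its field names, and the `matches`/`search` helpers are assumptions made for illustration only; they are not taken from any portal's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical record for one harvested vocabulary term."""
    uri: str
    pref_label: str                                        # R1: term name
    alt_labels: list[str] = field(default_factory=list)    # R1: alternative / vernacular names
    notations: list[str] = field(default_factory=list)     # R2: acronyms, chemical formulae
    definition: str = ""                                   # R3: description
    related: list[str] = field(default_factory=list)       # R4: URIs of related or synonymous terms


def matches(term: Term, query: str) -> bool:
    """Case-insensitive match: exact on names and notations, substring on the description."""
    q = query.casefold()
    names = [term.pref_label, *term.alt_labels, *term.notations]
    return any(q == n.casefold() for n in names) or q in term.definition.casefold()


def search(terms: list[Term], query: str) -> list[str]:
    """Return URIs of matching terms, followed by the known terms they relate to (R4)."""
    index = {t.uri: t for t in terms}
    hits: list[str] = []
    for t in terms:
        if matches(t, query):
            for uri in [t.uri, *t.related]:
                if uri in index and uri not in hits:
                    hits.append(uri)
    return hits
```

With a term whose notation is `NO3`, a query for `no3` would return that term and whatever it is related to, which is the kind of expansion R4 is meant to enable.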

SirkoS commented 4 years ago

(my additional requirements, as usual, a little more into the details)

entities for ...

mariutzica commented 4 years ago

@SirkoS These are good additions. Perhaps clarify as follows: the object and compound line items make up the system being observed, while the kind of quantity is the quantitative property being measured/modeled/computed. A requirement should be to identify all elements of the observed system minimally necessary to understand the quantity recorded. In this case the concentration is measured, which is defined as the amount of A in B, so both A and B must be specified. Perhaps also add that a valid quantity should be accompanied by dimensions and an unambiguous type (more of a requirement for resource alignment) ... so, for example, specify mass concentration, M/L^3, vs volume concentration, L^3/L^3.
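Written out, the distinction looks like this. The symbols are assumptions chosen for illustration: m_A is the mass of A, V_A the volume of A, and V_B the volume of the medium B.

```latex
\rho_A = \frac{m_A}{V_B}, \qquad [\rho_A] = \mathsf{M}\,\mathsf{L}^{-3}
\qquad\text{vs.}\qquad
\phi_A = \frac{V_A}{V_B}, \qquad [\phi_A] = \mathsf{L}^{3}\,\mathsf{L}^{-3} = 1
```

Both are called "concentration", but the first carries dimensions of mass per volume while the second is dimensionless, so the kind of quantity alone does not disambiguate them.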

SirkoS commented 4 years ago

I agree that those are important, but I feel we can easily get lost in details. So there should be extension points where we do not care about the details and just acknowledge that there is something important. From my perspective, the dimensions are such a thing. I don't think anything beyond the "kind of quantity", like dimensions or dimensional vectors, is important to the observation itself. So for me, this would be something that is covered elsewhere, but not within our activities.

kitchenprinzessin3880 commented 4 years ago

Here are the 'types' of ontological observation terms used in PANGAEA data search:

  1. methods, instruments
  2. observed properties/quantities
  3. features (including species)
  4. we don't use vocabularies for units, as we represent them using the UCUM convention

gwemon commented 4 years ago

I have transferred the requirements listed here into the list of requirements tab of the spreadsheet so that we include them for the discussion tomorrow. I have condensed Sirko's suggestions into 4 requirements:

Not sure if "trait" deserves its own requirement either. I think it may be part of the description of the system/object observed, in this case a biological object.

mariutzica commented 4 years ago

@gwemon I am not 100% sure, but I think trait came from US 16 to refer to ‘morpho-functional’ properties such as ‘shape’. If so, I think this would fall under the property category.

gwemon commented 4 years ago

thanks @mariutzica. There are also cases when the observation is about an organism of a particular trait (this happens a lot in plankton sample analysis where the organism is not fully identified but it is described using characteristics of shape, colour, size categories). Also, organisms can be counted according to their biological sex category or life stage. In these cases, the trait would be needed as part of the observed object description. So it might be safer to keep traits separate indeed at least for the time being?

mariutzica commented 4 years ago

@gwemon I see, thanks for explaining! So 'traits' in this sense means the terms that indicate fixed values of properties (whether intrinsic or not) for the duration of a measurement/observation. In SVO we have a special class for this purpose called Attribute which is used to represent Property-Value pairs (i.e., every Attribute is associated with a Property) for values that are not presented in the data. We found this helpful because sometimes data is condensed or unpacked by these attributes (e.g., for one dataset particle size may be part of the terminology while in another it may be provided its own column in the dataset). Trait in this sense definitely deserves its own bullet point.
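As a rough illustration of the condensed vs. unpacked case, here is a minimal sketch. The `Attribute` class below is a simplified stand-in written for this comment, not SVO's actual model, and `unpack` and the column names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """A Property-Value pair held fixed during an observation (hypothetical shape)."""
    property_name: str
    value: str


def unpack(term_label: str, attributes: list[Attribute], measurement: float) -> dict:
    """Turn attributes carried by the terminology into explicit dataset columns."""
    row = {"variable": term_label, "value": measurement}
    for attribute in attributes:
        # Each fixed Property-Value pair becomes its own column in the row.
        row[attribute.property_name] = attribute.value
    return row
```

One dataset might bake "fine" particle size into the variable name, while another calls `unpack` with `Attribute("particle size", "fine")` and gets a separate `particle size` column; both describe the same observation.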

gwemon commented 4 years ago

Notes transferred from Maria's document for UC9 keyword semantic data search (data discovery based on keywords that come from a controlled vocabulary):

gwemon commented 4 years ago

Summary of Maria's analysis for this UC:

gwemon commented 4 years ago

Requirements identified in the spreadsheet (and open for discussion)

SirkoS commented 4 years ago

[automated message] Updated top entry of this issue on 2020-07-11

SirkoS commented 4 years ago

  • Requires a long-term commitment governance setup
  • Requires input from domain experts

Why are those needed?

  • Requires that the terminology supports multilingual terms
  • Requires alternative syntax (like alternative notation e.g. chemical formula, abbreviations)
  • Requires synonym management

Not mandatory, but helpful.

  • Requires definitions of the concepts within a terminology

Maybe helpful for disambiguating search queries.

gwemon commented 4 years ago

Agree with @SirkoS that the following may not be needed in order to support keyword semantic data search:

  • Requires a long-term commitment governance setup
  • Requires input from domain experts

Could somebody provide explanations supporting these?

gwemon commented 4 years ago

Also agree with @SirkoS's selection of other useful requirements for this use case. But because they are not essential to the use case, I would prefer to leave them out. Does everybody still agree that at this stage we need to limit ourselves to recording only "key" requirements, i.e. the requirements without which a use case cannot be addressed?

mariutzica commented 4 years ago

@gwemon I agree: these are not strictly needed for the search step; they are needed for the terminology maintenance/update step.

SirkoS commented 4 years ago

[automated message] Updated top entry of this issue on 2020-09-21