Closed tschleider closed 3 years ago
I think this is a follow-up to #8, solved in e40223b
Both your examples seem to be correctly captured by the dimension regex. What is the output of the current code?
In the first example (O10267.json): the values in inches are the same as those in cm, just converted to a different unit. By the way, how do we manage units for dimensions in the KG?
In the second example (O10418.json): the interesting keyword is "repeat". A string2vocabulary call should be made and match https://data.silknow.org/vocabulary/444! This SKOS concept should trigger the creation of an additional path in the KG, as it means a new instance of the T24_Pattern_Unit class (see also the comment on page 25 in this doc)
@tschleider Can you indicate in this GitHub issue the URIs of those two E22 instances in the KG?
By the way, how do we manage units for dimensions in the KG?
CIDOC-CRM offers a P91 has unit property, used e.g. here: https://data.silknow.org/object/622582e4-39aa-3888-bc68-3568995c0e1c/dimension/w
Good, but this is from a representation point of view. We do not attempt to convert all dimension values into a common unit? Consequently, we would not be able to filter a query by dimension size in a straightforward manner, right?
Indeed, this is a failure point. Since the number of units in use is limited, we can aim to convert everything to cm.
@pasqLisena: You're right, I should at least have referred to #8, which fixed a lot.
@rtroncy: Thanks for finishing the explanation, here are the two URIs:
Production : O10267.json | O10418.json
Pre-prod: O10267.json | O10418.json
Indeed, this is a failure point. Since the number of units in use is limited, we can aim to convert everything to cm.
It clearly is. And worse, we are collapsing values and units, so if I look at http://data.silknow.org/object/7367a77e-72fc-3b0a-a85a-2aad7289bf95/dimension/l, I have two values (13.75 and 34) and two units (cm and in), so we do not know which value is expressed in which unit! This should be fixed.
I'm following up on the mailing list as well (message) to better understand what use of the dimensions we foresee in ADASilk. I can imagine that this could become a sorting criterion (from the biggest to the smallest object).
So these are the units extracted so far:

| unit | count | what |
|---|---|---|
| cl | 1 | volume |
| repe | 2 | ERROR: should be "repeated" |
| ft | 18 | length |
| in | 3324 | length |
| mini | 1 | ERROR: should be "minimum" |
| troy | 1 | weight |
| mm | 1199 | length |
| kg | 310 | weight |
| a | 1 | ERROR: has to be ignored |
| cm | 58136 | length |
| maxi | 2 | ERROR: should be "maximum" |
| m | 16 | length |
| lb | 1 | weight |
| g | 2 | weight |
| bott | 1 | ERROR: should be "bottom" |

I would use: cm for length, kg for weight and cl (unique) for volume
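To make the proposal concrete, here is a minimal Python sketch of such a normalization, independent of how the converter actually implements it. The names `TO_TARGET` and `normalize` are hypothetical. "troy" is deliberately left out, because the token alone does not say which troy unit is meant, and the truncated tokens from the table ("repe", "mini", "maxi", "bott", "a") are rejected rather than guessed.

```python
from typing import Optional, Tuple

# Hypothetical lookup: extracted unit token -> (target unit, multiplication factor).
# Targets follow the proposal above: cm for length, kg for weight, cl for volume.
TO_TARGET = {
    "cm": ("cm", 1.0),
    "mm": ("cm", 0.1),
    "m": ("cm", 100.0),
    "in": ("cm", 2.54),
    "ft": ("cm", 30.48),
    "kg": ("kg", 1.0),
    "g": ("kg", 0.001),
    "lb": ("kg", 0.45359237),
    "cl": ("cl", 1.0),
}


def normalize(value: float, unit: str) -> Optional[Tuple[float, str]]:
    """Return (converted value, target unit), or None for an unknown token."""
    entry = TO_TARGET.get(unit.strip().lower())
    if entry is None:
        # Tokens such as "repe", "mini" or "a" are parsing errors, not units.
        return None
    target, factor = entry
    # Rounding hides floating-point noise such as 34.925000000000004.
    return round(value * factor, 6), target
```

For example, `normalize(13.75, "in")` yields `(34.925, "cm")`, while `normalize(2, "repe")` yields `None`, so the caller can log the record instead of storing a bogus unit.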
There are also cases like this: https://ada-preprod.silknow.org/describe/?url=http%3A%2F%2Fdata.silknow.org%2Fobject%2Fa00e8e99-b858-3cc1-9d2d-c40cf91180e5%2Fdimension%2Fw Here, 3 widths have to be represented (with different URIs): min, max and border
The same goes for "width" and "weight", which cannot share the same letter: https://ada-preprod.silknow.org/describe/?url=http://data.silknow.org/object/39658ae7-c3e1-3cf1-a115-ff5528f9a369/dimension/w
I am planning a 2nd round of development in the coming days:
object/UUID/dimension/1
I did some modifications.
Current output for O10267:
<http://data.silknow.org/object/7367a77e-72fc-3b0a-a85a-2aad7289bf95/dimension/1>
    a ecrm:E54_Dimension ;
    rdfs:label "Length: 34 cm" ;
    ecrm:P2_has_type "length" ;
    ecrm:P90_has_value "34"^^xsd:float ;
    ecrm:P91_has_unit "cm" .
<http://data.silknow.org/object/7367a77e-72fc-3b0a-a85a-2aad7289bf95/dimension/2>
    a ecrm:E54_Dimension ;
    rdfs:comment "open" ;
    rdfs:label "Width: 66.6 cm open" ;
    ecrm:P2_has_type "width" ;
    ecrm:P3_has_note "open" ;
    ecrm:P90_has_value "66.6"^^xsd:float ;
    ecrm:P91_has_unit "cm" .
<http://data.silknow.org/object/7367a77e-72fc-3b0a-a85a-2aad7289bf95/dimension/3>
    a ecrm:E54_Dimension ;
    rdfs:label "Length: 13.75 in" ;
    ecrm:P2_has_type "length" ;
    ecrm:P90_has_value "34.925"^^xsd:float ;
    ecrm:P91_has_unit "cm" .
Output for O10418:
<http://data.silknow.org/object/df126873-b14d-325d-93ad-2e79a14c1730/dimension/1>
    a ecrm:E54_Dimension ;
    rdfs:label "Length: 110 cm" ;
    ecrm:P2_has_type "length" ;
    ecrm:P90_has_value "110"^^xsd:float ;
    ecrm:P91_has_unit "cm" .
<http://data.silknow.org/object/df126873-b14d-325d-93ad-2e79a14c1730/dimension/2>
    a ecrm:E54_Dimension ;
    rdfs:label "Width: 5.5 cm" ;
    ecrm:P2_has_type "width" ;
    ecrm:P90_has_value "5.5"^^xsd:float ;
    ecrm:P91_has_unit "cm" .
<http://data.silknow.org/object/df126873-b14d-325d-93ad-2e79a14c1730/dimension/3>
    a ecrm:E54_Dimension ;
    rdfs:comment "repeat" ;
    rdfs:label "Length: 44.5 cm repeat" ;
    ecrm:P2_has_type "length" ;
    ecrm:P3_has_note "repeat" ;
    ecrm:P90_has_value "44.5"^^xsd:float ;
    ecrm:P91_has_unit "cm" .
(Note that I now keep the original string in rdfs:label.)
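For readers who want to reproduce the shape of the output above, the following Python sketch assembles one such block by plain string formatting. It is only an illustration: the function name `dimension_turtle` is hypothetical, the real converter may well use an RDF library instead, and the sketch assumes that the label and note contain no double quotes.

```python
from typing import Optional


def dimension_turtle(uri: str, label: str, kind: str,
                     value_cm: float, note: Optional[str] = None) -> str:
    """Build one E54_Dimension block; value_cm is assumed to be already in cm."""
    lines = [f"<{uri}>", "    a ecrm:E54_Dimension ;"]
    if note:
        lines.append(f'    rdfs:comment "{note}" ;')
    # The label keeps the original string, e.g. "Length: 13.75 in".
    lines.append(f'    rdfs:label "{label}" ;')
    lines.append(f'    ecrm:P2_has_type "{kind}" ;')
    if note:
        lines.append(f'    ecrm:P3_has_note "{note}" ;')
    # ".10g" prints 34.0 as "34" and 34.925 as "34.925".
    lines.append(f'    ecrm:P90_has_value "{value_cm:.10g}"^^xsd:float ;')
    lines.append('    ecrm:P91_has_unit "cm" .')
    return "\n".join(lines)
```

Keeping the label separate from the value is what lets the third dimension of O10267 read "13.75 in" for humans while the value stays comparable in cm.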
The URI system has only been changed for VAM. For the others, it still uses w and h (no overwriting problem).
What do you think?
This is good. I would apply the same URI pattern to all dimensions, i.e. the pattern http://data.silknow.org/object/[UUID]/dimension/[count]
The parsing part can be seen as completed.
What is still missing is the connection with the Patterns:
In the second example (O10418.json): the interesting keyword is "repeat". A string2vocabulary call should be made and match https://data.silknow.org/vocabulary/444! This SKOS concept should trigger the creation of an additional path in the KG, as it means a new instance of the T24_Pattern_Unit class (see also the comment on page 25 in this doc)
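The trigger described here could look roughly like the Python sketch below. The dictionary `NOTE_TO_CONCEPT` is a hypothetical stand-in for the vocabulary lookup and does not reflect the actual string2vocabulary API; only the "repeat" keyword and its concept URI come from this thread.

```python
from typing import Optional

# Hypothetical stand-in for the vocabulary lookup: note keyword -> SKOS concept URI.
NOTE_TO_CONCEPT = {
    "repeat": "https://data.silknow.org/vocabulary/444",
}


def pattern_concept(note: Optional[str]) -> Optional[str]:
    """Return the SKOS concept URI matched by a dimension note, if any."""
    if not note:
        return None
    return NOTE_TO_CONCEPT.get(note.strip().lower())


def needs_pattern_unit(note: Optional[str]) -> bool:
    """True when the note should lead to a new T24_Pattern_Unit instance."""
    return pattern_concept(note) is not None
```

With this, the third dimension of O10418 (note "repeat") would be routed to a pattern unit, while the "open" note of O10267 would stay a plain note on the object dimension.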
@tschleider you take the token from here on?
I'm almost done with the patterns, see #74
One potential issue I'm seeing is that now, instances of the E54_Dimension class can be found:
- E22_Man-Made_Object: http://data.silknow.org/object/[UUID]/dimension/[int]
- T24_Pattern_Unit: http://data.silknow.org/object/[UUID]/pattern/[int]/dimension/[int]

Correct? In any case, the URI Patterns really needs to be updated!
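To keep the two cases apart, a small Python sketch of both URI patterns follows. The function names are hypothetical and the counters are assumed to start at 1, as in the outputs shown earlier in this thread.

```python
BASE = "http://data.silknow.org/object"


def object_dimension_uri(uuid: str, index: int) -> str:
    """Dimension attached directly to an E22_Man-Made_Object."""
    return f"{BASE}/{uuid}/dimension/{index}"


def pattern_dimension_uri(uuid: str, pattern_index: int, index: int) -> str:
    """Dimension attached to a T24_Pattern_Unit of that object."""
    return f"{BASE}/{uuid}/pattern/{pattern_index}/dimension/{index}"
```

Because the pattern path adds a `/pattern/[int]` segment, the two kinds of dimension can never collide even if both use the same counter values.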
Correct. If this is a problem, what could be a solution?
I'll update the URI policy
It is not necessarily an issue, as long as it is clear that dimensions of objects are different from dimensions of pattern units and that dimensions are never primary entities.
Yes, that's the case, I'll update the pattern policy file and will close this issue
Copied the section about the two possible E54_Dimension patterns into the URI policy. Therefore, this issue can be closed.
The dimensions field is currently not fully parsed if it contains more information than just one height and one width.
Examples:
O10267.json: dimensions: "Length: 34 cm, Width: 66.6 cm open, Length: 13.75 in, Width: 26.5 in open"
O10418.json: dimensions: "Length: 110 cm, Width: 5.5 cm, Length: 44.5 cm repeat"
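As an illustration of what "fully parsed" would mean for these two strings, here is a minimal Python sketch. It is not the project's dimension regex: the `Dimension` type, the `ENTRY` pattern and `parse_dimensions` are hypothetical, and the sketch assumes entries are separated by commas and use a dot as the decimal mark.

```python
import re
from typing import List, NamedTuple, Optional


class Dimension(NamedTuple):
    kind: str             # "length", "width", ...
    value: float
    unit: str             # unit token as written, e.g. "cm" or "in"
    note: Optional[str]   # trailing keyword such as "open" or "repeat"
    label: str            # the original entry, kept verbatim


# One entry looks like "Width: 66.6 cm open": type, number, unit, optional note.
ENTRY = re.compile(
    r"^(?P<kind>[A-Za-z]+):\s*(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[A-Za-z]+)(?:\s+(?P<note>.+))?$"
)


def parse_dimensions(text: str) -> List[Dimension]:
    """Split a dimensions string on commas and parse every entry that matches."""
    result = []
    for chunk in text.split(","):
        chunk = chunk.strip()
        match = ENTRY.match(chunk)
        if match is None:
            continue  # unparseable entries are skipped here; a real run should log them
        result.append(Dimension(match["kind"].lower(), float(match["value"]),
                                match["unit"], match["note"], chunk))
    return result
```

Applied to the O10267 string, this returns four entries, two of them with the note "open"; applied to the O10418 string, it returns three entries, the last one with the note "repeat".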