w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

Easier addition of support for custom datatypes to SPARQL endpoints #130

Open JervenBolleman opened 3 years ago

JervenBolleman commented 3 years ago

Why?

Currently, most stores require significant work to add new datatypes; e.g. anything beyond the built-in XSD types requires custom code. This makes it harder to use datatypes as a significant determinant of meaning.

This came up as part of a solution to issue #129.

Previous work

Matching of arbitrary data-types in the earlier Specifications

Proposed solution

An RDF file that lists datatypes and can be read by stores.

unit:K rdfs:subClassOf xsd:decimal ;
  rdfs:label "Kelvin"@en ;
  rdfs:comment "SI Unit for temperature" .

A file such as this adds datatype definitions and allows the mathematical functions of xsd:decimal to be called with such values, e.g. sameTerm("273.1"^^unit:K + "100"^^unit:K, "373.1"^^unit:K).

Allowing casts to, and arithmetic with, xsd numerics allows for more convenient math operations in queries.

{
 ?x ex:temperatureMeasurement ?tempInKelvin .
 FILTER(datatype(?tempInKelvin) = unit:K)

 # arithmetic with xsd numerics preserves the custom type
 BIND(?tempInKelvin - 273.15 AS ?firstStepToCelsius)
 FILTER(datatype(?firstStepToCelsius) = unit:K)

 # casting to an xsd numeric drops the custom type
 BIND(xsd:decimal(?firstStepToCelsius) AS ?tempInCelsiusDecimal)
 FILTER(datatype(?tempInCelsiusDecimal) = xsd:decimal)

 # a cast from one custom datatype to another must start from
 # an xsd numeric. (Custom conversion functions are a different issue)
 BIND(unit:degC(?tempInCelsiusDecimal) AS ?tempInDegreeCelsius)
 FILTER(datatype(?tempInDegreeCelsius) = unit:degC)
}
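
To make the intended semantics concrete, here is a minimal Python sketch of the rules the query above relies on: arithmetic between a custom-typed value and a plain xsd numeric keeps the custom type, and casts between two custom types have to pass through xsd:decimal. The names Literal, result_type and cast are hypothetical and do not come from any existing store; this illustrates the proposal only, not how any engine behaves today.

```python
from dataclasses import dataclass
from decimal import Decimal

XSD_DECIMAL = "xsd:decimal"


@dataclass(frozen=True)
class Literal:
    """A numeric literal: a value plus a datatype (written as a prefixed name)."""
    value: Decimal
    datatype: str = XSD_DECIMAL

    def _combine(self, other, op):
        if not isinstance(other, Literal):
            other = Literal(Decimal(other))  # plain numbers act as xsd:decimal
        return Literal(op(self.value, other.value),
                       result_type(self.datatype, other.datatype))

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._combine(other, lambda a, b: a - b)


def result_type(left, right):
    """Assumed rule: a custom type wins over xsd:decimal; two different custom types clash."""
    if left == right:
        return left
    if left == XSD_DECIMAL:
        return right
    if right == XSD_DECIMAL:
        return left
    raise TypeError(f"cannot combine {left} and {right}")


def cast(literal, target):
    """Assumed rule: a cast is only allowed when one side is xsd:decimal (or it is a no-op)."""
    if literal.datatype != target and XSD_DECIMAL not in (literal.datatype, target):
        raise TypeError(f"cast from {literal.datatype} to {target} must go via {XSD_DECIMAL}")
    return Literal(literal.value, target)


kelvin = Literal(Decimal("300.15"), "unit:K")
step = kelvin - Decimal("273.15")       # still typed unit:K
as_decimal = cast(step, XSD_DECIMAL)    # custom type dropped
celsius = cast(as_decimal, "unit:degC") # re-typed from the plain decimal
```

Under these assumed rules, cast(step, "unit:degC") raises a TypeError, because it would skip the xsd:decimal step.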

Sometimes it is more appropriate for a custom datatype to extend xsd:string.

iupac:DNA rdfs:subClassOf xsd:string ;
  rdfs:label "DNA"@en ;
  rdfs:comment "A representation of a DNA sequence encoded according to the IUPAC spec" .

Considerations for backward compatibility

More data in the wild will be inconvenient to use in SPARQL 1.1 endpoints.

kasei commented 3 years ago

Would the use of subClassOf imply that this would be considered a derived type of decimal? I think it would be really strange to be able to do things like type-promote between Kelvin and decimal, or subtract an integer value from a Kelvin value.

VladimirAlexiev commented 3 years ago

How can I express the relation between Kelvin and Celsius?

Or between meter and cm?

--

LINDT shows an example of declaring a datatype and implementing it in JS (see #129). The real LINDT is implemented in Java.

ericprud commented 3 years ago

Interestingly, I'm not sure °C and °K are more related than °C and °F.

supportedUnits

At a minimum, we could add an sd:supportedUnits property to the SPARQL Service Description spec. That would allow clever clients to tailor their queries to whatever the remote endpoint supports.

There's another dimension: which operators are supported. XPath provides names for e.g. lessThan, so a service description might look like:

[] a sd:Service ;
    sd:endpoint <http://www.example/sparql/> ;
    sd:supportedUnits # RDF representation of https://www.w3.org/TR/sparql11-query/#OperatorMapping
      [ sd:left ucum:m ; sd:function op:numericEquals ; sd:right ucum:ft_i ],
      [ sd:left ucum:ft_i ; sd:function op:numericEquals ; sd:right ucum:m ]
      # ...
    .

It's kinda tedious to have to write both of those, but maybe we can't assume symmetry.
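
As a rough illustration of how a client might use such a description, the Python sketch below treats the advertised entries as (left, function, right) triples and checks each direction separately, since symmetry is not assumed. The SUPPORTED set and the supports helper are hypothetical; a real client would first have to fetch and parse the service description, which is omitted here.

```python
# Operator mappings as a client might extract them from the service description.
# Each entry is (left operand datatype, function, right operand datatype).
SUPPORTED = {
    ("ucum:m", "op:numericEquals", "ucum:ft_i"),
    ("ucum:ft_i", "op:numericEquals", "ucum:m"),
}


def supports(supported, left, function, right):
    """True only if exactly this operand order is advertised; no symmetry is assumed."""
    return (left, function, right) in supported


def plan(supported, left, function, right):
    """Decide whether the comparison can be sent as-is or needs client-side conversion."""
    if supports(supported, left, function, right):
        return "send to endpoint"
    return "convert on the client first"
```

For example, plan(SUPPORTED, "ucum:m", "op:numericEquals", "ucum:ft_i") chooses the endpoint, while any pair that is not listed falls back to client-side conversion.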

unitConversions

The above wouldn't enable clever servers to do automagic conversion. We could pick some base units, e.g. MKS, and have a linear function to capture the mapping a la:

[] a sd:Service ;
    sd:endpoint <http://www.example/sparql/> ;
    sd:unitConversions u:Length, u:Mass, u:Time
    .

and centrally maintain the mappings:

u:Length u:baseUnit ucum:m ;
  u:conversion u:Foot, u:Smoot . # ...
u:Mass u:baseUnit ucum:kg ;
  u:conversion u:Gram, u:Ton, u:LongTon, u:ShortTon, u:Tonne . # ......

u:Foot a u:conversion ; u:factor 0.3048 ; u:offset 0.0 .
u:Smoot a u:conversion ; u:factor 1.7 ; u:offset 0.0 .
...
u:Fahrenheit a u:conversion ; u:factor .555 ; u:offset -17.77 . # assuming offset follows factor.

JervenBolleman commented 3 years ago

@VladimirAlexiev @ericprud Easier support for conversion amounts to easier sharing of custom functions/named queries, for which there are a few issues already. I wanted to separate out subparts of the problem to discuss one facet at a time.

@kasei I am editing the issue to expand on the thought behind rdfs:subClassOf.

afs commented 3 years ago

Should the title be something like:

"Extend SPARQL Service Description to allow declaration of the supported datatypes"

?

JervenBolleman commented 3 years ago

@afs no, that is not what I was going for. I was going for a declarative system to declare what properties and operators a new datatype has, i.e. to extend https://www.w3.org/TR/sparql11-query/#matchArbDT: e.g. declare that a new datatype has a greater-than operator and how that works (in conjunction with issue #131), as well as how it can be cast/converted to a different datatype.

VladimirAlexiev commented 3 years ago

@JervenBolleman "declare what properties and operators a new datatype has" @ericprud "RDF representation of https://www.w3.org/TR/sparql11-query/#OperatorMapping"

I agree these would be very useful features.


@maximelefrancois86 and @Antoine-Zimmermann have proposed:


The standards (esp OWL2) have a lot on datatypes:


Can we try to flesh out a list of requirements? Eg

jmkeil commented 3 years ago

Hi. I like the idea and would like to add a requirement for consideration:

Requirement: Unambiguous definition of conversion values

In an evaluation of several unit ontologies (http://www.semantic-web-journal.net/system/files/swj1825.pdf), we identified multiple cases of wrong conversion values caused by mixing up the direction of factor and offset. These mistakes were made by the people who defined the property. In a standard with wide application, this is even more critical. In fact, factor and offset allow four possible interpretations:

a = b × factor + offset
a = (b + offset) × factor
a × factor + offset = b
(a + offset) × factor = b
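
To show how far apart the four readings are, this small Python sketch solves each equation for a, given a known b, and applies all of them to 212 °F. The exact fractions 5/9 and -160/9 stand in for the rounded factor and offset quoted in this thread, and the function name interpretations is made up for the example.

```python
from fractions import Fraction


def interpretations(b, factor, offset):
    """Solve each of the four readings for a, given a known value b."""
    return {
        "a = b * factor + offset": b * factor + offset,
        "a = (b + offset) * factor": (b + offset) * factor,
        "a * factor + offset = b": (b - offset) / factor,
        "(a + offset) * factor = b": b / factor - offset,
    }


# Fahrenheit-to-Celsius factor and offset as exact fractions.
results = interpretations(Fraction(212), Fraction(5, 9), Fraction(-160, 9))
for reading, value in results.items():
    print(f"{reading:28} -> {float(value):.2f}")
```

Only the first reading maps the boiling point of water to 100; the other three give three different, incorrect values.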

To take the example by @ericprud:

u:Fahrenheit a u:conversion ; u:factor .555 ; u:offset -17.77 . # assuming offset follows factor.

could be less ambiguously expressed e.g. in the following way:

u:Fahrenheit u:oneEquals ".55555556"^^u:degreeCelsius ;
  u:zeroAt "-17.77777778"^^u:degreeCelsius .

or

u:Fahrenheit u:oneEquals .55555556 ;
  u:zeroAt -17.77777778 ;
  u:of u:degreeCelsius .

or similar.
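
A generic engine could apply such a definition without unit-specific code. The Python sketch below assumes the reading "oneEquals is the size of one unit step in the reference unit, zeroAt is the reference-unit value at which the unit reads zero". UnitDef, to_reference and from_reference are hypothetical names, and the numbers are the rounded values from the example above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UnitDef:
    one_equals: Decimal  # size of one step of this unit, in the reference unit
    zero_at: Decimal     # reference-unit value at which this unit reads zero
    of: str              # the reference unit


FAHRENHEIT = UnitDef(Decimal(".55555556"), Decimal("-17.77777778"), "u:degreeCelsius")


def to_reference(value, unit):
    """Convert a value in `unit` to the reference unit named by `unit.of`."""
    return value * unit.one_equals + unit.zero_at


def from_reference(value, unit):
    """Convert a reference-unit value back into `unit`."""
    return (value - unit.zero_at) / unit.one_equals


boiling_c = to_reference(Decimal(212), FAHRENHEIT)
freezing_c = to_reference(Decimal(32), FAHRENHEIT)
```

Because the definition carries rounded decimals, the results are only approximately 100 and 0; an exact definition would need rationals or more digits.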

steveraysteveray commented 3 years ago

Please take a look at this link https://github.com/qudt/qudt-public-repo/wiki/Support-for-measures-of-absolute-values-and-for-intervals-(differences)#converting-absolute-values for how I believe unit conversion with offsets should always be calculated. Your examples above seem to only have one offset value which I find confusing.

Steve

JervenBolleman commented 3 years ago

Units are not the only datatypes to consider. One I would find very nice to have is conversion between DNA/RNA plus forward/reverse strands etc. in the biological sphere.
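
For readers outside biology, the conversions meant here look roughly like the Python sketch below: transcription swaps thymine (T) for uracil (U), and the reverse strand is the reversed complement. The function names are illustrative, and the complement table covers uppercase IUPAC nucleotide codes only. This is string manipulation, not a linear numeric conversion.

```python
# Complement table for uppercase IUPAC DNA codes, including ambiguity codes.
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVSWN", "TGCAYRMKVHDBSWN")


def dna_to_rna(dna):
    """Transcribe a DNA sequence to RNA by replacing thymine with uracil."""
    return dna.replace("T", "U")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(_COMPLEMENT)[::-1]
```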

ericprud commented 3 years ago

With linear numeric units, we can define oneEquals and zeroAt per @jmkeil's proposal, which allows a naive, generic engine to handle these with no unit-specific code. We won't achieve that for e.g. your example of converting thymine to uracil or mapping astronomical coordinates, but I think we can still leverage datatypes and operator mappings to advertise capabilities and perform rudimentary unit analysis.

jmkeil commented 3 years ago

Please take a look at this link https://github.com/qudt/qudt-public-repo/wiki/Support-for-measures-of-absolute-values-and-for-intervals-(differences)#converting-absolute-values for how I believe unit conversion with offsets should always be calculated. Your examples above seem to only have one offset value which I find confusing. Steve

Yes, there is a difference between absolute values and intervals. But I don't see which information (conversion offset, conversion multiplier) is missing to do both. (I must confess - the property name "oneEquals" does not fit very well to absolute value conversion.) Of course, it would be good practice to define all units in reference to SI base units and I would expect an implementation to combine at least two conversion definitions for conversion between not directly connected units.
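
A minimal Python sketch of that claim: with one multiplier and one offset per unit, both kinds of conversion can be computed, and two definitions can be chained through a shared reference unit. Here degree Celsius is the reference purely to match the earlier example (SI base units would work the same way). UnitDef, convert and the interval flag are hypothetical names.

```python
from fractions import Fraction
from typing import NamedTuple


class UnitDef(NamedTuple):
    one_equals: Fraction  # size of one unit step, in the reference unit
    zero_at: Fraction     # reference-unit reading at which this unit reads zero


# Both units are defined against the same reference unit (degree Celsius here).
FAHRENHEIT = UnitDef(Fraction(5, 9), Fraction(-160, 9))
KELVIN = UnitDef(Fraction(1), Fraction("-273.15"))


def convert(value, source, target, interval=False):
    """Convert between two units by going through their shared reference unit."""
    if interval:
        # Differences only scale; the offsets cancel out.
        return value * source.one_equals / target.one_equals
    in_reference = value * source.one_equals + source.zero_at
    return (in_reference - target.zero_at) / target.one_equals


absolute = convert(212, FAHRENHEIT, KELVIN)              # a temperature reading
delta = convert(9, FAHRENHEIT, KELVIN, interval=True)    # a temperature difference
```

The sketch needs no extra conversion data, but the caller still has to say which of the two modes applies.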

One thing that is missing is the information about when to apply which conversion (absolute or interval). One solution I could think of is the definition of two datatypes (e.g. u:degreeCelsiusAbsolute and u:degreeCelsiusInterval). However, this raises new problems for the definition of basic calculations:

steveraysteveray commented 3 years ago

Quoting from the QUDT wiki,

"To support this, the qudt:Quantity class has a property qudt:isDeltaQuantity. It is associated with the qudt:Quantity because the quantity describes the context of the measurement. qudt:isDeltaQuantity is a boolean property to record whether the measurement is an absolute value of the Quantity instance, or a delta (or difference) value. Setting isDeltaQuantity to "true" means the measurement is an interval. isDeltaQuantity set to "false" means the measurement is an absolute value. An application can then take the appropriate action, such as in unit conversion, etc.

It should be noted that in these cases, the unit is still the same unit on the same scale. There is nothing special about the unit."

jmkeil commented 3 years ago

That works if a quantity value is represented as an individual (identified by an IRI or blank node) of a class with several properties. When it is represented as a literal (e.g. "37"^^u:degreeCelsius), adding a further property isn't an option.

VladimirAlexiev commented 3 years ago

@jmkeil @steveraysteveray @ericprud please make a separate issue for discussing conversions (and take a look at the LINDT issue here that's closely related).

Otherwise your valuable comments will be lost in this issue, which is about a different (though related) topic.

VladimirAlexiev commented 3 years ago

SciSPARQL is a specialized implementation that includes matrices and tensors and I think will pose strong requirements on this issue. Eg see https://ieeexplore.ieee.org/document/6313648 by Andrej Andrejev and Tore Risch. Does anyone know how to contact them?