Open VladimirAlexiev opened 7 months ago
and refers to https://github.com/kg-construct/rml-core/issues/87. I don't see much discussion of datatype inference there, so I'm posting this issue here.
In that issue there's a discussion on where the test cases belong, since datatype extraction from data sources such as SQL is mentioned in the Core spec, while it might fit better in the IO spec.
For XML, we should clearly use xsd:type, especially focusing on XSD Datatypes but not ignoring custom datatypes like geo:wktLiteral, geo:gmlLiteral etc.
I agree here. The question is how an implementation should extract this, given that XML can have separate XSD schemas etc.
for JSON, keep in mind that it does not define what is a number, which leads to a number of unpleasant surprises in JSON-LD. Eg 12345678901234567890 is not a xsd:integer, and small decimals like 12.3 can be treated as float/double (eg 1.23e1) at will. So I'm not sure what can be tested here.
Interesting... I wonder why we cannot indicate a number as int for integers and as double for floating-point numbers? JSON has a native number type, but maybe it does not differentiate between float and integer here?
@DylanVanAssche Correct: JSON has just "number".
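To make the JSON point concrete, here is a minimal Python sketch. It assumes a purely hypothetical rule (`lexical_datatype`, not taken from any RML spec) that picks a datatype from the number's lexical form, and it uses the standard `json` module's `parse_int` / `parse_float` hooks to see the raw token before it is converted. It also shows what happens when the large integer above passes through a double, which is what many JSON parsers do.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def lexical_datatype(token: str) -> str:
    """Hypothetical rule (an assumption, not RML): infer a datatype from the raw token."""
    if "e" in token.lower():
        return XSD + "double"   # exponent notation, e.g. 1.23e1
    if "." in token:
        return XSD + "decimal"  # plain decimal point, e.g. 12.3
    return XSD + "integer"      # digits only


def load_preserving_lexical(text: str):
    """Parse JSON but keep each number as (raw token, inferred datatype)."""
    def tag(token: str):
        return (token, lexical_datatype(token))
    # parse_int / parse_float are called with the number's source string
    return json.loads(text, parse_int=tag, parse_float=tag)


doc = '{"big": 12345678901234567890, "small": 12.3, "sci": 1.23e1}'
typed = load_preserving_lexical(doc)

# A parser that stores every number as a double cannot round-trip the big integer
big = 12345678901234567890
survives_double = int(float(big)) == big

# 12.3 and 1.23e1 are the same value once parsed as doubles
same_double = float("12.3") == float("1.23e1")

print(typed)
print(survives_double, same_double)
```

Whether the lexical form should drive the datatype at all is exactly the open question here: once a parser has turned the token into a double, the distinction between `12.3` and `1.23e1` is gone, so any such rule only works if the implementation has access to the raw text.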
and refers to kg-construct/rml-core#87. I don't see much discussion of datatype inference there, so I'm posting this issue here.
Yes, the discussion is somewhat hidden, but "natural mapping of values" is definitely being discussed. The proposed plan is to introduce separate documents per reference formulation wherein this can be specified.
See:
Here are a couple of considerations:
For XML, we should clearly use xsd:type, especially focusing on XSD Datatypes but not ignoring custom datatypes like geo:wktLiteral, geo:gmlLiteral etc.
- XML attributes and text content are always strings, so there's no place for implicit types, right?
- One can specialize XSD types using restriction and extension, which is potentially mappable to rdfs:Datatype constructs, but I think this is clearly beyond the scope of RML
- XSD and RelaxNG have the concept of "post schema validation infoset" (PSVI) that can assign application types (eg Person) to elements. However, I don't think we should go there.
- for JSON, keep in mind that it does not define what is a number, which leads to a number of unpleasant surprises in JSON-LD. Eg 12345678901234567890 is not a xsd:integer, and small decimals like 12.3 can be treated as float/double (eg 1.23e1) at will. So I'm not sure what can be tested here.
Thanks for this. Once we have specified this it would be great to have some review from you and other experts in the community on this @VladimirAlexiev.
@bjdmeest Shouldn't this be moved to rml-io-registry?
A recent paper: