json-ld / json-ld.org

JSON for Linked Data's documentation and playground site
https://json-ld.org/

Data round tripping issue (not enough precision) #81

Closed lanthaler closed 12 years ago

lanthaler commented 12 years ago

We currently specify in the API spec that doubles should be normalized using the C-equivalent %1.6e. That is wrong, as it doesn't yield full double precision and can't even represent a number like 100000.123.
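To see the loss concretely, here is a quick Ruby check (any C-style printf behaves the same): %1.6e keeps only seven significant digits, so 100000.123 can't survive a round trip.

    "%1.6e" % 100000.123    # => "1.000001e+05"
    "1.000001e+05".to_f     # => 100000.1, the ".123" has been rounded away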

I haven't had the time yet to investigate how this can be solved, but I found at least one source (apart from Wikipedia) that says that in JavaScript all numbers are 64-bit.

We should also add some test cases for these round tripping issues.

gkellogg commented 12 years ago

Agreed, I haven't been too keen on %1.6e either. In the Ruby RDF library, we canonicalize using %1.16E. The Turtle/N3 tests check the following:

     "2.234000005"                         => "2.234000005",
     "2.2340000000000005"             => "2.2340000000000005",
     "2.23400000000000005"            => "2.234",
     "2.23400000000000000000005"      => "2.234",
     "1.2345678901234567890123457890" => "1.2345678901234567",
jmandel commented 12 years ago

Re: javascript numbers...

Twitter discusses one consequence (+ a stopgap of sorts): https://dev.twitter.com/docs/twitter-ids-json-and-snowflake
General discussion: http://stackoverflow.com/q/307179
Reference: http://ecma262-5.com/ELS5_HTML.htm#Section_8.5
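The consequence in question is the 2**53 integer boundary of IEEE 754 doubles, easy to reproduce in Ruby as well (Ruby Floats are the same 64-bit doubles as JavaScript numbers):

    (2**53).to_f      # => 9007199254740992.0
    (2**53 + 1).to_f  # => 9007199254740992.0, the odd ID is silently lost

That loss of distinct IDs is why Twitter's API also returns them as strings.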

lanthaler commented 12 years ago

RESOLVED: Convert all values coerced to xsd:double to strings using the C-syntax formula of "%1.16e".

lanthaler commented 12 years ago

While updating the spec, I came to think that it might be better to normalize the value to xsd:double's canonical form, which is different from "%1.16e". I would like to hear the opinion of others before making that change.

gkellogg commented 12 years ago

I think that's what my Ruby canonicalization does. This is from the spec: http://www.w3.org/TR/xmlschema-2/#double.

The canonical representation for double is defined by prohibiting certain options from the Lexical representation (§3.2.5.1). Specifically, the exponent must be indicated by "E". Leading zeroes and the preceding optional "+" sign are prohibited in the exponent. If the exponent is zero, it must be indicated by "E0". For the mantissa, the preceding optional "+" sign is prohibited and the decimal point is required. Leading and trailing zeroes are prohibited subject to the following: number representations must be normalized such that there is a single digit which is non-zero to the left of the decimal point and at least a single digit to the right of the decimal point unless the value being represented is zero. The canonical representation for zero is 0.0E0.
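As a concrete reading of that paragraph, here is a minimal Ruby sketch (not the actual RDF.rb code; the helper name is mine, and it uses %.15E per the rounding caveat in the next comment):

    # Sketch of xsd:double canonical form per XML Schema Part 2.
    def canonical_double(value)
      return "0.0E0" if value.zero?
      mantissa, exponent = ("%.15E" % value).split("E")
      mantissa = mantissa.sub(/0+\z/, "")         # drop trailing zeroes ...
      mantissa << "0" if mantissa.end_with?(".")  # ... but keep one digit after the point
      "#{mantissa}E#{exponent.to_i}"              # to_i drops "+" and leading zeroes
    end

    canonical_double(0.0)        # => "0.0E0"
    canonical_double(1.1)        # => "1.1E0"
    canonical_double(100000.123) # => "1.00000123E5"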

gkellogg commented 12 years ago

So, RDF.rb does use %.16E, but that ends up creating problems, at least in Ruby. Here are a couple of examples:

1.9.3p125 :001 > ("%.16E" % 5.3.to_f)
 => "5.2999999999999998E+00" 
1.9.3p125 :002 > ("%.15E" % 5.3.to_f)
 => "5.300000000000000E+00" 

I needed to back down my version to %.15E to make it work properly; of course, you also need to remove extraneous 0's and +'s.

gkellogg commented 12 years ago

Okay, we said:

> Convert all values coerced to xsd:double to strings using the C-syntax formula of "%1.16e".

What do the following convert to?

"foo", true, "2012-03-25", {"@value": false}, {"@value": "2012-03-25", "@type": "xsd:date"}

I'd say that what we mean is to convert all numeric types to %1.16E (or whatever we settle on), and not attempt coercion of other types beyond turning them into a string contained in "@value". I suggest the following:

"foo" => {"@value": "foo", "@type": "xsd:double"} true => {"@value": "true", "@type": "xsd:double"} "2012-03-25" => {"@value": "foo2012-03-25, "@type": "xsd:double"} {"@value": false} => {"@value": "false", "@type": "xsd:double"} {"@value": "2012-03-25", "@type": "xsd:date"} => {"@value": "2012-03-25", "@type": "xsd:double"}

gkellogg commented 12 years ago

It does seem clear that double to double is always transformed:

1.1 => {"@value": "1.1E0", "@type": "xsd:double"}

dlongley commented 12 years ago

I was thinking that we'd handle expansion like this:

If the value is already in an expanded form, leave it alone regardless of what the context says. Only replace keyword aliases.

If the value is anything else, determine what to do to it based on its property's context mapping:

- If the property isn't in the context and the property isn't an absolute IRI, drop the value (do not include it in the output).
- If the property is '@id' or '@type' (or a keyword alias for these), expand the IRI according to the context.
- If the property is anything else with a @type mapping, convert the value into a string and put it into a @value object (e.g. "foo" => {"@value": "foo", "@type": "anytypewhatsoever"}).

Therefore, it doesn't matter whether the coercion @type is xsd:double or anything else: you simply convert to a string and put the value into a @value construct with a @type. That means I agree with all of your examples except the already-expanded ones (which I think we should leave alone).
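In code, that coercion rule is tiny; a sketch (coerce_value is an illustrative name, and coerce_type is whatever IRI the context maps the property to):

    # During expansion, a @type-coerced value is stringified and wrapped;
    # the coercion type itself is treated as opaque.
    def coerce_value(value, coerce_type)
      { "@value" => value.to_s, "@type" => coerce_type }
    end

    coerce_value(true, "xsd:double")  # => {"@value"=>"true", "@type"=>"xsd:double"}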

We should effectively treat @type as opaque when expanding/compacting. I think the only time we should use the "%1.16e" formatting is during normalization. When we see a native type during normalization, we should do the following:

- If the value is a number with a decimal point, output {"@value": (result of printf("%1.16e", value)), "@type": "full IRI for xsd:double"}.
- If the value is a number without a decimal point, output {"@value": "string form of the number", "@type": "full IRI for xsd:integer"}.
- If the value is a boolean, output {"@value": "true"/"false", "@type": "full IRI for xsd:boolean"}.
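A sketch of those three rules in Ruby (normalize_native is an illustrative name; a JSON parser yields Float for numbers with a decimal point and Integer otherwise):

    XSD = "http://www.w3.org/2001/XMLSchema#"

    def normalize_native(value)
      case value
      when Float       then { "@value" => "%1.16e" % value, "@type" => "#{XSD}double"  }
      when Integer     then { "@value" => value.to_s,       "@type" => "#{XSD}integer" }
      when true, false then { "@value" => value.to_s,       "@type" => "#{XSD}boolean" }
      else value
      end
    end

    normalize_native(5.3)
    # => {"@value"=>"5.2999999999999998e+00", "@type"=>"http://www.w3.org/2001/XMLSchema#double"}
    # note the 16-digit artifact gkellogg demonstrated above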

dlongley commented 12 years ago

I'll add to this that you might get a different output from normalization for a double than from expansion; but I don't think we should really care about that. You should not be depending on doubles for any sort of accuracy or consistency, and you already can't depend on how they'll be modified by most JSON parsers when they are read in.

msporny commented 12 years ago

PROPOSAL: When performing datatype coercion while expanding: 1) leave all values already in expanded form alone, 2) if the property isn't in the context and is not an absolute IRI, do not include the value in the output, 3) if the property is '@id' or '@type' (or a keyword alias for these), expand the IRI according to the context, 4) if the property is anything else with a @type mapping, convert the value into a string and put it into a @value object (e.g. "foo" => {"@value": "foo", "@type": "anytypewhatsoever"}).

PROPOSAL: When performing normalization and converting to expanded form: 1) if the value is a number with a decimal point, output {"@value": (result of printf("%1.16e", value)), "@type": "full IRI for xsd:double"}, 2) if the value is a number without a decimal point, output {"@value": "string form of the number", "@type": "full IRI for xsd:integer"}, 3) if the value is a boolean, output {"@value": "true"/"false", "@type": "full IRI for xsd:boolean"}.

gkellogg commented 12 years ago

1) As I pointed out above, %1.16e creates rounding errors in Ruby; I'd suggest %1.15e as being safe. Values written with [eE] should also be converted.

Also, normalization isn't defined in JSON-LD as an algorithm. Perhaps you mean just when performing expansion, which is used by nearly every other API method anyway.

lanthaler commented 12 years ago

I think we shouldn't do anything like this in expansion, but should rely solely on the JSON processor's native toString method. We can't assure any precision anyway.

How this is solved for normalization is out of scope for the JSON-LD spec.

dlongley commented 12 years ago

I think the reason for requesting 15 digits of precision is that nearly all modern systems should be able to provide that amount (an IEEE 754 double reliably round-trips 15 significant decimal digits, which is why C's DBL_DIG is 15), and it would help mitigate criticisms of the spec. It's a fairly esoteric issue, IMO, and I agree that we can't assure arbitrary precision, but assuring the lowest common denominator for precision could ward off certain criticisms that would otherwise hold up the process.

lanthaler commented 12 years ago

Deleted as I posted it to the wrong issue - sorry

lanthaler commented 12 years ago

Honestly, I don't like the idea of restricting the precision of all implementations just because it's the lowest common denominator (at the moment). There's no problem within JSON-LD; it just applies to toRDF/fromRDF (and normalization). In toRDF/fromRDF I would expect to be able to use the full precision of my JSON processor and not be restricted by the JSON-LD spec; the situation in normalization is a bit different, as we really need a canonical representation there.

Contrary to you, I believe the (somewhat arbitrary) 15-digit precision restriction is what will cause criticism, not the reliance on the underlying JSON processor for these conversions, something we do and have to do anyway.

lanthaler commented 12 years ago

RESOLVED: When converting toRDF(), any value that is a JSON number that has a fractional value MUST be treated as an xsd:double using the printf("%1.15E", number) representation.
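Applied to the running example, the resolution gives (a minimal sketch; the helper name and the N-Triples literal rendering are just for illustration):

    def double_literal(number)
      "\"#{'%1.15E' % number}\"^^<http://www.w3.org/2001/XMLSchema#double>"
    end

    double_literal(5.3)
    # => "\"5.300000000000000E+00\"^^<http://www.w3.org/2001/XMLSchema#double>"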

lanthaler commented 12 years ago

If we decide to introduce the flags Dave proposed in #100, we probably should use the printf("%1.15E", number) representation only if numbers with fractions are really mapped to xsd:double (which would be the default).

dlongley commented 12 years ago

> If we decide to introduce the flags Dave proposed in #100, we probably should use the printf("%1.15E", number) representation only if numbers with fractions are really mapped to xsd:double (which would be the default).

I'd be fine with that if everyone else agrees. I don't yet have as strong of an opinion on this as others might.

gkellogg commented 12 years ago

Fine with me.

lanthaler commented 12 years ago

I created issue #150 for the flags Dave proposed in #100.