w3c / automotive

W3C Automotive Working Group Specifications

Why strings for all datatypes? #386

Closed SebastianSchildt closed 2 years ago

SebastianSchildt commented 3 years ago

That probably has been discussed in the past, but as the question came up in https://github.com/GENIVI/iot-event-analytics/issues/109 and I could not find a rationale in the spec, I will just repost the main points here.

VISSv2 says "Regardless of its data type, a single data item is always represented as a string in message payloads."

Can you give some reason why string is defined in the standard?

and

I agree that JSON only supports numbers [+ string, boolean, array, object], and thus additional checking is mandatory. But clamping or testing upper and lower bounds etc. for numbers can be done using these plain numbers. Transforming strings to numbers and vice versa is actually an additional step. I don't find it cleaner to use strings. I always prefer e.g. comparing booleans to booleans and numbers to numbers, since I know many cases where people are discussing whether "1", "true", "True", "TRUE", "yes" should be translated to boolean true. That's a pitfall which comes with using stringified values. The same applies to numbers (using "," or "." as a thousands or decimal separator, depending on English or German conventions, is a nightmare to parse). And if they want to use JSON as their representation of choice, they have to deal with the "limitations" of this format instead of making everything on top super complicated to compensate for them. In the end, the whole thing gets complicated for no reason. In addition, strings tend to be misused for "weirdo datatypes": the first byte means this, the second byte means that, and then, separated by an ampersand, there's the value. And don't even try to use an encoding other than UTF-8, otherwise you break it. I know it's a bit provocative, but I hope you get my point.

Can somebody shed some light why the design decision was to use strings for everything?

UlfBj commented 3 years ago

The VISSv2 spec deals with signals from the point where they are delivered by the underlying vehicle system to the VISSv2 server until they are delivered to the VISSv2 client. It is not very likely that signal processing is done during the period in which VISSv2 "rules" over the signal format. A possible conclusion is that any format, e.g. the "native" format of the signal, would be fine, as would a common format such as strings. However, the VISSv2 reference implementation (https://github.com/MEAE-GOT/W3C_VehicleSignalInterfaceImpl) has shown that intermediate database storage is likely before entering the VISSv2 realm (statestorage), after leaving it (OVDS), and also within it (historic value capturing). The DB implementation is simplified by having a common storage format. The cost of a common format is the translation at the "spec front-ends". Translating from native to string is trivial; the other way can be done either by looking up the data type in the VSS tree or by a simple ad-hoc analysis (used in the ref impl). If the spec clearly defines the text format to be used for the different data types, ambiguity as mentioned in https://github.com/GENIVI/iot-event-analytics/issues/109#issuecomment-867548866 should not be a problem.
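The "look up the data type in the VSS tree" direction could look roughly like the Go sketch below. It assumes the VSS datatype names (`uint8`, `double`, `boolean`, ...) and a strict boolean format. The function name `parseByDatatype` is hypothetical and not taken from the reference implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseByDatatype converts a string payload back to a native Go value,
// guided by the datatype declared in the VSS tree. Hypothetical helper.
func parseByDatatype(datatype, s string) (any, error) {
	switch datatype {
	case "boolean":
		// Assumption: only the lowercase literals are valid.
		switch s {
		case "true":
			return true, nil
		case "false":
			return false, nil
		}
		return nil, fmt.Errorf("invalid boolean %q", s)
	case "int8", "int16", "int32", "int64":
		bits, _ := strconv.Atoi(datatype[3:]) // "int16" -> 16
		return strconv.ParseInt(s, 10, bits)
	case "uint8", "uint16", "uint32", "uint64":
		bits, _ := strconv.Atoi(datatype[4:]) // "uint16" -> 16
		return strconv.ParseUint(s, 10, bits)
	case "float":
		return strconv.ParseFloat(s, 32)
	case "double":
		// Note: ParseFloat also accepts forms such as "NaN" and "Inf",
		// so it is more lenient than a JSON-style number grammar.
		return strconv.ParseFloat(s, 64)
	case "string":
		return s, nil
	}
	return nil, fmt.Errorf("unsupported datatype %q", datatype)
}

func main() {
	for _, c := range [][2]string{{"double", "12.5"}, {"uint8", "256"}, {"double", "1,5"}} {
		v, err := parseByDatatype(c[0], c[1])
		fmt.Println(c[0], c[1], v, err)
	}
}
```

The range check falls out of the bit size passed to `strconv`, so `"256"` is rejected for `uint8` without extra code. A comma as decimal separator is a syntax error.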

SebastianSchildt commented 3 years ago

I am not 100% sure that answers the question, or is a strong argument.

I still mainly see VISSv2 as a "protocol specification", i.e. it tells me what I can send, and what to expect to receive. Most of the time the goal of such specifications is to enable some interoperability, e.g. when I write a VISS client, I expect it to work with a VISS server, independent of which concrete server implementation it is (I do not even need to know).

So when I transfer a datapoint that VSS says is "double", and VISS says those points are all encoded as strings, that is certainly doable, but I would need to know

I do think the argument

Of course it is possible to encode numbers as strings, no problem; that is what JSON itself does when using a number. But in VISS I feel there is a missing reference for HOW things are encoded to (JSON) strings, as there are different ways, and it should not depend on which runtime I happen to choose to develop my code.

Actually the "bool" example brought up by the IoTEA colleague above is a good example: we have exactly this kind of code in kuksa.val. I think we accept true and false, so if the ref impl also accepted True, there would be an incompatibility right there, without anybody being "wrong".

As a mere "user" I do not really advocate reversing the "everything is a string" decision in VISS (although I would be fine with it), but rather want to point out, that currently VISS seems underspecified in that regard to me, i.e. you can easily build 2 fully compliant implementations that are not necessarily interoperable.

So to conclude, I fully agree with your point

If the spec clearly defines the text format to be used for the different data types ambiguity [...] should not be a problem.

But I feel, that is not the case for all possible VSS datatypes currently, or I missed it.

UlfBj commented 3 years ago

You are right, the spec is currently under-specified regarding valid data type transformations.

crea7or commented 3 years ago

Can you just declare in the documentation that the "C locale" is the only valid number representation? This is quite acceptable, since most automotive software is written in C/C++, and std::to_chars/std::from_chars are very fast at building/parsing numbers. Personally I'm not a fan of "strings everywhere", mostly because it limits the use of internal protocols that may work between the VSS server and the tree to avoid unnecessary string/number translations. But I am more confused by the inconsistency of this rule. VSS, for example, defines max/min/default values as numbers in the JSON tree. If strings were truly everywhere, we would be able to use the same code to validate the VSS tree and the VSS data received over the network.

UlfBj commented 3 years ago

Can you just declare in the documentation that the "C locale" is the only valid number representation?

Do you have a link to this definition?

crea7or commented 3 years ago

The C locale (which is actually the default POSIX locale); take a look at paragraph 7.3.4, LC_NUMERIC.

LC_NUMERIC
# This is the POSIX locale definition for
# the LC_NUMERIC category.
#
decimal_point    "<period>"
thousands_sep    ""
grouping         -1
#
END LC_NUMERIC

crea7or commented 3 years ago

Alternatively, it can be taken from the JSON RFC 7159:

RFC 7159                          JSON                        March 2014

   An exponent part begins with the letter E in upper or lower case,
   which may be followed by a plus or minus sign.  The E and optional
   sign are followed by one or more digits.

   Numeric values that cannot be represented in the grammar below (such
   as Infinity and NaN) are not permitted.

      number = [ minus ] int [ frac ] [ exp ]
      decimal-point = %x2E       ; .
      digit1-9 = %x31-39         ; 1-9
      e = %x65 / %x45            ; e E
      exp = e [ minus / plus ] 1*DIGIT
      frac = decimal-point 1*DIGIT
      int = zero / ( digit1-9 *DIGIT )
      minus = %x2D               ; -
      plus = %x2B                ; +
      zero = %x30                ; 0

UlfBj commented 3 years ago

I think the JSON RFC looks like a good reference to use. Thanks.
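As an illustration of how the quoted JSON number grammar could be used to validate string payloads, here is a minimal Go sketch. The regular expression is a hand translation of `number = [ minus ] int [ frac ] [ exp ]`. The names `jsonNumber` and `isJSONNumber` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// jsonNumber mirrors the RFC grammar: [ minus ] int [ frac ] [ exp ].
var jsonNumber = regexp.MustCompile(`^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$`)

// isJSONNumber reports whether s is a valid JSON number literal.
func isJSONNumber(s string) bool {
	return jsonNumber.MatchString(s)
}

func main() {
	for _, s := range []string{"12.5", "-0.5e+3", "1,5", "1.234,5", "01", ".5", "NaN"} {
		fmt.Printf("%q -> %t\n", s, isJSONNumber(s))
	}
}
```

Under this grammar, locale-specific separators, leading zeros, a leading plus sign, and NaN/Infinity are all rejected, which removes the ambiguity discussed above for numeric types.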

UlfBj commented 3 years ago

@SebastianSchildt : Are you fine with the JSON ref?

gunnarx commented 3 years ago

I would prefer that the ambiguity is instead resolved by abandoning the "everything is a string" approach and using standard JSON (RFC 8259), in other words encoding numbers as numbers, without quotation marks. I believe that every standard JSON parser will understand the data as written when it follows the specification, and that parsers will most likely deliver the data as the "right type" (a number type if it is a number) back to the program that uses the JSON parsing library. In the end this seems to me to give the most simplicity and the most expected behavior.

"The DB implementation is simplified by having a common storage format."

Yes, in a simple relational table only one value type is required. But it is also much less efficient to store numbers as strings.

In other cases, such as when it is desirable to gather number data as numbers because the database is able to manipulate and understand numbers[1], it seems more appropriate to have numbers as early as possible instead of having to convert. So it will be simpler in some cases and more difficult in others. For that reason I feel that the most straightforward way is to go fully according to the standard (a JSON document) and to represent data in the most obvious way for its type.

[1] https://docs.influxdata.com/influxdb/v1.8/query_language/functions/
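A small Go sketch of the point about parsers delivering the "right type": the same value arrives as a number or as a string depending on whether it was quoted. The `datapoint` struct is a hypothetical message shape, not the VISSv2 schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// datapoint is a hypothetical message shape, not the VISSv2 schema.
type datapoint struct {
	Value any `json:"value"`
}

// kindOf decodes one message and names the Go type the parser delivered.
func kindOf(msg string) (string, error) {
	var dp datapoint
	if err := json.Unmarshal([]byte(msg), &dp); err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", dp.Value), nil
}

func main() {
	for _, msg := range []string{`{"value": 12.5}`, `{"value": "12.5"}`, `{"value": true}`} {
		kind, err := kindOf(msg)
		fmt.Println(msg, kind, err)
	}
}
```

With Go's `encoding/json`, the unquoted number decodes to `float64`, the quoted one to `string`, and the literal `true` to `bool`. One caveat: decoding into `any` turns every JSON number into `float64`, so very large 64-bit integers may need `json.Decoder.UseNumber` or a typed target.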

crea7or commented 3 years ago

That said, there is not much of a performance penalty either way, because it does not matter who parses the text representation of the number, the JSON library or the server code; a good JSON library is all that is required. Still, using JSON properly, with its own type system, looks more appropriate.

peterMelco commented 3 years ago

So this is an example of the south-bound interface towards a vehicle signal broker. It is written in Go. The actual type returned here is checked and then converted to the VISS v2 string; note that the broker in this case does not have values of type string. It is of course practical for the server implementation to treat all values as strings from here on. But...

    // Fragment: convert the broker's native payload type to the VISS v2 string form.
    value := Signals.signalvalues[vsssignalname].(*base.Signal)
    switch pl := value.GetPayload().(type) {
    case *base.Signal_Integer:
        return strconv.FormatInt(pl.Integer, 10)
    case *base.Signal_Double:
        // 'f' with precision -1 uses the fewest digits that represent the value uniquely.
        return strconv.FormatFloat(pl.Double, 'f', -1, 64)
    case *base.Signal_Arbitration:
        return strconv.FormatBool(pl.Arbitration)
    default:
        // Any other payload type: fall back to the raw bytes.
        return string(value.Raw)
    }

In this case the data structure that holds the values uses the native signal type and converts to string on access. So changing the south-bound interface to adhere to a number format should be straightforward.
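As a rough sketch of what "adhering to a number format" could mean on the encoding side, the natively typed value can be handed to the JSON encoder directly instead of being stringified first. The `message` struct, the `encode` helper, and the use of `Vehicle.Speed` as a path are illustrative assumptions, not part of the broker code above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message is a hypothetical payload shape, not the VISSv2 schema.
type message struct {
	Path  string `json:"path"`
	Value any    `json:"value"`
}

// encode marshals a natively typed value without stringifying it first.
func encode(path string, v any) (string, error) {
	b, err := json.Marshal(message{Path: path, Value: v})
	return string(b), err
}

func main() {
	for _, v := range []any{int64(42), 12.5, true, "P"} {
		s, err := encode("Vehicle.Speed", v)
		fmt.Println(s, err)
	}
}
```

Integers, doubles and booleans then appear unquoted in the output, and only genuine string values carry quotation marks.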

UlfBj commented 2 years ago

Text format for all data types defined by PR421