Open sosna opened 1 year ago
Indeed, if a dimension is non-coded without any restrictions, then any separator character could clash with arbitrary characters used in dimension values. A suitable solution seems to be escaping the affected dimension values with the classical CSV escaping mechanism, which encloses problematic strings in a pair of double quotes ("), e.g., M.ABC."A string with . (dot) and "" (double-quotes)".2020-01
Note that double quotes themselves are escaped in CSV by doubling them.
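To make the proposal concrete, here is a minimal Python sketch of CSV-style quoting applied to the components of a key, with the dot as the delimiter. The helper names `build_key` and `parse_key` are hypothetical and not part of any SDMX library; this only illustrates the mechanism and is not a statement of what the SDMX-CSV specification prescribes.

```python
import csv
import io


def build_key(components):
    """Join key components with '.', CSV-quoting those that need it (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=".",              # the key separator plays the role of the CSV delimiter
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only components containing '.', '"' or a newline
        lineterminator="\n",
    )
    writer.writerow(components)
    return buf.getvalue()[:-1]      # drop the single trailing line terminator


def parse_key(key):
    """Split a key built by build_key back into its components (hypothetical helper)."""
    return next(csv.reader([key], delimiter=".", quotechar='"'))


parts = ["M", "ABC", 'A string with . (dot) and " (double-quotes)', "2020-01"]
key = build_key(parts)
print(key)
print(parse_key(key) == parts)
```

Components without a dot or a double quote are left unquoted, so keys made only of coded values would look exactly as they do today.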
Note also that, for consistency, the key specification of SDMX-CSV is aligned with the key specification of the SDMX REST syntax in https://github.com/sdmx-twg/sdmx-rest/blob/master/doc/data.md. What would be the solution for this issue in the SDMX REST syntax?
Thanks a lot, @dosse! Maybe this could be added as an example to the SDMX-CSV guide? For REST, I guess we would need to investigate this separately, as we have the additional restriction on the characters that may be used in path or query parameters.
@sosna Thanks! Maybe the CSV solution should then wait to see if things can be aligned between REST and CSV?
Interesting discussion, @sosna and @dosse. What about using the encoding as in URLs?
It's not very user friendly, but if it is only used as a way of escaping problematic characters, it could probably be reduced to the dot (.) as %2E, the double quote (") as %22, the comma (,) as %2C and very few more, so they would soon become well-known codes.
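As a rough sketch of the percent-encoding idea in Python: the standard `urllib.parse.quote` never encodes the dot, because it is an unreserved URL character, so the dot has to be replaced explicitly. The helper names below are hypothetical, and encoding every reserved character (rather than only a short list) is an assumption made to keep the example short.

```python
from urllib.parse import quote, unquote


def encode_component(value):
    """Percent-encode one key component, including the dot (hypothetical helper)."""
    # quote() encodes '%', '"', ',' and spaces but leaves '.' as is, so handle it separately.
    return quote(value, safe="").replace(".", "%2E")


def build_key(components):
    """Join encoded components with the usual '.' separator."""
    return ".".join(encode_component(c) for c in components)


def parse_key(key):
    """Split on the literal '.' first, then decode each component."""
    return [unquote(part) for part in key.split(".")]


parts = ["M", "ABC", 'A.B, "C"', "2020-01"]
key = build_key(parts)
print(key)
print(parse_key(key) == parts)
```

Because the percent sign itself is encoded as %25, a value that literally contains the text %2E still round-trips.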
@dosse: Thanks. Yes, sure, we can park it for the time being, if this is your preference.
@egreising: Thanks. I think this would solve the problem for SDMX-CSV indeed, but maybe not for SDMX-REST, as Jens pointed out? I think browsers will typically send query strings and path parameters as percent-encoded values? If this is so, then, I guess it would not help, i.e. how could we distinguish between a %2E that is used as key separator and a %2E that is used as normal character in an uncoded dimension value?
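The concern can be illustrated with a small Python sketch. It assumes a receiving side that percent-decodes the whole path segment before splitting the key. Whether a given server, framework, or proxy behaves this way is an assumption here, not something this thread establishes.

```python
from urllib.parse import unquote

# Key as it might travel in a URL path: the third component is the uncoded value "A.B".
raw_key = "M.ABC.A%2EB.2020-01"

# Decoding the whole segment first loses the distinction between separator and value.
decoded_first = unquote(raw_key).split(".")

# Splitting the raw segment first keeps it, but requires access to the undecoded path.
split_first = [unquote(part) for part in raw_key.split(".")]

print(decoded_first)
print(split_first)
```

In the first case the key appears to have five components instead of four, which is the ambiguity described above.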
The field guide states the following about keys (the highlight is mine):
However, the guide does not specify what to do when a dimension value contains a dot, i.e. the character to be used as separator. This cannot happen for coded dimensions (as the . is not an allowed character for an SDMX code), but could be the case if the dimension uses a Valuelist instead of a Codelist.
So, if a dimension value contains a dot, what should the service provider do when building the series and/or observation keys in SDMX-CSV data messages?
Thank you.