Closed peterdesmet closed 7 years ago
Good comment.
Other relevant examples to take into account are:

    length:
      numberformat: .3  # interpreted as float 0.3 by pyyaml
    code_id:
      regex: [D-G]  # should be '[D-G]', otherwise interpreted as a list by pyyaml
tl;dr: applications such as pyyaml implicitly infer the data type of the input, resulting in float, int, list, ... data types in the interpreted YAML file.
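The implicit typing is easy to reproduce. A minimal sketch, assuming PyYAML is installed and using `yaml.safe_load`; the variable names are only illustrative:

```python
import yaml

# The two specifications from the examples above, left unquoted.
unquoted = """
length:
  numberformat: .3
code_id:
  regex: [D-G]
"""

parsed = yaml.safe_load(unquoted)

# .3 is resolved as a float, [D-G] is parsed as a flow sequence (list).
print(type(parsed["length"]["numberformat"]).__name__)  # float
print(type(parsed["code_id"]["regex"]).__name__)        # list
```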
We should check this against the general YAML specification. I checked the docs and will try to give a short summary here. Central are the nodes:
A YAML node represents a single native data structure. Such nodes have content of one of three kinds: scalar, sequence, or mapping.
When checking the information on [native data structures](http://yaml.org/spec/1.2/spec.html#native data structure//):
YAML represents any native data structure using three node kinds: sequence - an ordered series of entries; mapping - an unordered association of unique keys to values; and scalar - any datum with opaque structure presentable as a series of Unicode characters. Combined, these primitives generate directed graph structures. These primitives were chosen because they are both powerful and familiar: the sequence corresponds to a Perl array and a Python list, the mapping corresponds to a Perl hash table and a Python dictionary. The scalar represents strings, integers, dates, and other atomic data types.
Hence, these scalar representations are essential, as they define our data specifications themselves (e.g. see `metadata` in the example). The interpretation of the data types can be handled by tags.
YAML represents type information of native data structures with a simple identifier, called a tag.
However, as we probably do not want users to provide the appropriate tag for each field themselves, this is subject to implicit tag resolution:
In YAML, untagged nodes are given a type depending on the application.
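For comparison, this is roughly what explicit tagging would look like if users had to write it themselves. A small sketch, assuming PyYAML and the standard `!!str` and `!!float` shorthand tags; the `min` value is made up for illustration:

```python
import yaml

# Explicit tags override whatever the loader would infer from the text.
tagged = """
length:
  numberformat: !!str .3
  min: !!float 3
"""

explicit = yaml.safe_load(tagged)

# numberformat stays a string, min becomes a float despite looking like an int.
print(repr(explicit["length"]["numberformat"]))
print(repr(explicit["length"]["min"]))
```

This works, but it puts the burden of typing every value on the person writing the specifications.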
For example, when checking the documentation of the pyyaml package:
Plain scalars without explicitly defined tags are subject to implicit tag resolution. The scalar value is checked against a set of regular expressions and if one of them matches, the corresponding tag is assigned to the scalar. PyYAML allows an application to add custom implicit tag resolvers.
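The assigned tags can be inspected directly, before any Python object is constructed. A sketch using `yaml.compose`, which returns the node graph; `resolved_tag` is a hypothetical helper written for this comment, not part of PyYAML:

```python
import yaml

def resolved_tag(value_text):
    """Return the tag PyYAML assigns to the value of a one-entry mapping."""
    node = yaml.compose("key: " + value_text, Loader=yaml.SafeLoader)
    _key_node, value_node = node.value[0]
    return value_node.tag

# Plain scalars get an implicit tag; quoted scalars are always strings.
for text in [".3", "'.3'", "5", "[D-G]", "'[D-G]'"]:
    print(text, "->", resolved_tag(text))
```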
More information on tag resolution according to the Core Schema is provided in the YAML specification. This implicit tag resolution matches the behaviour seen in the examples above.
We could actually adapt the YAML resolution within the pyyaml application (e.g. the function `add_implicit_resolver` can be used to provide custom resolvers). That way, we could perhaps resolve all specifications (the sections after `:`) as (unicode) strings.
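To get a feel for what "everything as string" would give, PyYAML's built-in `yaml.BaseLoader` can serve as a stand-in: it applies no implicit resolvers at all. Note this is not the `add_implicit_resolver` route itself, only a quick sketch of the outcome; the `min` entry is added for illustration:

```python
import yaml

spec = """
length:
  numberformat: .3
  min: 5
code_id:
  regex: [D-G]
"""

# BaseLoader keeps every scalar as a string.
as_strings = yaml.load(spec, Loader=yaml.BaseLoader)

print(as_strings["length"])   # numberformat and min are both strings
print(as_strings["code_id"])  # regex is still a list: [] is syntax, not a tag
```

So numeric specifications lose their type, and the unquoted `[D-G]` is still parsed as a sequence.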
Notice:
- `min`, `max`, `minlength`, `maxlength` need to be interpreted as integers or floats, so implicit resolving is required. Hence, ignoring all resolving and sticking to strings wouldn't work.
- Whether `[]` belongs to a regex or denotes a container (i.e. a list) can only be determined from the context of the `regex` expression itself. I'm not completely sure, but I think this is not in line with the YAML specs ("Construction must be based only on the information available in the representation, and not on ..., scalar content format, ...").

I would rather stay as close as possible to general YAML handling (i.e. the pyyaml `load` function results in an object that can be used as such) and document the necessity of quotes for certain specifications in the whip specifications. To my knowledge (and as currently tested), quotes are required for the following specifications: `regex`, `dateformat` and `numberformat`.
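With quotes in place, plain `yaml.safe_load` already gives usable objects. A sketch assuming PyYAML; the `observation_date` field and the `'%Y-%m-%d'`, `min` and `max` values are made up for illustration:

```python
import yaml

quoted = """
length:
  numberformat: '.3'
  min: 5
  max: 7.5
code_id:
  regex: '[D-G]'
observation_date:
  dateformat: '%Y-%m-%d'
"""

specs = yaml.safe_load(quoted)

# Quoted values stay strings; min and max keep their numeric types.
print(repr(specs["length"]["numberformat"]))
print(repr(specs["code_id"]["regex"]))
print(type(specs["length"]["min"]).__name__, type(specs["length"]["max"]).__name__)
```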
Agreed: let's document the necessity of quotes for certain specifications in the whip specifications. That is indeed the case (and documented) for: `regex`, `dateformat` and `numberformat`.
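Besides documenting it, a loader could also flag the cases it can detect. A rough sketch, assuming a two-level mapping of field to specifications; `STRING_SPECS` and `unquoted_specs` are hypothetical names, not existing whip code:

```python
import yaml

# Specifications whose values should load as strings (the three named above).
STRING_SPECS = {"regex", "dateformat", "numberformat"}

def unquoted_specs(document):
    """Yield (field, spec) pairs whose value did not load as a string."""
    for field, specs in document.items():
        for spec, value in specs.items():
            if spec in STRING_SPECS and not isinstance(value, str):
                yield field, spec

document = yaml.safe_load("""
length:
  numberformat: .3
code_id:
  regex: '[D-G]'
""")

problems = list(unquoted_specs(document))
print(problems)  # [('length', 'numberformat')]
```

This only catches values that were resolved to a non-string type; an unquoted value that happens to load as a string passes unnoticed.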
We should probably provide a recommendation on when values should be quoted: