planetarypy / pvl

Python implementation of PVL (Parameter Value Language)
BSD 3-Clause "New" or "Revised" License
19 stars 19 forks source link

We probably need a variant PDS3LabelEncoder(). #84

Closed rbeyer closed 3 years ago

rbeyer commented 3 years ago

The various PVL "flavors" sometimes have asymmetric rules for "reading" and "writing." They are generally more permissive on "read" and strict on "write."

For example, for PDS3 Labels:

PDS3 Reads pvl converts to Python PDS3 Writes
2000-02-25T10:54:12.129Z datetime 2000-02-25T10:54:12.129Z
2000-02-25T10:54:12.129 datetime 2000-02-25T10:54:12.129Z
'I am a Symbol String' str 'I am a Symbol String'
"I'm a Text String" str "I'm a Text String"
"I am double-quoted" str 'I am double-quoted'

See the asymmetry?

PDS3 Label "readers" can read times without any kind of timezone specifier, or with the "Z" specifier, but PDS3 Label "writers" must write out the trailing "Z" (see #82 for too much information).

Similarly, did you know that ODL/PDS3 considers single-quoted strings as 'Symbol Strings' with restrictions on what can be inside them, and double-quoted "Text Strings" which can contain anything. So on "write" our pvl.encoder.PDS3LabelEncoder sees if a Python str qualifies as a 'Symbol String' and writes it out with single quotes, and as a double-quoted "Text String" otherwise.

If a user wanted to get some PVL-text that could be read by a PDS3 "reader" but wanted no trailing "Z" or some text that qualified as a Symbol String to be double-quoted, we don't have a good mechansim for that, beyond them extending our PDS3LabelEncoder().

Describe the solution you'd like I think we should keep the pvl.encoder.PDS3LabelEncoder() as it is, meaning that it is very strict. However, we should create a pvl.encoder.PDS3VariantEncoder() that allows users to specify which of the "optional" formatting they'd like to allow, but still write out PVL-text that is "readable" by our pvl.decoder.PDS3Decoder().

Describe alternatives you've considered I thought about simply putting optional parameters in the existing pvl.encoder.PDS3LabelEncoder() that would modify the behavior, however, while users may have an option, the ODL/PDS3 Spec on writing isn't really optional. So I'd rather keep the pvl.encoder.PDS3LabelEncoder() very strict and immutable, but still provide this other encoder.

This also made me think more about allowing more "smart" types that we decode PVL-text to that "remember" the PVL-text that they came from (as Michael suggested in #81). If the library could determine whether there was or wasn't a "Z" on the text that became a datetime then it could write it back out the same way. Similarly with the different string quoting patterns. That's all still valid, but this question of what to "allow" on output would still remain.

Additional context The context here is that there are many bits of software out there that "read" PVL-text, and they do not all fully implement the PDS3 "reader" spec. So someone could properly use the PDS3 spec to create a time with a trailing "Z", which would validate and be archived with the PDS, but software that "thinks" it works on PDS3 PVL-text might barf on the "Z" because it isn't properly implementing the "reading" of PDS3 labels.

However, if someone didn't properly use the PDS3 spec, and wrote times without the trailing "Z", those labels would still validate and be archived, because the validators are "readers" that can read both (stupid asymmetric standard). In practice, this has happened quite a bit, and if a developer was making some software by just looking at a corpus of PVL-text (and not closely reading the ODL/PDS3 spec), that software may not handle the "Z".

As always, there's the spec, and then there's how people use things in practice. Having this variant encoder would help us support a wider variety of the ways that people are practically using PDS3 PVL-text, while still being "readable" by official PDS3 "readers."

rbeyer commented 3 years ago

I changed my mind about implementing a separate encoder as I worked with this. We already had one "optional" parameter (tab_replace which would let you replace tabs with spaces, instead of raising an exception since tabs aren't allowed in PDS3 PVL-text, but would allow you to leave them in, if you really wanted).

So adding these as options to pvl.encoder.PDSLabelEncoder() with reasonable defaults was the way I made it.