Closed rbeyer closed 3 years ago
I changed my mind about implementing a separate encoder as I worked with this. We already had one "optional" parameter (tab_replace
which would let you replace tabs with spaces, instead of raising an exception since tabs aren't allowed in PDS3 PVL-text, but would allow you to leave them in, if you really wanted).
So adding these as options to pvl.encoder.PDSLabelEncoder()
with reasonable defaults was the way I made it.
The various PVL "flavors" sometimes have asymmetric rules for "reading" and "writing." They are generally more permissive on "read" and strict on "write."
For example, for PDS3 Labels:
pvl
converts to Pythondatetime
datetime
str
str
str
See the asymmetry?
PDS3 Label "readers" can read times without any kind of timezone specifier, or with the "Z" specifier, but PDS3 Label "writers" must write out the trailing "Z" (see #82 for too much information).
Similarly, did you know that ODL/PDS3 considers single-quoted strings as 'Symbol Strings' with restrictions on what can be inside them, and double-quoted "Text Strings" which can contain anything. So on "write" our
pvl.encoder.PDS3LabelEncoder
sees if a Pythonstr
qualifies as a 'Symbol String' and writes it out with single quotes, and as a double-quoted "Text String" otherwise.If a user wanted to get some PVL-text that could be read by a PDS3 "reader" but wanted no trailing "Z" or some text that qualified as a Symbol String to be double-quoted, we don't have a good mechansim for that, beyond them extending our
PDS3LabelEncoder()
.Describe the solution you'd like I think we should keep the
pvl.encoder.PDS3LabelEncoder()
as it is, meaning that it is very strict. However, we should create apvl.encoder.PDS3VariantEncoder()
that allows users to specify which of the "optional" formatting they'd like to allow, but still write out PVL-text that is "readable" by ourpvl.decoder.PDS3Decoder()
.Describe alternatives you've considered I thought about simply putting optional parameters in the existing
pvl.encoder.PDS3LabelEncoder()
that would modify the behavior, however, while users may have an option, the ODL/PDS3 Spec on writing isn't really optional. So I'd rather keep thepvl.encoder.PDS3LabelEncoder()
very strict and immutable, but still provide this other encoder.This also made me think more about allowing more "smart" types that we decode PVL-text to that "remember" the PVL-text that they came from (as Michael suggested in #81). If the library could determine whether there was or wasn't a "Z" on the text that became a
datetime
then it could write it back out the same way. Similarly with the different string quoting patterns. That's all still valid, but this question of what to "allow" on output would still remain.Additional context The context here is that there are many bits of software out there that "read" PVL-text, and they do not all fully implement the PDS3 "reader" spec. So someone could properly use the PDS3 spec to create a time with a trailing "Z", which would validate and be archived with the PDS, but software that "thinks" it works on PDS3 PVL-text might barf on the "Z" because it isn't properly implementing the "reading" of PDS3 labels.
However, if someone didn't properly use the PDS3 spec, and wrote times without the trailing "Z", those labels would still validate and be archived, because the validators are "readers" that can read both (stupid asymmetric standard). In practice, this has happened quite a bit, and if a developer was making some software by just looking at a corpus of PVL-text (and not closely reading the ODL/PDS3 spec), that software may not handle the "Z".
As always, there's the spec, and then there's how people use things in practice. Having this variant encoder would help us support a wider variety of the ways that people are practically using PDS3 PVL-text, while still being "readable" by official PDS3 "readers."