planetarypy / pvl

Python implementation of PVL (Parameter Value Language)
BSD 3-Clause "New" or "Revised" License
19 stars 19 forks source link

PVL Read ISIS Control network #98

Closed jlaura closed 2 years ago

jlaura commented 2 years ago

Describe the bug Unable to read an ISIS control network after upgrading to > 1.0. In the past, I have been able to pass a full ISIS control network to pvl and have it read the header off the network. The protobuf internals are outside the scope of the pvl library.

To Reproduce Using the attached control network run a pvl.load(). The attached network is zipped so that GitHub will accept it as a filetype.

test.net.zip

Expected behavior A clear and concise description of what you expected to happen.

Error logs or terminal captures

Traceback (most recent call last):
  File "/Users/jlaura/miniconda3/envs/plio/lib/python3.8/site-packages/pvl/lexer.py", line 429, in lexer
    t = yield t
ValueError: Expecting an Aggregation Block, an Assignment Statement, or an End Statement, but found "14:37:20" 

Your Environment (please complete the following information):

Additional context I am not 100% sure that this is a PVL issue or an issue with the mocked control network. The Aggregation Block and lexor errors are opaque to me though, so I am looking for some additional guidance.

rbeyer commented 2 years ago

This is a fun one.

So this happens:

$> pvl_validate -v test.net
ERROR: PDS3 load error test.net (LexerError(...), 'Expecting an Aggregation Block, an Assignment Statement, or an End Statement, but found "14:37:20" : line 21 column 25 (char 65568) near ""')
ERROR: ODL load error test.net (LexerError(...), 'Expecting an Aggregation Block, an Assignment Statement, or an End Statement, but found "14:37:20" : line 21 column 25 (char 65568) near ""')
ERROR: Omni load error test.net (LexerError(...), 'Expecting an Aggregation Block, an Assignment Statement, or an End Statement, but found "14:37:20" : line 21 column 25 (char 65568) near ""')
pvl library version: 1.3.0
PDS3 | does NOT load |
ODL  | does NOT load |
PVL  |     Loads     |     Encodes
ISIS |     Loads     |     Encodes
Omni | does NOT load |

The fact that the PVL and ISIS dialects will load it, but the Omni dialect doesn't is ... disturbing. Turns out I'm pretty sure it is the PVL-text's fault and not the pvl module's fault, but in the short term, you can use the ISIS or PVL grammars like this to get going (but this is really just an ugly hack):

foo = pvl.load("test.net", grammar=pvl.grammar.ISISGrammar())

Now, to the solution:

I'm not sure why your exception isn't showing the line number like the ones I'm getting from my pvl_validate, but the label appears to end on line 20 (if you use head -20 and head -21 you'll see what I'm talking about). The error is because the lexer is trying to make sense of text beyond the "END" which isn't PVL-text, hence the failure.

Why is it trying to do that? Character sets!

What do the PVL and ISIS pvl.grammars have in common that the ODL, PDS3, and Omni grammars don't? They accept different character sets. PVL (and pvl.grammar.ISISGrammar which inherits from pvl.grammar.PVLGrammar) accepts most (this "most" is going to be very relevant in a moment) of the ISO 8859-1 'latin-1' character set. ODL (and pvl.grammar.PDSGrammar which inherits from pvl.grammar.ODLGrammar) only accepts ASCII characters (but all of them), and pvl.grammar.OmniGrammar accepts any character.

Why does this matter for your PVL-text? Well, it turns out that the character right after the "D" in "END" which would appear to finish the PVL-text in the header of the file is a NULL character (ASCII 0, Hex 0x0) which is part of ASCII, but one of the characters that the PVL spec expressly forbids (but the ODL/PDS3 can handle it, and its is certainly a valid UTF character for the OmniGrammar). So the PVL/ISIS grammars come across this character and say "whoa, I can't deal with this character, so I'm going to consider this point the end-of-file, does what I have so far make sense as PVL?" and since the last recognizable character it grabbed was the "D" in "END," the preceding is all good, it happily returns a dict-like. When the ODL/PDS3/Omni grammars come along, the last recognizable token that they could swallow was "End_Object = ProtoBuffer" and the next one they tried to parse starts with "END" but is followed by a bunch of completely parseable characters but then there is a double-quotation character which can't be in a PVL Aggregation Block, Assignment Statement, or End Statement and so throws the error.

This changed with pvl 1.3.0, prior to that the OmniGrammar had the same restrictive character set as original PVL itself, and would have succeeded on this PVL-text.

What's the right long-term solution? I think that the change to the OmniGrammar approach was correct, and should stay. I think that the pvl module is correctly handling things. I think that whatever is writing these control networks needs to stop writing broken PVL-text. Here's the problem, just the characters "END" aren't enough. What if there was PVL-text like this:

STARTED = "Start time"
ENDED = "Bogus example"
END

If the only thing the parser did was look for that three-character sequence, the returned dict-like wouldn't have the "ENDED" key-value pair. The next thing after the "END" is important. According to the original PVL-spec It can be one of these four things:

  1. whitespace (which is defined in the PVL-spec and is currently identical for all of the grammars, and is any of: " ", "\t", "\n", "\r", "\v", "\f")
  2. a valid PVL comment
  3. a semi-colon (;)
  4. or the end of the "octet stream" of bytes.

The problem is that whatever is writing that file just starts adding NULL characters after the "D" in "END" and since that character happens to be on the PVLGrammar's no-parse list, condition 4 gets triggered, but kind of by accident. The better thing to do is put a semi-colon after the END or inject some proper whitespace.

jlaura commented 2 years ago

@rbeyer This is great information both to get this working and to get an issue opened on the ISIS repo in order to consider a change over there to how the PVL header to an ISIS control network is being written.

rbeyer commented 2 years ago

Closing this Issue, but glad to see other Issues linked to it.

rbeyer commented 2 years ago

I've been thinking more about this.

There is a solution that would allow this PVL-text to be loaded by the Omni dialect (the default strategy used to parse PVL-text when the user doesn't specify). That solution would be to disallow recognition of ASCII NULL characters ("\0") in PVL-text that the Omni strategy reads.

That is fundamentally what happens in the PVL and ISIS dialects, and why they will load this PVL-text. The pvl 1.3 change was to remove all character forbiddence and take any UTF character and parse until the PVL-text ended according to the various PVL rules. As I stated above, my thinking was that this approach would be as broad as possible, but maybe that is indeed the wrong perspective?

Changing this approach for the default Omni strategy means that ASCII NULL characters could not be a part of PVL-text parameters, values, or comments, because as soon as an ASCII NULL were encountered, PVL-text parsing would terminate.

Is this a big loss? Probably not, especially since that is how this library largely functioned for a long time anyway, and who in their right mind would include an ASCII NULL character intentionally as part of PVL-text that is meant to be recovered? My suspicion is that ASCII NULLs are only injected mistakenly, perhaps with the idea that they were "terminating" PVL-text parsing. Making this change would still allow the wide rainbow of UTF characters to be parsed by the default strategy and only exclude the ASCII NULL, which 99% of the time (like in this example) is meant to (incorrectly) signal the end of PVL-text parsing anyway.

I imagine this would make Jay happy (but the ISIS code should shape up its PVL-text writing anyway), but can anyone think of a reason why we shouldn't exclude ASCII NULLs?

michaelaye commented 2 years ago

LGTM

jlaura commented 2 years ago

I agree 100% that ISIS should clean up it's PVL and be compliant to a written, and widely adopted and supported, standard!