planetarypy / pvl

Python implementation of PVL (Parameter Value Language)
BSD 3-Clause "New" or "Revised" License
19 stars 19 forks source link

ISIS PVL - Operator Error or Improvement to the OmniParser? #108

Open jlaura opened 4 months ago

jlaura commented 4 months ago

Describe the bug The following PVL fails to parse, presumably because of the keyword 'Group' inside a canonical Group.

To Reproduce

import pvl
pvl.loads("""
Group = Keyword1
  Auto
  InputKey       = CSMPlatformId
  InputGroup     = "IsisCube,CsmInfo"
  InputPosition  = (IsisCube, CsmInfo)
  OutputName     = Keyword1
  OutputPosition = (Group, SerialNumberKeywords)
  Translation    = (* , *)
End_Group

Group = Keyword2
  Auto
  InputKey       = CSMInstrumentId
  InputGroup     = "IsisCube,CsmInfo"
  InputPosition  = (IsisCube, CsmInfo)
  OutputName     = Keyword2
  OutputPosition = (Group, SerialNumberKeywords)
  Translation    = (*, *)
End_Group

Group = Keyword3
  Auto
  InputKey       = ReferenceTime
  InputGroup     = "IsisCube,CsmInfo"
  InputPosition  = (IsisCube, CsmInfo)
  OutputName     = Keyword3
  OutputPosition = (Group, SerialNumberKeywords)
  Translation    = (*, *)
End_Group
End
""")

Perhaps I should be parameterizing differently?

Expected behavior Instantiation of a PVL object.

Error logs or terminal captures If applicable, add examples to help explain your problem.

Your Environment (please complete the following information):

Additional context Add any other context about the problem here.

rbeyer commented 4 months ago

Admittedly, the error messages you are getting are not helping you, and that's my fault. The above PVL text contains two problems (but without debugging the code for an hour like I did, the error messages are useless):

The first problem is that you are using a reserved word.

When trying to run the above, you will have gotten an error that looks like this:

pvl.exceptions.LexerError: (LexerError(...), 'While parsing, expected a comma (,)but found: "Group": line 7 column 21 (char 186) near "= (Group,"')

However, what this error message really needs to tell you is that you have used one of PVL's seven reserved words (in this case, "Group") as an unquoted string in your PVL text:

OutputPosition = (Group, SerialNumberKeywords)

If you change this to:

OutputPosition = ("Group", SerialNumberKeywords)

and in all of the other cases where you use this, then this error goes away.

The second error is that you just have a token hanging out that doesn't conform to the PVL spec.

Once you quote all of the instances of "Group" and then try and load this again, you will get this error:

pvl.exceptions.LexerError: (LexerError(...), 'Expecting an Aggregation Block, an Assignment Statement, or an End Statement, but found "End_Group" : line 9 column 1 (char 248) near "   = (* ,"')

Frankly, this error is hard to understand. I had to read and re-read the code to understand exactly what the problem was here and why it was being emitted in this way--and I wrote the error message! What's happening, is that the parser is trying to link parameters with values inside the first group, and it acquired this random "Auto" token that it doesn't know what to do with, and then when the parser encounters the "End_Group" token to close the group, it freaks out about having the "Auto" token which isn't a parameter or a value, and tries to emit what it thinks is a helpful error message, but just ... isn't.

Essentially that bare "Auto" hanging out violates the PVL spec, because it isn't a parameter, or a value, or an aggregate statement. It doesn't fit into the PVL syntax, but the error message there is not helping you understand that.

If you quote all of the "Group" instances and remove the lines with "Auto" or comment them out, you will get this:

PVLModule([
  ('Keyword1',
   {'InputGroup': 'IsisCube,CsmInfo',
    'InputKey': 'CSMPlatformId',
    'InputPosition': ['IsisCube', 'CsmInfo'],
    'OutputName': 'Keyword1',
    'OutputPosition': ['Group', 'SerialNumberKeywords'],
    'Translation': ['*', '*']})
  ('Keyword2',
   {'InputGroup': 'IsisCube,CsmInfo',
    'InputKey': 'CSMInstrumentId',
    'InputPosition': ['IsisCube', 'CsmInfo'],
    'OutputName': 'Keyword2',
    'OutputPosition': ['Group', 'SerialNumberKeywords'],
    'Translation': ['*', '*']})
  ('Keyword3',
   {'InputGroup': 'IsisCube,CsmInfo',
    'InputKey': 'ReferenceTime',
    'InputPosition': ['IsisCube', 'CsmInfo'],
    'OutputName': 'Keyword3',
    'OutputPosition': ['Group', 'SerialNumberKeywords'],
    'Translation': ['*', '*']})
])

FYI, the seven reserved words are: BEGIN_GROUP, BEGIN_OBJECT, END, END_GROUP, END_OBJECT, GROUP, OBJECT (and since the pvl library supports mixed case reading of these tokens, then all capitalization variants must also be treated as reserved words).

In summary, this is bad PVL-text, but the pvl library is not giving the user sufficient information to understand why the PVL-text is bad. I will leave this Issue open while I work to improve the error reporting for these cases.

michaelaye commented 4 months ago

Excellent reply. It shows that you really care about the library and these tricky issues.

jlaura commented 4 months ago

@rbeyer Huge thank you. I very much appreciate the detailed response as it is helping me understand why the library is erroring out. Much appreciate it. I will open an issue on the ISIS repo to understand why this translation table is written the way that it is.

acpaquette commented 4 months ago

@rbeyer I looked over the PVL spec w.r.t. the reserved words and it seems like they can be used as values just not as parameter names.

"Reserved keywords are not permitted as Parameter Names within an Assignment Statement or as Block Names in Aggregation Statements"

From this it seems like the use of the reserved words as values is fine, just not as parameter names. Given I didn't dig much I could very well be mistaken.

With regards to the Auto parameter, I didn't find anything about parameter names with no values as valid PVL. So that may be an element in the trn files that makes them invaild PVL. I am not sure if we plan to change this, the Auto keyword is somewhat specific to trn files and acts as a boolean flag.

An easy fix may be to just change it to Auto=0|1, this would make the files complaint going forward and would be versioned appropriately with ISIS since all of the trns have been migrated to be part of the library

rbeyer commented 4 months ago

@acpaquette, I think that is a correct reading for the "reserved keywords" in the PVL blue book document. I think I had just adopted a strict interpretation of "reserved" in the parsing. Of course, if you always quote strings that are used as values in PVL text, then this is irrelevant. However, I agree that upon re-reading this portion of the document that this is a bug, and will work to address it.

Bare tokens, however, I am pretty sure aren't allowed. To understand this, you will have to read all of the syntax diagrams in the blue book, which is what the figure references that follow are. In Figure 2-1, we see that a block of PVL-text can only contain Assignment Statements (which must assign a value to a parameter) and Aggregation Blocks. In turn, Figure 2-15 shows that an Aggregation Block can only contain Assignment Statements or additional Aggregation Blocks. Thus I do not see anywhere in the specification that a bare token that is not a reserved keyword can exist in PVL-text.

rbeyer commented 4 months ago

I did a little tinkering, and while in the post above I thought that simply changing this behavior for reserved words would be straightforward, but I have discovered some complications. There's now a conversation to be had about how to treat reserved words that the PVL spec does not necessarily help us with (have I mentioned how I dislike PVL?).

If you make the change to allow reserved words to be interpreted as unquoted strings, then this construction, which the pvl library has correctly parsed for years, would now have a different result.

This PVL-text:

foo = 1;
Object = embedded_object;
    foo = bar
    life =
End_Object;
End;

Currently results in this Python object:

PVLModule([
  ('foo', 1)
  ('embedded_object',
   {'foo': 'bar',
    'life': EmptyValueAtLine(4 does not have a value. Treat as an empty string)})
])

If we allowed reserved words to act as unquoted strings, the above PVL-text would result in this Python object:

PVLModule([
  ('foo', 1)
])

This may seem strange, but since the "life" parameter gets the value of "End_Object" there is no actual "End_Object" keyword to close the "embedded_object" object, and so the pvl Omniparser assumes there is an error with parsing that object, and ignores it, then finds an "End" which terminates the PVL-text.

So, with this in mind, I don't think that it is such a slam-dunk to make this change.

acpaquette commented 1 month ago

@rbeyer Appreciate the update!

acpaquette commented 1 month ago

@rbeyer After reading over the PVL spec I think you are actually correct that ISIS should not have the reserved keywords as identifiers like this quoted or otherwise. From the spec:

A few identifiers have special significance in ODL statements and are therefore reserved. They cannot be used for any other purpose (specifically, they may not be used to name objects or attributes):

end, end_group, end_object, group, object, begin_object

I know it says specifically not as names but it's rather ambiguous if they are allowed as ISIS uses them. There is also mention that identifiers/symbols can either be quoted or not and it is recommended to quote symbols/identifiers when possible.

As an aside is this actually valid ODL/PVL?

foo = 1;
Object = embedded_object;
    foo = bar
    life =
End_Object;
End;

From reading over the standard assignments to strings should have a starting and ending ". Assignment to quoted symbols should have a starting and ending '. And assignment to an identifier should have at least one letter defined as such letter[letter | digit | _letter | _digit]*.

Very much appreciate working through this!

rbeyer commented 1 month ago

@acpaquette, So if you quote the reserved keywords like: parameter = "end_group" that's fine, because the quotes protect them, and that's perfectly allowed, its just that I don't think they can be unquoted as values, because that triggers their interpretation as reserved words.

About the PVL text example I gave and you asked about, technically no, the PVL, ODL, and PDS3 specs would "formally" break (and the pvl library's pvl_validate tool shows that), but the Python pvl library has a variety of "easements" in the "Omni-parser" (which is the default) to allow certain kinds of broken PVL to still be interpreted, and this is one of those situations, the Python pvl library basically assigns an empty string to the life parameter there, because if it does, that allows it to properly parse the rest of the PVL-text.