planetarypy / pvl

Python implementation of PVL (Parameter Value Language)
BSD 3-Clause "New" or "Revised" License
19 stars 19 forks source link

Comments #87

Open rbeyer opened 3 years ago

rbeyer commented 3 years ago

Is your feature request related to a problem? Please describe. The pvl library quietly ignores comments when it reads PVL-text, and has no mechanism for programmatically adding comments to a Python structure that could be given to the dumpers (pvl.dump() and pvl/dumps()).

There was some discussion of this in #35, but that was before the architecture change to pvl 1.0.0 and there was never a PR.

Thoughts The decoders currently have a "hook" for recognizing comments (so they can be skipped over), but this could be used to capture comments. The PVL Blue Book indicates that comments can appear ... anywhere. They can even be between things that you might not expect:

distance /* comments */ = /* can */ 1 /* be */ <AU> /* anywhere */

When I first understood this, it made my brain hurt. The above PVL-text would be parsed into a single key-vale pair (with units), but had four comments associated with it. To properly keep track of their location, you'd have to do a tremendous amount of book-keeping.

Fortunately, the ODL/PDS3 specification doesn't allow this sort of nonsense, and requires that comments be on a line by themselves, or must come at the end of a line, like so:

/* This is a comment 
that is all by itself */
distance = 1 <AU> /* This comment is at the end of a line */

Since there is almost no usage of straight PVL, and the only practical usage is ODL/PDS3 and variants thereof, I think that a comment system that robustly handled the ODL/PDS3 comment situation, but not the PVL situation would be fine. I think the right behavior would be to recognize the interspersed comments in the PVL example, but to parse them each into "separate comments" in the data structure that were associated with the parameter-value pairing in the right order but not try to preserve their location amongst the elements (it could be done, but I don't think there's a lot of practical use).

To realize this in code requires a new container object that would be returned by the loaders and accepted by the dumpers. We would either need to modify our pvl.collections.OrderedMultiDict or the experimental pvl.collections.PVLMultiDict (based on multidict.MultiDict) to be able to handle keeping track of comment objects and associating them with the right position in the dict-like. Users would need access to them, and they would need to behave correctly if you reordered things, or needed to build a new dict-like from scratch and add your own comments to it, etc.

This seems hard, and I'm loathe to try and do this from scratch. Fortunately, we may be able to collapse this to a previously-solved problem. The TOML language allows comments in similar relation to key-value pairs that ODL/PDS3 does. The toml Python package has the same behavior as pvl, it just skips over comments and doesn't try to parse them (hey, I said it was hard). However, the tomlkit Python package does, and has a container.Container object that we might just be able to use without alteration. It is a dict-like that should function similar to our OrderedMultiDict, but has "document" aspects like managing comments, and even allowing management of blank lines to help preserve readability, while still presenting dict-like functionality for the "data."

I don't have a strong need for this, so I'm unlikely to work this any time soon, but I just wanted to put this down as a seed for future development.

michaelaye commented 3 years ago

Sounds like a good choice to restrict yourself to PDS3 the most frequent use case. Perfection in Generality is very time consuming. Thanks for the tomlkit reference, I was just looking for a structure conserving reader, so that I can have the URL management the way I find it most readable in plpcore and not losing the structure during updates.

-- Michael Aye LASP / CU Boulder

On Tue, Mar 23 2021 at 17:44, Ross Beyer < @.*** > wrote:

Is your feature request related to a problem? Please describe. The pvl library quietly ignores comments when it reads PVL-text, and has no mechanism for programmatically adding comments to a Python structure that could be given to the dumpers ( pvl.dump() and pvl/dumps() ).

There was some discussion of this in #35 ( https://github.com/planetarypy/pvl/issues/35 ) , but that was before the architecture change to pvl 1.0.0 and there was never a PR.

Thoughts The decoders currently have a "hook" for recognizing comments (so they can be skipped over), but this could be used to capture comments. The PVL Blue Book indicates that comments can appear ... anywhere. They can even be between things that you might not expect:

distance / comments / = / can / 1 / be / / anywhere /

When I first understood this, it made my brain hurt. The above PVL-text would be parsed into a single key-vale pair (with units), but had four comments associated with it. To properly keep track of their location, you'd have to do a tremendous amount of book-keeping.

Fortunately, the ODL/PDS3 specification doesn't allow this sort of nonsense, and requires that comments be on a line by themselves, or must come at the end of a line, like so:

/ This is a comment that is all by itself / distance = 1 / This comment is at the end of a line /

Since there is almost no usage of straight PVL, and the only practical usage is ODL/PDS3 and variants thereof, I think that a comment system that robustly handled the ODL/PDS3 comment situation, but not the PVL situation would be fine. I think the right behavior would be to recognize the interspersed comments in the PVL example, but to parse them each into "separate comments" in the data structure that were associated with the parameter-value pairing in the right order but not try to preserve their location amongst the elements (it could be done, but I don't think there's a lot of practical use).

To realize this in code requires a new container object that would be returned by the loaders and accepted by the dumpers. We would either need to modify our pvl.collections.OrderedMultiDict or the experimental pvl.collections.PVLMultiDict (based on multidict.MultiDict ) to be able to handle keeping track of comment objects and associating them with the right position in the dict-like. Users would need access to them, and they would need to behave correctly if you reordered things, or needed to build a new dict-like from scratch and add your own comments to it, etc.

This seems hard, and I'm loathe to try and do this from scratch. Fortunately, we may be able to collapse this to a previously-solved problem. The TOML language allows comments in similar relation to key-value pairs that ODL/PDS3 does. The toml Python package ( https://github.com/uiri/toml ) has the same behavior as pvl , it just skips over comments and doesn't try to parse them (hey, I said it was hard). However, the tomlkit Python package ( https://github.com/sdispater/tomlkit ) does, and has a container.Container object that we might just be able to use without alteration. It is a dict-like that should function similar to our OrderedMultiDict, but has "document" aspects like managing comments, and even allowing management of blank lines to help preserve readability, while still presenting dict-like functionality for the "data."

I don't have a strong need for this, so I'm unlikely to work this any time soon, but I just wanted to put this down as a seed for future development.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub ( https://github.com/planetarypy/pvl/issues/87 ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/AAARBDTRZDDH5WFBAVSI5NTTFERXBANCNFSM4ZWFQ5WQ ).

michaelaye commented 2 years ago

I use tomlkit now to keep the config file the same between updates. How about storing comments into another dict under label['comments']['original_key'] ?