planetarypy / pvl

Python implementation of PVL (Parameter Value Language)
BSD 3-Clause "New" or "Revised" License
19 stars 19 forks source link

Moving towards pvl 1.0 #37

Closed rbeyer closed 4 years ago

rbeyer commented 4 years ago

First time Issue contributor here, but a long time user.

I would like to see us polish up the pvl package, and get it conformant to the new guidelines that the PlanetaryPy Project is setting for its Affiliated Packages (now that it is starting to get more formalized, although those haven't quite set yet).

Primarily, this means sloughing off the need to serve Python < 3.6, and to minimize external dependencies and make them 'optional.' This would also spark a new round of development, and hopefully address the lingering Issues and PRs on the repo, and would (clearly) necessitate a new major revision number, and it is high time that this library get a 1.0 version.

My question is how to go about this. I've already taken a pretty big bite into this problem, and I want to share my progress. I don't think its ready to merge into the master trunk, but I'd like to have a new branch here where we can work (the alternative is for me to fork a copy, but post issues about it in this repo, to keep everyone knowledgeable).

What's the process for a new contributor like me to be able to create (or just push to) a new branch under this repo? Or would you prefer that I work on a personal fork?

Let me know.

wtolson commented 4 years ago

Hey @rbeyer. Thanks for tackling this. My thoughts on this are to remove support for < 3.6 as a dependency moving forward rather then rewriting the code now to serve this purpose. As we address the other outstanding issues we can update as necessary but I think actively rewriting just to remove support for older versions will just introduce bugs into code that works now.

The exception to this is we should be updating the documentation, setup.py, tests runner configuration, etc. to state we no longer support the older versions.

Here are the general guidelines for contributing: https://github.com/planetarypy/pvl/blob/master/CONTRIBUTING.rst#get-started

rbeyer commented 4 years ago

Thanks for being so encouraging.

Let me outline what I've been up to, because I suspect it is more than you think.

First off, I have performed what I call a 'light refactor' to perform a 'minimal' upgrade of pvl, and you can inspect it in this fork branch: https://github.com/rbeyer/pvl/tree/ppy36

The HISTORY.rst describes the work. All of the existing tests pass (except for the one that also currently fails in master).

After I did that, the next phase that seemed needed was to work natively with the Python str type instead of the bytes type that was more common in Python 2 (and needed to be able to support Python 2). As I examined how to go about this, I discovered that the entire decode/encode structure in pvl was not only working with bytes internally, it was also using a custom stream implementation. Given the limitations of Python 2, that was clearly the way to go, but we can do better with Python 3.

Primarily, by switching from the concept of dealing with bytes internally, to working with str internally, we make a lot of operations cleaner and can enable other functionality. We do this by converting from bytes to str on the border (either in ingest or output, as requested, and then only dealing with str on the inside).

So this led me down the path of reimplementing the 'decode' and 'encode' parts of pvl. My plan was to ensure that all of the currently passing tests would still pass, as well as implementing a more complete unit test framework. And also see how I could address the issues in the two current PRs on the main repo (#20 and #34), and as many other Issues as possible.

The other thing that I discovered is that there is more than one 'flavor' of PVL. As far as I can tell, there are four: (1) the official PVL specification, (2) ODL, a subset of the PVL spec, (3) PDS-compliant ODL (a few extra tweaks on ODL that PDS3 conforms to), and (4) what gets written out by ISIS in its labels (which does not conform to any of the above specifications, and has no guiding documentation).

The current pvl library does a fantastic job of 'consuming' all of these flavors of PVL, and even though there are several flavors of 'encoder' (PVLencoder, PDSLabelEncoder, ISISCubeLabelEncoder), none of them actually enforce the specifications. So you can hand the current PDSLabelEncoder a dict-like object, and it will dump some text that does not conform to the PDS Label specification.

What I wanted to do was to enable pvl to be more rigorous about the various specifications. On the decode side, it should still be able to consume all PVL-like text as a default (but should also have more restrictive decoders that will only decode their 'flavor' and will error otherwise). On the encode side, the default flavor dumped out should probably be the PDS Label version (which is the most restrictive), but users should also be able to write out PVL-compliant and ODL-compliant text, if they wish, and really enforce the rules on output for each of those flavors.

This would allow users to 'validate' any PVL that they have, and enable conversion from one flavor to another.

So I've been working on that implementation in this forked branch: https://github.com/rbeyer/pvl/tree/refactor

All of the 'old' tests (that I still want to have pass) are now in tests/test_pvl.py.

It is not complete, by any means, the commit messages are not tidy and represent work-in-progress, but you can at least peek at what I'm thinking. Of course, I've been working in isolation, and I'd really appreciate a conversation about any aspects of how things are dealt with. When the dust settles and I can remove the scaffolding, I plan to submit a PR on this for pvl 1.0.

rbeyer commented 4 years ago

38 has now been merged, so this is now addressed.