planetarypy / pvl

Python implementation of PVL (Parameter Value Language)
BSD 3-Clause "New" or "Revised" License
19 stars 19 forks source link

Proposal: Change _collections.Units to use astropy.units #22

Closed michaelaye closed 4 years ago

michaelaye commented 7 years ago

The astropy.units system is awesome and allows a lot of automated manipulations:

http://docs.astropy.org/en/stable/units/

michaelaye commented 7 years ago

If you want to stay minimalistic in dependencies, you could leave your namedtuple in as a base, check for existence of astropy in the sys.path and then use that for a boost of functionality.

michaelaye commented 6 years ago

Is there a reason why one should not consolidate on a community-used/supported unit system for unit support?

percurnicus commented 6 years ago

@michaelaye I love the astropy units but the entire astropy package seems a pretty big requirement for this repo. I also like pint, which I think would be a smaller solution than astropy.units (though I haven't looked into how large the package really is, I just know it is specified to units). However, I believe most people that would use pvl (and the greater PlanetaryPy project) would be more familiar with the astropy units. To answer your question, I can't think of one. I don't think we should be reinventing the wheel here and should choose between one or the other.

michaelaye commented 6 years ago

It doesn't have to be a hard dependency, right? As I suggested, leave the existing system, but offer a method inside it like to_astrounits() maybe? And if astropy is not installed, you let that method throw an exception, but if it is, you get back the equivalent astropy quantity.

percurnicus commented 6 years ago

What if we just subclassed the PVLDecoder to make astropy units the default? That way the state of the object is consistent through its existence. Then people can just pass it in to the loads function and get values with astropy units

percurnicus commented 6 years ago

The Units class can have a method to_astrounits:

try:
    from astropy import units as astrounits
    _ASTROPYINSTALLED = True
except ImportError:
    _ASTROPYINSTALLED = False

.
.
.

class Units(namedtuple('Units', ['value', 'units'])):

    def to_astrounits(self):
        if not _ASTROPYINSTALLED:
            raise ValueError('Or some other custom error')
        return self.value * astrounits.Unit(self.units)

Then PVLDecoder gets a get_units method to replace line 503:

def _get_units(self):
    return Units(value, self.parse_units(stream))

So then AstropyDecoder(?) can override that method:

def _get_units(self):
    return super(AstropyDecoder, self)._get_units().to_astrounits()

Then the UI would be:

>>> lbl = pvl.loads('''
    g = 9.8 <m/s>
    END
    ''',
    cls=AstropyDecoder,
    strict=False,
)
>>> lbl['g']
<Quantity 9.8 m / s>

Thoughts? I don't know when I'll have time to make a full PR (implementation, docs, tests) but would love it if you put one in

michaelaye commented 6 years ago

This looks good! It should work for 1-dim units, but sometimes Units returns a cartesian vector, so 3 numbers with one unit, for those we would need to point to the more advanced coordinate representations from here, I guess: http://docs.astropy.org/en/stable/coordinates/representations.html#astropy-coordinates-representations

percurnicus commented 6 years ago

hmm interesting! I've started a branch and generating a file of units that can show edge cases. So far this is the label I've come up with and it already shows some issues:

PDS_VERSION_ID    = PDS3
SOLAR_LATITUDE    = (0.25 <DEG>, 3.00 <DEG>)
EXPOSURE_DURATION = 10 <SECONDS>
DECLINATION       = -14.2756 <DEGREES>
MASS              = 123 <kg>
MASS_DENSITY      = 123 <g/cm**3>
MAP_RESOLUTION    = 123 <PIXEL/DEGREE>
MAP_SCALE         = 123 <KM/PIXEL>
END

First, astropy can't read DEG, but it's a simple fix: just lower DEG to deg and astropy is happy. Next astropy gets mad about SECONDS, it expects second. One solution is to use unrecognized units for those that don't parse cleanly but it seems like an unexpected behavior to not handle say seconds is unrecognized. I am trying to find a standard of all the units but this is the closest I've gotten. You might have more experience with this and know a good solution.

Do you have an example on how the Cartesian vector would be represented in a label? The closest I can get to is the SOLAR_LATITUDE in the example above but that would just a be a list of individual units.

michaelaye commented 6 years ago

The ISIS3 app campt returns these for body-fixed coordinates:

Units(value=[-894.61910412612, 180.08738713739, -3256.0876839485], units='km')

I convert them like so:

def vec_unit_to_astro(pvlunit):
    rep = CartesianRepresentation(pvlunit.value, unit=pvlunit.units)
    return rep
percurnicus commented 6 years ago

I would prefer the type of to_astrounits to be consistent. I'd rather have the default for say these paramter-values:

BodyFixedCoordinate = (-2015.9595225544, -1936.6155808127, 1917.2574858384) <km>
MASS                = 123 <kg>
SOLAR_LATITUDE      = (0.25 <DEG>, 3.00 <DEG>)

be

>>> lbl['BodyFixedCoordinate']
<Quantity [-2015.95952255, -1936.61558081,  1917.25748584] km>
>>> lbl['BodyFixedCoordinate'].value
array([-2015.95952255, -1936.61558081,  1917.25748584])
>>> lbl['MASS']
<Quantity 123. kg>
>>> lbl['SOLAR_LATITUDE']
[<Quantity 0.25 deg>, <Quantity 3. deg>]

Then the Cartesian representation can be done at a higher level pretty easily:

>>> CartesianRepresentation(lbl['BodyFixedCoordinate'])
<CartesianRepresentation (x, y, z) in km
    (-2015.95952255, -1936.61558081, 1917.25748584)>

This way there is consistency between the AstropyDecoder and the PVLDecoder and consistency among all units.

rbeyer commented 4 years ago

There are two kinds of solutions I can think of for implementing an alternate object for a parsed PVL Units Expression under the 1.0.0-alpha architecture .

  1. Subclass the Parser class of your choice, and override or extend the parse_units() method. The easiest thing is probably to extend it by letting it call super().parse_units() and then convert the PVL.Units object into whatever kind of units object you want (astropy, pint, or whatever). Ensuring that it gets encoded properly during a dump operation is a little trickier, but I think I can see how to consolidate some logic, so that you'd only have to override or extend a single method in the encoder, too (but right now, it is a little more complicated).

  2. Alternately, assume a particular API for a units object (would require some re-writing to the pvl library, but not bad), and then pass the units class that you want the parser to use (defaulting to pvl.Units), as long as it 'honored' the API (just like the encoder will take any 'dict-like' and encode it).

The second approach is 'easier' on the 'user' (they don't have to worry about subclassing a parser or encoder class).

rbeyer commented 4 years ago

Hmm, okay, I just skimmed through the documentation for astropy.units and pint, and these don't appear to present a common unified 'API' for units, so I don't know that we would be able to implement a hook to pass in a 'units' object, I think you'd have to do the first thing, and subclass the parser or encoder to deal with the particular kind of units object you want.

AndrewAnnex commented 4 years ago

@rbeyer could you elaborate on astropy.units not presenting a common unified API? Although there are about a dozen python libraries that do similar things, with careful architecture within pvl we could gain some wins through the design to allow users to use them (or a subset) without too much new code in pvl.

rbeyer commented 4 years ago

Maybe I was too hasty. I was hoping that all 'units' objects might present a similar interface (if they were all two-tuples, for instance). Then pvl could just deal with two-tuples (or a similar generic interface) internally.

The astropy.units Quantity class (which is most similar to the current PVL.Units class), has a named value and unit parameter (just like PVL.Units, but does not have list-like access).

Looks like the pint Quantity has a similar concept, but the properties are amagnitude and units parameter. So same concept, but different parameter names.

So my concern is that there isn't a single 'API' that many (most?) Python Units libraries implement, therefore, we can't program to a 'generic' units object (and just have the user pass in a Quantity class and say: 'make all my PVL Units Expressions one of these in the PVLModule you give me back"), but I'm happy to be wrong.

However, if you subclass the parser and encoder objects to override their handling of PVL Units Expressions to a specific units library, that would totally work (it just requires knowledge of each units library). So you could pretty easily create an OmniParser_withAstropyUnits or an OmniParser_withpintUnits.

AndrewAnnex commented 4 years ago

I haven't dug too deeply but it looks like pint and astropy both work with arrays so they should have list access if I understand your point. I would have to dig into it more to see how complicated it would be to delegate calls

rbeyer commented 4 years ago

While more user-friendly ways of presenting a quantity interface can always be implemented, #46 establishes the baseline functionality for quantities in pvl.