sunpy / ndcube

A base package for multi-dimensional contiguous and non-contiguous coordinate-aware arrays. "Maintainers": @danryanirish & @Cadair
http://docs.sunpy.org/projects/ndcube/
BSD 2-Clause "Simplified" License
44 stars 47 forks source link

Improve all str and repr representations of objects #413

Open Cadair opened 3 years ago

Cadair commented 3 years ago

We should at least give this a cursory glance before 2.0 final

dhomeier commented 3 years ago

In total there are 7 instances of __str__ defined in

<ndcube.ndcollection.NDCollection object at 0x11f15e540> NDCollection

Cube keys: ('seq0', 'seq1') Number of Cubes: 2 Aligned dimensions: [<Quantity 2. pix> <Quantity 3. pix> <Quantity 4. pix> <Quantity 5. pix>] Aligned physical types: [('meta.obs.sequence',), ('custom:pos.helioprojective.lon', 'custom:pos.helioprojective.lat'), ('custom:pos.helioprojective.lon', 'custom:pos.helioprojective.lat'), ('em.wl',)]

- `ExtraCoords`

ExtraCoords(exptime (0): (<Quantity [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.] s>,))

- `GlobalCoords`

GlobalCoords(length: <Quantity [400., 500., 600., 700., 800.] nm>, time: <Quantity [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.] s>)

- `table_coord.BaseTableCoordinate` – just a caller to `self.table.__str__`
- `table_coord.MultipleTableCoordinate`

<ndcube.extra_coords.table_coord.MultipleTableCoordinate object at 0x11f1fdf70> MultipleTableCoordinate(tables=[['2016-03-22T12:30:30.990' '2016-03-22T12:30:32.000' '2016-03-22T12:30:33.000' '2016-03-22T12:30:37.000' '2016-03-22T12:30:38.500' '2016-03-22T12:30:39.000' '2016-03-22T12:30:40.500' '2016-03-22T12:30:41.400' '2016-03-22T12:30:42.500']])



and an additional `__repr__` in `wcs.wrappers.compound_wcs.Mapping`
DanRyanIrish commented 3 years ago

Thanks so much for this inventory, @dhomeier.

NDCube

Following on from the changes you've suggested in #453, I think your suggestion of displaying the unit and data type are good ones. The data shape is unnecessary as its captured by the dimensions output.

In principle it would helpful to give the user and idea of the FOV of the cube in real world coords. However, in might be hard to display this is a clean but generalized way. The first thing to try would probably be to give the min/max values of each coordinate in self.combined_wcs I can think of 2 ways of doing this, but both with drawbacks.

I suggest trying the first option while making it clear what the values are, e.g. having a table in the __str__ with column names something like

Coord Name  Value at Lower-most Array Index  Value at Lower-most Array Index
----------  -------------------------------  -------------------------------

but preferably more succinct.

NDCubeSequence

For this we could also include the unit and data type of the first cube in the sequence.

We could also include a summary (min/max) of the sequence_coords and common_axis_coords if any are present. These should be divided into separate sections in the __str__. For now I think we should not try to capture the world coords of the cubes themselves. But perhaps we could expand this __str__ to include that later.

NDCollection

I think for now we can leave this __str__ as is. Although there's potentially a lot more information within the components of an NDCollection, this captures the main information at the collection level.

ExtraCoords & GlobalCoords

I like what you've done in #453. I'll comment about this on that PR.

BaseTableCoordinate & Other TableCoordinate Classes

I think BaseTableCoordinate.__str__ should be made to be something similar to ExtraCoords in #453. This should have knock-on effects for MultiTableCoordinate, SkyCoordTableCoordinate, etc. At that point we can see what changes, if any, are needed to those.

Mapping

The __repr__ should be renamed __str__ and then a new __repr__ should be created. This should give the memory location of the instance (e.g. <ndcube.ndcube.NDCube object at 0x11f0f0e20> in the above NDCube case.) followed by the output of the new __str__.

Hopefully this is a helpful roadmap.

DanRyanIrish commented 3 years ago

I suggest trying the first option while making it clear what the values are, e.g. having a table in the __str__ with column names something like

Coord Name  Value at Lower-most Array Index  Value at Lower-most Array Index
----------  -------------------------------  -------------------------------

It struck me that a more reliable (but not perfect) way of capturing the world ranges of the FOV would be to use this code to get the pixel coordinates of all the cube's corners and then pass them to pixel_to_world. The min and max of all those world coords can then be calculated and printed out. Performance-wise this will not be significantly slower than use using the lower left corner and upper right corner. But it will in all but a few cases capture the full world ranges of the FOV. The column labels could then be something like

Coord Name  Min Corner Value  Max Corner Value
----------  ----------------  ----------------
dhomeier commented 3 years ago

So, taking e.g. _test_ndcollection.cube1 with

bbox_p = np.array(cube1._bounding_box_to_points(np.zeros(cube1.wcs.naxis), cube1.dimensions.value-1, cube1.wcs))
>>> bbox_p.T
array([[0., 0., 0., 0., 4., 4., 4., 4.],
       [0., 0., 2., 2., 0., 0., 2., 2.],
       [0., 3., 0., 3., 0., 3., 0., 3.]])
>>> cube1.wcs.pixel_to_world(*bbox_p.T)
[<SkyCoord (Helioprojective: obstime=None, rsun=695700.0 km, observer=None): (Tx, Ty) in arcsec
    [(2160.07821927, 4.56894119e-02), (2160.07821927, 4.56894119e-02),
     (5039.92178073, 4.56894119e-02), (5039.92178073, 4.56894119e-02),
     (2159.6394969 , 7.19859143e+03), (2159.6394969 , 7.19859143e+03),
     (5040.3605031 , 7.19859143e+03), (5040.3605031 , 7.19859143e+03)]>, <SpectralCoord [1.02e-09, 1.08e-09, 1.02e-09, 1.08e-09, 1.02e-09, 1.08e-09, 1.02e-09,
   1.08e-09] m>]

(including all "corners" in the spectral domain, which may not be desired, especially with higher-D spectral cubes?).

For the SkyCoord part only, I think it might in fact make sense to list all 4 corners, since this would cover all sky coordinate minima/maxima in the rotated case.

DanRyanIrish commented 3 years ago

My thought was to take the minimum and maximum values of each coordinate from the set of all corners. And the point of only running the corners through pixel_to_world was for performance in cases where the cube is large. But in the case of a SkyCoord, min and max don't have an obvious meaning unless you break out the latitude and longitude separately. I hadn't considered giving the four corner values in the case of a SkyCoord. I think in most cases that's a good one. But it does get complicated if you have a spatial axis dependent on a non-spatial axis, e.g. as is the case with a scanning slit spectrograph which folds time into your x-dimension. In that case you would end up with 8 valid corners in you SkyCoord which starts to become unwieldy.

It might be simpler in the general case to break out latitude and longitude from the SkyCoord and just print the min and max values separately. What do you think?

dhomeier commented 3 years ago

Yes, if we can simply handle the celestial part separately like

bbox_p = np.array(cube1._bounding_box_to_points(np.zeros(cube1.wcs.celestial.naxis), np.array(cube1.wcs.celestial.array_shape)-1, cube1.wcs.celestial))
bbox_c = cube1.wcs.celestial.pixel_to_world(*bbox_p.T[::-1])  # why has just about everything in WCS to be turned around, backwards and inverted???
bbox_c
<SkyCoord (Helioprojective: obstime=None, rsun=695700.0 km, observer=None): (Tx, Ty) in arcsec
    [(5040.25079745, 5.39950295e+03), (5039.92178073, 4.56894119e-02),
     (2159.74920255, 5.39950295e+03), (2160.07821927, 4.56894119e-02)]>

I'd just pass bbox_c.Tx.min(), bbox_c.Tx.max() etc. to the __str__ formatter. But is there a complete list of the different coordinate types wcs.celestial might represent, e.g. when to use min(bbox_c.ra), min(bbox_c.dec) instead, and how to best check the type – self.wcs.celestial.world_axis_object_classes or self.array_axis_physical_types...?

And in principle for unrotated sky rectangles the maximum in declination could be anywhere along the row at highest dec, so one would need to identify that row and take the max of the entire slice – but for the typical field sizes represented by NDCube it's probably acceptable to neglect the difference. EDIT: my initial understanding was that the border, following a great circle, would generally reach highest declination somewhere in the middle, but actually the way the WCS is constructed, this seems in fact to be at the CRPIX1 (or CRPIX2) column.

But it does get complicated if you have a spatial axis dependent on a non-spatial axis, e.g. as is the case with a scanning slit spectrograph which folds time into your x-dimension. In that case you would end up with 8 valid corners in you SkyCoord which starts to become unwieldy.

Only a few more to take the minmax from, but I guess in general there will also be no guarantee that they are actually sitting on the corners. I guess as long as it's only a single spectral axis, one could also take out an entire spectrum for each corner. BTW which point in the spectral domain would wcs.celestial refer to in such cases?

DanRyanIrish commented 3 years ago

But is there a complete list of the different coordinate types wcs.celestial might represent, e.g. when to use min(bbox_c.ra), min(bbox_c.dec) instead, and how to best check the type – self.wcs.celestial.world_axis_object_classes or self.array_axis_physical_types...?

SkyCoord.representation_component_names maps the component names to their physical type, e.g.

{'Tx': 'lon', 'Ty': 'lat', 'distance': 'distance'}

So assuming we can rely on 'lon' and 'lat', you can invert that dictionary

component_type = dict([(item, key) for key, item in sc.representation_component_names.items()])

and find the coordinate type, e.g. 'Tx' by doing lon_component = component_types['lon']. Then you can the values by doing lon_values = getattr(sc, lon_component)

wcs.celestial

Is wcs.celestial APE-14? If not we can't use it as it won't work on all valid NDCubes. But the SkyCoord values should still be extractable from the output of pixel_to_world, even if it's a little less elegant.

Only a few more to take the minmax from, but I guess in general there will also be no guarantee that they are actually sitting on the corners. I guess as long as it's only a single spectral axis, one could also take out an entire spectrum for each corner. BTW which point in the spectral domain would wcs.celestial refer to in such cases?

This is another reason I'm not sure we can used wcs.celestal. It's probably more correct to just extract the SkyCoord from the output of pixel_to_world so long as the correct minimal corners are input. In PR #452 I'm working on a new version of NDCube._bounding_box_to_points which might help resolve this. Alternatively, as I suggested before, we could just give the min and max of the lon and lat components without trying to maintain their 2-D.

dhomeier commented 3 years ago

Thanks for the pointers; yes, I've found representation_component_names to work (at least) with the ra, dec attributes as well.

You were right that the celestial and spectral methods are only created in ~astropy.wcs.WCS and don't seem to be specified in APE-14 nor implemented by gwcs. Taking the spectral axes along will at least allow to directly include their min/max values directly (or perhaps check first if the celestial axes depend on them and otherwise strip them; but not sure that's worth the effort). The basic bounding box should be constructible from only pixel_n_dim, array_shape and pixel_to_world from the APE. But I suppose gwcs offers no equivalent to Wcsprm for accessing e.g. the crpix values? Maybe gwcs transformations are to general to infer any further info on the lon, lat extrema...

NDCube._bounding_box_to_points which might help resolve this. Alternatively, as I suggested before, we could just give the min and max of the lon and lat components without trying to maintain their 2-D.

Yes, I thought that was already the goal, giving just two values for each coordinate separately?

DanRyanIrish commented 3 years ago

Yes, I thought that was already the goal, giving just two values for each coordinate separately?

Given the generality of what we're dealing with I think that is the way to go for now. We can always update these if someone really wants to put in that effort. But two values for each coordinate is already a big improvement on what's already there.

dhomeier commented 3 years ago

Extracting all corners in addition should not be that much effort (they are already there from the bounding box transformation), but the question seems more how to display them all in a sensible for any hypercube.

DanRyanIrish commented 3 years ago

Yep, I agree.

DanRyanIrish commented 2 years ago

Extracting all corners in addition should not be that much effort (they are already there from the bounding box transformation), but the question seems more how to display them all in a sensible for any hypercube.

This seems linked to issue #200

dhomeier commented 1 year ago

This seems linked to issue #200

I had not noticed that at the time, and probably was more thinking about the printable output. But in fact, it the corner values can be stored in some kind generalised ndarray-like object, one could probably just use the format np.array used for N-dimensional arrays. Might not be the ideal way to display, since we'd have the special case of length 2 in every dimension?