hipspy / hips

Python library to handle HiPS
https://hips.readthedocs.io
13 stars 16 forks source link

How to handle tiles in different formats? #36

Closed cdeil closed 7 years ago

cdeil commented 7 years ago

Currently we have a class HipsTile with data members:

We want to support the three formats used for HiPS: fits, jpeg, png

What we have now isn't nice, because header is a FITS header and doesn't apply to the jpeg and png case. But more importantly, as mentioned in #35, our current scheme can't handle JPEG well, because it's a lossy format, so to / from Numpy array can change the pixel data.

One feature we need is to be able to download and store on local disk JPEG tiles, unaltered, i.e. not going though JPEG decompression to Numpy array and compression again to JPEG, resulting in different pixel data. This means we have to store the JPEG file contents (i.e. raw_image = BytesIO(urllib.request.urlopen(url).read()) as a data member on the JPEG HiPS tile, so that we can write it unaltered to local disk or cache if we want.

There's a few options:

  1. add a .raw_data member to HipsTile and change .data into a property (maybe rename to .array_data) that's computed from .raw_data when needed (i.e. for drawing).
  2. Introduce a class hierarchy HipsTileJpeg, HipsTilePng, HipsTileFits with HipsTile as an abstract base class. We would still add .raw_data at least for HipsTileJpeg, it would mostly be a code re-organisation where the if-else stuff that you currently do for each format in HipsTile gets split out into separate classes / methods.
  3. An alternative would be to introduce a HipsTileData base class and then do the HipsTileDataJpeg etc differences only on these data classes, with a single common HipsTile class. The question if this or option 2 works better has to do with the question how much the data and metadata/header handling is coupled. I think we discussed this previously and said that for JPEG and PNG tiles, the file header doesn't contain any useful information and is irrelevant.

I think overall this is a difficult question, but one that we can mostly punt on for now and only come back to if / when we start generating HiPS tiles, for now we only need 1:1 copy and reading. So I think my suggestion for now would be to just do the small change outlined as option 1, i.e. to keep rawdata around at least for the JPEG case and to use that from write.

@adl1995 @tboch - Thoughts?

adl1995 commented 7 years ago

I just made a PR #38 which renames data to raw_data. But, for now, the data property just simply returns raw_data, I think this would need to be changed.

cdeil commented 7 years ago

@adl1995 - I finished the change to store bytes on HipsTile suggested above in #71.