hipspy / hips

Python library to handle HiPS
https://hips.readthedocs.io

JPG image file read data altered #35

Closed adl1995 closed 7 years ago

adl1995 commented 7 years ago

This issue relates to the test_tile.py file. I also asked a question on Stack Overflow. The file was written using both PIL and scipy, but in both cases the data read back had lost some of its information.

Also, I recently tried getting the data using the Image.getdata() method. It seems to return the correct data, but the data had lost its shape (it came back as (262144, 4)), and the assertion on tile == tile2 still failed. I think this is somehow related to the __eq__ method.
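A minimal sketch of the two effects described above, using a small synthetic RGBA image instead of a real tile (the array contents and sizes are made up for illustration). It assumes the tile data is a plain numpy array; how the tile class actually implements `__eq__` is not shown in this thread.

```python
import io

import numpy as np
from PIL import Image

# Small synthetic RGBA image standing in for a real tile (illustrative only).
height, width = 4, 6
original = np.arange(height * width * 4, dtype=np.uint8).reshape(height, width, 4)

# Round-trip through PNG in memory; PNG is lossless, so pixel values survive.
buffer = io.BytesIO()
Image.fromarray(original).save(buffer, format="PNG")
buffer.seek(0)
image = Image.open(buffer)

# getdata() yields a flat sequence of pixels: shape (height * width, channels).
flat = np.array(image.getdata(), dtype=np.uint8)

# Restore the (height, width, channels) layout. Note PIL's size is (width, height).
restored = flat.reshape(image.size[1], image.size[0], -1)

# `restored == original` is an elementwise boolean array, not a single bool,
# so a plain `assert a == b` on arrays is ambiguous. Compare explicitly instead.
same = np.array_equal(restored, original)
```

If a tile class compares its data with a bare `==`, the comparison may need `np.array_equal` (or `.all()`) to yield a single truth value.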

cdeil commented 7 years ago

@adl1995 - I think basically the answer you got at https://stackoverflow.com/a/44737074/498873 is the correct one: reading JPEG is well-defined, you'll always get the same RGB pixel values when using any valid JPEG decoder / reader. But when converting the numpy array to JPEG, JPEG encoding happens, and the pixel values depend at least on the "quality" parameter (see http://pillow.readthedocs.io/en/latest/handbook/image-file-formats.html#jpeg), and maybe on other parameters you don't control, at least at the moment. Maybe it is possible to find the JPEG encoding parameters in the JPEG file header. By passing those to PIL.save (see example here: https://stackoverflow.com/a/19303889/498873) you could then reproduce the JPEG you got from the server, i.e. achieve 1:1 round-tripping for JPEG I/O.
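To illustrate the point, here is a small sketch with random pixel data (not a real tile). The helper names `encode_jpeg` and `decode_jpeg` are made up for this example. It only shows that decoding the same bytes with the same decoder is repeatable, while encoding changes pixel values and depends on `quality`.

```python
import io

import numpy as np
from PIL import Image

# Random RGB data standing in for tile pixels (illustrative only).
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)


def encode_jpeg(array, quality):
    """Encode an RGB array to JPEG bytes (hypothetical helper)."""
    buffer = io.BytesIO()
    Image.fromarray(array).save(buffer, format="JPEG", quality=quality)
    return buffer.getvalue()


def decode_jpeg(raw):
    """Decode JPEG bytes to an RGB array (hypothetical helper)."""
    return np.array(Image.open(io.BytesIO(raw)))


raw_q75 = encode_jpeg(pixels, quality=75)
raw_q95 = encode_jpeg(pixels, quality=95)

# Decoding the same bytes twice gives identical pixel values.
decoding_is_repeatable = np.array_equal(decode_jpeg(raw_q75), decode_jpeg(raw_q75))

# Encoding is lossy: the decoded pixels differ from the input array.
encoding_is_lossless = np.array_equal(decode_jpeg(raw_q75), pixels)

# The encoded bytes depend on the quality setting.
quality_changes_bytes = raw_q75 != raw_q95
```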

This would be nice, but in the meantime, we could just store the raw data of the JPEG tiles as a .rawdata data member and when writing a tile to the cache use that (see #36), i.e. never call PIL.Image.save.
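Roughly what I have in mind, as a sketch only. `RawTile` and `raw_data` are placeholder names, not the actual hips classes, and the payload is a stand-in for bytes fetched from a server:

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RawTile:
    """Hypothetical tile container that keeps the encoded bytes untouched."""

    raw_data: bytes

    @classmethod
    def read(cls, path):
        # Keep the file content exactly as stored; no image decoding here.
        return cls(raw_data=Path(path).read_bytes())

    def write(self, path):
        # Write the original bytes back; no re-encoding, so nothing is lost.
        Path(path).write_bytes(self.raw_data)


# Stand-in payload; a real tile would hold the bytes of the JPEG file.
payload = b"\xff\xd8 stand-in for JPEG bytes \xff\xd9"

with tempfile.TemporaryDirectory() as cache_dir:
    cached = Path(cache_dir) / "tile.jpg"
    RawTile(raw_data=payload).write(cached)
    round_tripped = RawTile.read(cached)
```

Because the bytes are never re-encoded, write followed by read returns an equal tile regardless of the image format.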

@adl1995 - OK?

cdeil commented 7 years ago

In any case, for the JPEG and PNG tile cases, please pick a file and pixel where you have non-zero and different values for R, G and B, i.e. not (0, 0, 0) or a greyscale value like (10, 10, 10), where one cannot fully see whether pixel reading works properly. Also, now that we have @requires_hips_extra available, you can do the tile read / write tests without having to fetch remote URLs, which is faster and more reliable. There should still be a test for fetch, e.g. you could fetch the same tile that you have locally from a GitHub raw URL and assert that the content you get is OK, i.e. matches what you have in the other tests, using a few asserts (an assert on a pixel value being the most important one to show that the data content is being read correctly).
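A toy example of why the pixel choice matters. The reader below is deliberately wrong (it returns the channels in reverse order) and is purely hypothetical; the pixel values are made up:

```python
import numpy as np


def read_pixel_buggy(image, row, col):
    """Hypothetical reader with a bug: returns BGR instead of RGB."""
    return tuple(int(value) for value in image[row, col, ::-1])


# Greyscale-like image: every channel has the same value.
grey = np.full((2, 2, 3), 10, dtype=np.uint8)

# Image with one pixel whose R, G and B values all differ.
colour = np.zeros((2, 2, 3), dtype=np.uint8)
colour[0, 0] = (10, 120, 200)

# The grey pixel cannot reveal the channel-order bug ...
grey_check_passes = read_pixel_buggy(grey, 0, 0) == (10, 10, 10)

# ... while the pixel with distinct channel values exposes it.
colour_check_passes = read_pixel_buggy(colour, 0, 0) == (10, 120, 200)
```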

cdeil commented 7 years ago

I'll try to change JPEG and PNG tile I/O now so that it round-trips, i.e. one can write a tile, read it back, and get the same tile.

cdeil commented 7 years ago

Fixed in #71.