zooniverse / Lens-Zoo

Apache License 2.0

QD image error for one of the subjects #87

Closed aprajita closed 11 years ago

aprajita commented 11 years ago

I've attached a screen grab of a subject shown in the classification interface and of how it appears, strangely, in the QD. I'm not sure how traceable this is, as I can't find how to retrieve the file name of the affected image.

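[image: main] https://f.cloud.github.com/assets/1917970/333785/2d9c99dc-9c62-11e2-93ca-0f3bc7f0efa3.png
[image: qd] https://f.cloud.github.com/assets/1917970/333786/3231e600-9c62-11e2-997a-8f944704dd2c.png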

drphilmarshall commented 11 years ago

Cool - Amit knows how to fix this one. Which browser and OS is that?

Dr. Phil Marshall

Department of Physics (Astrophysics) University of Oxford, Denys Wilkinson Building, Room 532E (BIPAC) Keble Road, Phone: +44 1865 273345 Oxford, OX1 3RH http://www.slac.stanford.edu/~pjm

aprajita commented 11 years ago

Chrome 26.0.1410.43, Mac OS X 10.6.8

anupreeta27 commented 11 years ago

I experienced the same a while ago. Mine is Mac OS X Mountain Lion with the latest version of Chrome, but I doubt either of these affects this issue. The image looks fine in classification mode, but with the same settings the QD shows those funny colored lines, and of course with the brighter and bluer settings as well. If I recall correctly, this usually happened for cutouts at the border of a CFHTLS tile, where a portion of the image has pixels with 0 values (as can also be seen in Aprajita's attachment).

kapadia commented 11 years ago

This happens with images that are missing pixel values. I need to examine the file contents more closely.

kapadia commented 11 years ago

This is a result of compressing the FITS files. During compression the floating-point values are quantized. fitsjs knows how to map the quantized values back to floating-point values; however, these troublesome images have also been dithered. I'll look into this more, but it may take some time, as the only documentation is the source code of cfitsio :(

kapadia commented 11 years ago

Ahhh!! Okay, I've found the problem, but I don't have a solution.

First a little background:

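When floating-point images are compressed the process goes:

  • For every row
    • Dither the pixel using a pseudo-random number to reduce noise
    • Quantize pixel to integer
    • Store a scale and zero factor
    • Apply compression to the integer array

To decompress we need to:

  • Read back the integer array
  • Decompress
  • Apply the dithering, scale, and zero to each value to recover an approximation of the original value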
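As a rough illustration of this round trip (dither and quantize each row to integers with a stored scale and zero, then invert on read), here is a minimal Python sketch. It follows the subtractive-dithering formula from the FITS tiled-image compression convention as I understand it, but uses Python's `random.Random` as a stand-in for the convention's fixed pseudo-random sequence and skips the actual compression step, so it is not what cfitsio or fitsjs literally do. The function names and the sample values are made up for the example.

```python
import random

def quantize_row(row, scale, zero, seed=0):
    """Dither and quantize one row of floats to integers."""
    rng = random.Random(seed)  # stand-in for the convention's fixed random sequence
    return [round((v - zero) / scale + rng.random() - 0.5) for v in row]

def restore_row(ints, scale, zero, seed=0):
    """Undo the dithering and scaling; recovers an approximation of the row."""
    rng = random.Random(seed)  # must replay the same random numbers, in order
    return [(i - rng.random() + 0.5) * scale + zero for i in ints]

row = [-90.0, -0.3, 0.0, 1.7, 12.25]
scale, zero = 0.25, min(row)
ints = quantize_row(row, scale, zero)
approx = restore_row(ints, scale, zero)
```

Each restored value should land within half a quantization step (`scale / 2`) of the original, provided the reader replays exactly the same random sequence as the writer.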

For these troublesome images, the problem exists when reading the integers (the very first step). The behavior is peculiar, as this is not an issue for any other set of images.

So far I've read one of these images in Python (native file reading, not PyFITS), sliced the bytes associated with the first row in the image, and interpreted them as integers. The output is correct. I then saved that byte stream, read it in JavaScript, and again, the output is correct! The problem arises only when reading the FITS file in its entirety.

Two places that could cause an issue are:

1) Using the wrong type of array (e.g. signed versus unsigned)
2) Byte offset issue

Both of these appear correct, so I'm lost for the next step ...
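For anyone following along, here is a small sketch of what those two failure modes would look like when decoding raw bytes. The four bytes and the 16-bit width are invented for the example (they are not taken from any of the problem files); FITS stores integers big-endian, hence the `>` in the format strings.

```python
import struct

# Two big-endian 16-bit integers, made up for illustration.
raw = bytes([0xFF, 0xA6, 0x00, 0x2A])

signed = struct.unpack(">2h", raw)       # interpret as signed 16-bit
unsigned = struct.unpack(">2H", raw)     # same bytes, unsigned 16-bit
shifted = struct.unpack(">h", raw[1:3])  # signed, but read one byte too late

print(signed)    # (-90, 42)
print(unsigned)  # (65446, 42)
print(shifted)   # (-23040,)
```

Either mistake tends to produce wildly wrong values rather than subtle ones, which is why they are natural first suspects.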

drphilmarshall commented 11 years ago

Is it possible that some extreme value associated with the masked region or its edge is causing an overflow that then affects the whole row? The solution to this would be to re-make the images without extreme values... or compress them into longer integers?

kapadia commented 11 years ago

If that were the case, it would relate to the type of array initialized. For this image, and many others, the uncompressed values are represented as Uint8s. Comparing values against cfitsio does not show any values that overflow ...

anupreeta27 commented 11 years ago

Hi Amit, can you split the FITS files into sections and repeat your procedure to locate which pixels might be problematic and what their values are? And/or try to create a FITS file with those types of pixels and generate the problematic images. Anu

drphilmarshall commented 11 years ago

We tried your suggestion, and found that yes, it's the very negative pixel values (~ -90) in the "masked" regions. I say "masked" because it's HumVI that masks them; they are not masked in the FITS images. When we read a problem FITS image in with pyfits, changed the values more negative than some threshold to zero, and then wrote out the file again, the resulting QD display of that file was not broken. So, it's something to do with how JS interprets compressed FITS images containing very negative values.

Anyway, one possible fix is to set these regions to zero when making the cutouts! I'm not sure what the threshold should be, probably minus a few. But it's additional work when making the cutouts.

Working with the images we already have is trickier, because the problem occurs on read-in to the QD. I guess Amit could run a masking Python script on every image, checking for negative values and overwriting the image with a masked version (like we did in our test). This is non-optimal, because it means that the data at Adler would not be identical to the data at IPMU, but it may be that we can live with it. What do you think?
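A minimal sketch of the kind of test described above, for a single image. It assumes `astropy.io.fits` (the successor to pyfits) and that the image sits in the HDU given by the hypothetical `hdu` argument; the function names are made up, and the threshold is left to the caller since no value has been agreed.

```python
import numpy as np

def zero_below(data, threshold):
    """Return a copy of data with every pixel below threshold set to zero."""
    data = np.asarray(data)
    return np.where(data < threshold, 0, data).astype(data.dtype)

def mask_file(path, threshold, hdu=0):
    """Overwrite one FITS image in place with its masked version."""
    from astropy.io import fits  # imported here so zero_below works without astropy
    with fits.open(path, mode="update") as hdul:
        hdul[hdu].data = zero_below(hdul[hdu].data, threshold)
        hdul.flush()

# Tiny in-memory check with made-up pixel values and a made-up threshold.
demo = zero_below(np.array([[-90.0, -1.0], [0.5, 3.0]]), threshold=-5.0)
```

Because `mask_file` overwrites its input, it would be worth running it on copies first.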

anupreeta27 commented 11 years ago

Good to know the progress. Running a masking python script is only for displaying images in the QD, right? I don't see how this would affect any of the data analysis that we'd like to do at the end and so, I don't think the data being different between Adler and IPMU would be problematic.

Perhaps it's better to run the masking Python script on all the images just to be safe. However, as far as I understand, this problem should occur only for cutouts at the border of a tile. Every tile (identified by XXX in filenames CFHTLS_XXX_YYYY_g.fits) is divided into 50x50 cutouts. So the problematic cutouts should be the first 50 and the last 50 cutouts, plus 0051, 0101, 0151, ..., 2451 and 0050, 0100, 0150, ..., 2500 (corresponding to YYYY in the filename). If you can confirm this by testing on a couple of tiles, then this may speed up the masking, since you won't have to apply the mask to all cutouts.
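To make that list concrete, here is a short sketch that enumerates the border cutouts of one tile. It assumes the cutouts are numbered 1 to 2500 in row-major order across the 50x50 grid, which is what the sequences above imply but has not been checked against the real files; `border_cutouts` and `cutout_name` are hypothetical helper names.

```python
def border_cutouts(n=50):
    """Return the 1-based cutout numbers lying on the edge of an n x n tile."""
    ids = []
    for row in range(n):
        for col in range(n):
            if row in (0, n - 1) or col in (0, n - 1):
                ids.append(row * n + col + 1)
    return ids

def cutout_name(tile, index, band):
    """Build a filename following the CFHTLS_XXX_YYYY_g.fits pattern."""
    return f"CFHTLS_{tile}_{index:04d}_{band}.fits"

border = border_cutouts()
print(len(border))                   # 196
print(cutout_name("XXX", 51, "g"))   # CFHTLS_XXX_0051_g.fits
```

That is 196 of the 2500 cutouts per tile and filter, so restricting the masking to them would cut the work by more than a factor of ten.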

drphilmarshall commented 11 years ago

That's cool, thanks Anu! Amit, are you OK scripting this up in Python? If you follow Anu's recommendation about which files to mask, I'd suggest you first copy the offending files to an attic folder, and then overwrite them in the working directory with e.g.

mask-negative-pixels.py *.fits

The threshold value (below which pixels are masked) will need to be decided. Also, to make QD images that match the HumVI ones, the images of the same sky patch in the different filters should be masked with the same mask. Algorithm is:

Split filenames into list of bases and filter names
Start loop over bases:
  1) identify a set of 5 images with same filename base
  2) open them all into memory with pyfits
  3) make a mask (0 or 1) for each filter's image
  4) multiply all 5 masks together to make the supermask
  5) multiply each image by the supermask
  6) write out images to the same files they came from, overwriting them
Repeat
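A rough sketch of the grouping and supermask steps, operating on arrays already in memory. Reading and writing the files (steps 2 and 6) is left out, and the function names, the example filenames, and the threshold are all hypothetical. `np.where` is used instead of a literal multiplication only so that masked pixels come out as 0.0 rather than -0.0; the result is otherwise the same.

```python
import os
from collections import defaultdict

import numpy as np

def group_by_base(filenames):
    """Map each filename base to a dict of {filter name: filename}."""
    groups = defaultdict(dict)
    for name in filenames:
        stem, _ = os.path.splitext(name)
        base, band = stem.rsplit("_", 1)  # CFHTLS_XXX_YYYY_g -> base, 'g'
        groups[base][band] = name
    return dict(groups)

def apply_supermask(images, threshold):
    """Zero, in every filter, any pixel that is below threshold in any filter."""
    masks = [img >= threshold for img in images.values()]  # True = keep
    supermask = np.logical_and.reduce(masks)
    return {band: np.where(supermask, img, 0) for band, img in images.items()}

groups = group_by_base(["CFHTLS_001_0051_g.fits", "CFHTLS_001_0051_r.fits"])
images = {
    "g": np.array([[-90.0, 1.0], [2.0, 3.0]]),
    "r": np.array([[4.0, -90.0], [5.0, 6.0]]),
}
masked = apply_supermask(images, threshold=-5.0)
```

In the example, the top-left pixel is bad only in g and the top-right only in r, yet both end up zeroed in both filters, which is the point of the shared mask.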

kapadia commented 11 years ago

Finally fixed!

https://github.com/astrojs/fitsjs/commit/9a9a39bee25cf93f4482d7fc0a14a5f9396103a3

drphilmarshall commented 11 years ago

Yesssss! Well done Amit, that was a really tough one. What was the problem?

kapadia commented 11 years ago

Oh man, it was an error in the decompression code. Those images were catching a clause that other images didn't trigger. I was mistakenly updating the input (compressed) array when I should have pushed to the decompressed values. For a variety of reasons it was tough to catch, so happy it's fixed!
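To illustrate the class of bug (this is a toy, not the fitsjs code or the real decompression algorithm), imagine a decoder for a made-up format of `[count, value, ...]` pairs in which a count of 0 marks a single literal value. If the rarely used literal branch writes back into the input instead of appending to the output, every stream that never takes that branch decodes correctly and only the unusual ones come out wrong.

```python
def decode(compressed, buggy=False):
    """Decode a toy [count, value, count, value, ...] stream."""
    data = list(compressed)
    out = []
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        if count == 0:
            # Rare clause: a single literal value.
            if buggy:
                data[i] = value    # wrong: updates the input array
            else:
                out.append(value)  # right: pushes to the decompressed values
        else:
            out.extend([value] * count)
    return out

common = [2, 7, 3, 1]        # never hits the rare clause
rare = [2, 7, 0, -90, 1, 4]  # hits it once

print(decode(common), decode(common, buggy=True))  # identical
print(decode(rare), decode(rare, buggy=True))      # differ
```

The buggy version silently drops the literal value, so the output is shorter than expected and everything after it is shifted.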

drphilmarshall commented 11 years ago

Yeah Amit! Nice catch :-)
