weaverba137 / pydl

Library of IDL astronomy routines converted to Python.
BSD 3-Clause "New" or "Revised" License
28 stars 16 forks source link

REBIN exactness #60

Closed LevN0 closed 3 years ago

LevN0 commented 3 years ago

Hello,

I am looking for an identical REBIN implementation in Python. To try things out, I tried the following,

seed = 100
array = FIX(RANDOMN(seed, 100, 200) * 100, TYPE=1)
array_rebin = REBIN(array, 50, 100)

The results of using the REBIN function provided by this package did not exactly match , though the differences are no greater than 1 in any given pixel.

LevN0 commented 3 years ago

Interestingly the same experiment with float64 values + allclose does match, so this seems to be an int specific issue.

weaverba137 commented 3 years ago

Are you also comparing to the same integer type? int16 versus int32 versus int64?

LevN0 commented 3 years ago

In the test above I used TYPE=1 in IDL, which corresponds to np.uint8 in Python. This is what I compared.

weaverba137 commented 3 years ago

Thank you, I've been able to reproduce this. I think it is down to differences in how division is handled on integers between IDL and Python.

weaverba137 commented 3 years ago

Fixing this might be non-trivial, because the source code for the original IDL REBIN() function is not available, although http://www.harrisgeospatial.com/docs/rebin.html outlines the general algorithm. That is, the REBIN() function itself is not written in IDL.

There are enough subtle issues with how these packages handle integers that it may not be worthwhile to attempt to change the code. And rebinning to smaller arrays inherently involves division, even if it is division by an integer (the ratio of the array sizes). So I have to ask, do you have a specific use case for rebinning integers as integers and then requiring the results to be exact? Would it be sufficient to simply document that the results for integer inputs should not be expected to be exact due to the inherent nature of rebinning?

LevN0 commented 3 years ago

Thank you for the reply, and apologies on the delay. My interest stems from trying to replace some of our IDL codes with Python codes. Being able to do that and have easy confirmation via existing integration tests that everything is working as before would be awesome. If the output does not exactly match, the existing integration tests cannot be used without modification and careful work to ensure another more serious issue (perhaps not related to REBIN but other portions of the rewrite) is not slipping in.

That said, I understand the implementation may not be trivial. I would prefer a delay (even if long) rather than no-fix.

Is there a simple case where Python and IDL handle integer division differently?

weaverba137 commented 3 years ago

Yes, Python and IDL handle integer division significantly different. In IDL the division operation depends on the types of the numerator and denominator. In Python, there are two separate division operators, / and // that handle real and integer division respectively.

However, this is not the main concern. It is not possible to see how IDL is handling the types of the inputs to REBIN() internally, because that code is not open-source. If I could see the code, it would probably be possible to replicate the type handling and reproduce the behavior for integer inputs.

I do not have the resources to experiment with all possible combinations of type-handling that may be happening internally.

I can understand the need to replace IDL code with Python. However, that does not address why integer inputs to REBIN() need to be supported. Why not simply convert the integers to floating point, use REBIN() and then convert back as needed?

LevN0 commented 3 years ago

I understand that finding the cause of the issue is not easy without being able to see the source code for IDL's REBIN.

However, that does not address why integer inputs to REBIN() need to be supported. Why not simply convert the integers to floating point, use REBIN() and then convert back as needed?

If an IDL program uses REBIN on an int arr, and a Python program first casts same arr to float and then uses this package's REBIN, the result will not be the same even if type cast back to int.

(It seems like it might follow that converting to float first in Python would give identical answers since both REBIN's give identical answers when starting with float, but IDL's REBIN apparently uses a slightly different set of rules when starting with integers. Sorry, this is a bit hard to explain, let me know if it's not clear.)

I tested this with the same example as the top of the issue.

weaverba137 commented 3 years ago

If an IDL program uses REBIN on an int arr, and a Python program first casts same arr to float and then uses this package's REBIN, the result will not be the same even if type cast back to int.

OK, I get that, but the way you have written everything so far, you are presenting a hypothetical use case. Are you really reporting this as a hypothetical issue, or have you run into a show-stopping problem with converting real code from IDL to Python? If the latter, could you let me see the real code so that I can better understand why you need to rebin integer values in the real world?