opencadc / caom2tools

Common Archive Observation Model - data engineering tools
2 stars 13 forks source link

CAOM differences in Python 2 vs 3 #69

Closed dr-rodriguez closed 6 years ago

dr-rodriguez commented 6 years ago

While running my unit tests I've noticed an inconsistency in the number of decimals stored by float values in CAOM. In Python 2.7 a value of 49199.7304166 will be saved for example for a time bounds lower value. In Python 3.5, however, the same value gets saved as 49199.73041658565. In another case, I see keyword float values stored as DEC_V1=32.4748344421 in python 2 but as DEC_V1=32.47483444213867 in python 3, again a difference in the number of digits kept.

In the case of time values, it's possible the underlying problem may not be an issue in float precision, but rather in astropy time differences. For example, I notice a metaRelease date stored as 1993-07-31T17:55:40.998 under python 2 but in python 3 it gets stored as 1993-07-31T17:55:41.000.

Internally, the checksums agree: the metaChecksums stored in the xml agree with the float/time values as they are stored. As such this isn't a serious issue. However, because my unit tests also compare against stored xml generated under python 2 I was able to find these differences.

For reference, I'm using the caom 2.3.3 library in both python environments. Python 2.7.13 with astropy 2.0.1 Python 3.5.5 with astropy 3.0.1

andamian commented 6 years ago

For string, I've seen that problem in the past and I've put a patch. Do you have this in your obs_reader_writer.py file:

        if isinstance(value, float):
            # in Python 2.7 str(float) might alter precision of float
            # therefore call repr instead
            element.text = repr(value)

I can't reproduce the release date problem. Where is the release date coming from initially and value does it have?

dr-rodriguez commented 6 years ago

I do have that in my obs_reader_writer.py I think I was able to find the problem with the release date (and potentially some of the floats stored as keywords). We receive those from the database as datetime.datetime objects, convert to astropy.time.Time objects, get the modified julian date as a float, and then return the string representation of that with str(). It looks like it's better for us to use repr() in this part of the code as well.

Python 2.7:

>>> import datetime
>>> from astropy.time import Time as AstropyTime
>>>
>>> CAOMvalue = datetime.datetime(1993,7,31,17,55,41)
>>> CAOMvalue
datetime.datetime(1993, 7, 31, 17, 55, 41)
>>> timeValue = AstropyTime(CAOMvalue, format='datetime', scale='utc')
>>> timeValue
<Time object: scale='utc' format='datetime' value=1993-07-31 17:55:41>
>>> CAOMvalue = timeValue.mjd
>>> CAOMvalue
49199.74700231481
>>> CAOMvalue = str(CAOMvalue).strip()
>>> CAOMvalue
'49199.7470023'
>>>
>>> CAOMvalue = timeValue.mjd
>>> CAOMvalue
49199.74700231481
>>> CAOMvalue = repr(CAOMvalue).strip()
>>> CAOMvalue
'49199.74700231481'

Python 3.5:

>>> import datetime
>>> from astropy.time import Time as AstropyTime
>>>
>>> CAOMvalue = datetime.datetime(1993,7,31,17,55,41)
>>> CAOMvalue
datetime.datetime(1993, 7, 31, 17, 55, 41)
>>> timeValue = AstropyTime(CAOMvalue, format='datetime', scale='utc')
>>> timeValue
<Time object: scale='utc' format='datetime' value=1993-07-31 17:55:41>
>>> CAOMvalue = timeValue.mjd
>>> CAOMvalue
49199.74700231481
>>> CAOMvalue = str(CAOMvalue).strip()
>>> CAOMvalue
'49199.74700231481'
>>>
>>> CAOMvalue = timeValue.mjd
>>> CAOMvalue
49199.74700231481
>>> CAOMvalue = repr(CAOMvalue).strip()
>>> CAOMvalue
'49199.74700231481'
dr-rodriguez commented 6 years ago

After changing str() to repr() where appropriate (and only doing so for floats), I was able to resolve the discrepancies. Closing the issue.