zopefoundation / persistent

automatic persistence for Python objects
https://pypi.org/project/persistent/
Other
46 stars 28 forks source link

pyTimeStamp and TimeStamp have different sub-second precision #41

Closed mgedmin closed 6 years ago

mgedmin commented 8 years ago

I don't think this is correct:

>>> from persistent.timestamp import TimeStamp, pyTimeStamp
>>> ts1 = TimeStamp(2001, 2, 3, 4, 5, 6.123456789)
>>> ts2 = pyTimeStamp(2001, 2, 3, 4, 5, 6.123456789)
>>> ts1 == ts2
True
>>> ts1.second() == ts2.second()
False
>>> ts1.second()
6.1234567780047655
>>> ts2.second()
6.123456789
mgedmin commented 8 years ago

Ah, I see that pyTimeStamp(..., s).second() == s because it skips the double conversion to raw bytes and back. Nevertheless,

>>> ts3 = pyTimeStamp(ts2.raw())
>>> ts3 == ts2 == ts1
True
>>> ts3.second() == ts1.second()
False
>>> ts3.second()
6.123457
tseaver commented 8 years ago

Re: ts1 vs. ts2: the pure Python version in ts2 is obviously more correct: The issue is that the C code which pulls in the seconds value uses:

    if (!PyArg_ParseTuple(args, "iii|iid", &y, &mo, &d, &h, &m, &sec))
        return NULL;
    return TimeStamp_FromDate(y, mo, d, h, m, sec);

At that point, the damage (lost precision from Python float -> C double) is already done. I guess we could attempt to parse it as a Python float object, and then use its as_integer_ratio to get the two values to produce the "exact" value that Python holds.

#include <stdio.h>

int main(void)
{
    long long numerator = 6894399428289933L;
    long long denominator = 1125899906842624L;
    double double_out = (double)numerator / (double)denominator;
    printf("Numerator:       %lld\n", numerator);
    printf("Denominator:     %lld\n", denominator);
    printf("Double out:      %0.9f\n", double_out);
}
$ gcc -o foo foo.c
$ ./foo 
Numerator:       6894399428289933
Denominator:     1125899906842624
Double out:      6.123456789

I think that might be a bit expensive, though.

As for the round-trip through ts2.raw(): ugh.

mcepl commented 6 years ago

Is it this failing test?

[   15s] ======================================================================
[   15s] FAIL: test_strs_equal (persistent.tests.test_timestamp.PyAndCComparisonTests)
[   15s] ----------------------------------------------------------------------
[   15s] Traceback (most recent call last):
[   15s]   File "/home/abuild/rpmbuild/BUILD/persistent-4.2.4.2/persistent/tests/test_timestamp.py", line 253, in test_strs_equal
[   15s]     self.assertEqual(str(c), str(py))
[   15s] AssertionError: '2008-12-22 15:20:-11.700000' != '2008-12-22 15:20:48.300000'
[   15s] - 2008-12-22 15:20:-11.700000
[   15s] ?                  ^^^ ^
[   15s] + 2008-12-22 15:20:48.300000
[   15s] ?                  ^^ ^

(observed in openSUSE with python 3.6.5).

mgedmin commented 6 years ago

@mcepl that looks like an unrelated bug in the C code!

jamadden commented 6 years ago

I implemented the suggestion of @tseaver in https://github.com/zopefoundation/persistent/issues/41#issuecomment-238932554, and it turns out that's not enough. Yes, we pass the exact correct double value to TimeStamp_FromDate, but that function turns around and throws precision away when it packs it into bytes:

#define SCONV ((double)60) / ((double)(1<<16)) / ((double)(1<<16))
    sec /= SCONV;
    v = (unsigned int)sec;
    ts->data[4] = v / 16777216;
    ts->data[5] = (v % 16777216) / 65536;
    ts->data[6] = (v % 65536) / 256;
    ts->data[7] = v % 256;

When we get it back from TimeStamp_sec, we get the wrong value:

static double
TimeStamp_sec(TimeStamp *self)
{
    unsigned int v;

    v = (self->data[4] * 16777216 + self->data[5] * 65536
     + self->data[6] * 256 + self->data[7]);
    return SCONV * v;
}

When TimeStamp_FromDate is passed 6.123456789, TimeStamp_sec returns 6.123457.

I haven't really looked at what the packing/conversion code is doing, maybe there's something fixable there. If not, maybe the Python version needs to lose precision deliberately.

In fact, at least in this example, Python's argument parsing is correct and already returns the correct double value, no precision lost.

jamadden commented 6 years ago

83 should fix both of the original problems (false extended precision from the C version, and round-tripping through raw() not producing the same value). I didn't find any way to do it other than losing the precision immediately, unfortunately.

mcepl commented 6 years ago

@mcepl that looks like an unrelated bug in the C code!

@mgedmin So, should I file a separate issue?

mgedmin commented 6 years ago

So, should I file a separate issue?

That would be helpful!

Also, I'm not sure if this matters, but is the machine architecture where that test failed 32-bit or 64-bit?

(I cannot reproduce the failure on my Ubuntu 18.04, x86_64, Python 3.6.5, current git master of persistent.)