lballabio / QuantLib

The QuantLib C++ library
http://quantlib.org

Different results on Apple ARM CPU since upgrading to 1.35 #2069

Closed 0x26res closed 1 month ago

0x26res commented 1 month ago

We've noticed that some of our unit tests started failing on macOS/ARM after updating to 1.35. These tests passed on 1.34, and they still pass on 1.35 when running on an Intel CPU.

I had a look at the change log but I can't find an obvious explanation.

Here's a reproducible example:

import QuantLib as ql

def test_bond_yield():
    day_counter = ql.Thirty360(ql.Thirty360.EurobondBasis)
    schedule = ql.Schedule(
        ql.Date(9, 5, 2017),  # effectiveDate
        ql.Date(9, 5, 2028),  # terminationDate
        ql.Period(2),  # tenor
        ql.NullCalendar(),  # calendar
        ql.Unadjusted,  # convention
        ql.Unadjusted,  # terminationDateConvention
        ql.DateGeneration.Forward,  # rule
        False,  # endOfMonth
        ql.Date(9, 11, 2017),  # firstDate
        ql.Date(9, 11, 2027),  # nextToLastDate
    )
    coupons = ql.FixedRateLeg(
        schedule=schedule,
        dayCount=day_counter,
        nominals=[100.0],
        couponRates=[0.04836],
        paymentAdjustment=ql.Unadjusted,
        firstPeriodDayCount=day_counter,
        exCouponPeriod=ql.Period(),
        exCouponCalendar=ql.NullCalendar(),
        exCouponConvention=ql.Unadjusted,
        exCouponEndOfMonth=False,
        paymentCalendar=schedule.calendar(),
    )
    redemption = ql.Redemption(100, coupons[-1].date())
    cash_flows = (*coupons, redemption)
    bond = ql.Bond(
        2,  # settlementDays
        schedule.calendar(),  # calendar
        100.0,  # faceAmount
        ql.Date(),  # maturityDate
        ql.Date(),  # issueDate
        cash_flows,  # cashFlows
    )
    bond_yield = bond.bondYield(
        30,  # price
        day_counter,  # dc
        ql.Compounded,  # compounding
        2,  # freq
        ql.Date(15, 3, 2023),  # settlement
    )
    assert bond_yield == 0.3514018072914846 # 0.35140180729148496 on ARM CPU and QuantLib==1.35

Obviously the difference is very small, but ideally we'd like the library to be deterministic.
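For scale, a quick check (using the two values quoted in the assertion above) shows the discrepancy is on the order of a few ULPs of the result:

```python
import math

a = 0.3514018072914846   # yield on Intel / QuantLib 1.34
b = 0.35140180729148496  # yield observed on ARM / QuantLib 1.35

diff = abs(a - b)
print(diff)                # ~3.6e-16
print(diff / math.ulp(a))  # only a few ULPs apart
```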

boring-cyborg[bot] commented 1 month ago

Thanks for posting! It might take a while before we look at your issue, so don't worry if there seems to be no feedback. We'll get to it.

lballabio commented 1 month ago

Sorry, that's an unrealistic expectation. Floating-point numbers are by their own nature exposed to rounding errors, and testing for equality is not robust. You can try this in a Python shell:

[screenshot of a Python shell session demonstrating floating-point rounding]
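The embedded screenshot isn't preserved here; the classic shell experiment it presumably illustrates is:

```python
# Two decimal values that look equal need not compare equal as doubles:
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```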

More on this, and on how to test robustly, at https://floating-point-gui.de/errors/comparison/.

As for why this particular test happens to fail only on certain architectures, I suppose the compiler is translating the code into different instructions on different processors.

0x26res commented 1 month ago

@lballabio I'm not disagreeing with the fact that floating-point operations are not associative: (0.1 + 0.2) != (0.15 + 0.15). But they are deterministic: (0.1 + 0.2) == (0.1 + 0.2).
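The distinction being drawn can be checked directly in a Python shell:

```python
# Non-associative: regrouping the same operands changes the rounded result...
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False

# ...but deterministic: evaluating the same expression twice
# always produces bit-identical results on the same machine.
print((0.1 + 0.2) == (0.1 + 0.2))  # True
```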

I guess, as far as we're concerned, we could loosen our tests' tolerance, or keep different expected results for ARM vs. Intel.

But this problem started appearing between 1.34 and 1.35, and I can't find the code change that would have caused it. Do you have any idea?

lballabio commented 1 month ago

I'm not disagreeing with the fact that floating-point operations are not associative: (0.1 + 0.2) != (0.15 + 0.15). But they are deterministic: (0.1 + 0.2) == (0.1 + 0.2).

Agreed. My point was that, if the compiler starts optimizing differently for whatever reason, it might now generate 0.1 + 0.2 where it used to generate 0.15 + 0.15, even if the code didn't change. That's why I'd suggest using a reasonable test tolerance.
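A tolerance-based comparison can be written with the standard library's `math.isclose` (the `rel_tol` value here is an arbitrary choice, comfortably above rounding noise but far below any economically meaningful change in the yield):

```python
import math

expected = 0.3514018072914846  # value from the Intel run
actual = 0.35140180729148496   # value observed on ARM with 1.35

# Relative tolerance absorbs last-bit rounding differences across
# compilers and architectures without masking real regressions.
assert math.isclose(actual, expected, rel_tol=1e-12)
```

In a pytest suite, `assert bond_yield == pytest.approx(expected, rel=1e-12)` expresses the same check.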

If you're using the QuantLib wheels from PyPI, the most likely explanation is that the 1.34 wheels were built on an Intel Mac cross-compiling to M1, while the 1.35 wheels were built natively on an M1 Mac (they're built using GitHub Actions, and native M1 runners were not yet available at the time of 1.34).

0x26res commented 1 month ago

If you're using the QuantLib wheels from PyPI, the most likely explanation is that the 1.34 wheels were built on an Intel Mac cross-compiling to M1, while the 1.35 wheels were built natively on an M1 Mac (they're built using GitHub Actions, and native M1 runners were not yet available at the time of 1.34).

Yes we're using the PyPI wheels. I guess that makes sense as an explanation. Thanks.