Surprising MemoryError in `decimal` with MAX_PREC

tim-one commented 4 years ago

BPO	39576
Nosy	@tim-one, @mdickinson, @vstinner, @ned-deily, @skrah, @vedgar, @aixtools, @pablogsal, @miss-islington, @isidentical
PRs	python/cpython#18581 python/cpython#18584 python/cpython#18585 python/cpython#18594 python/cpython#18596 python/cpython#18597 python/cpython#18616 python/cpython#18617 python/cpython#18618 python/cpython#20743 python/cpython#20744 python/cpython#20745 python/cpython#20746 python/cpython#20747 python/cpython#20748

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = 'https://github.com/skrah' closed_at = created_at = labels = ['3.7', '3.8', 'type-feature', 'library', '3.9'] title = 'Surprising MemoryError in `decimal` with MAX_PREC' updated_at = user = 'https://github.com/tim-one' ``` bugs.python.org fields: ```python activity = actor = 'skrah' assignee = 'skrah' closed = True closed_date = closer = 'skrah' components = ['Library (Lib)'] creation = creator = 'tim.peters' dependencies = [] files = [] hgrepos = [] issue_num = 39576 keywords = [] message_count = 62.0 messages = ['361536', '361538', '361544', '361546', '361596', '361597', '361598', '361599', '361601', '361606', '362364', '362365', '362373', '362374', '362375', '362376', '362389', '362393', '362394', '362408', '362426', '362429', '362430', '362432', '362435', '362508', '362509', '362510', '364360', '364361', '364364', '364368', '364369', '364370', '364371', '364372', '364373', '364374', '364375', '364377', '364404', '364405', '364410', '364415', '364416', '364418', '364420', '364421', '364534', '364578', '364602', '365714', '365715', '371049', '371051', '371052', '371054', '371055', '371059', '371060', '371069', '375494'] nosy_count = 10.0 nosy_names = ['tim.peters', 'mark.dickinson', 'vstinner', 'ned.deily', 'skrah', 'veky', 'Michael.Felt', 'pablogsal', 'miss-islington', 'BTaskaya'] pr_nums = ['18581', '18584', '18585', '18594', '18596', '18597', '18616', '18617', '18618', '20743', '20744', '20745', '20746', '20747', '20748'] priority = 'normal' resolution = None stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue39576' versions = ['Python 3.7', 'Python 3.8', 'Python 3.9'] ```

tim-one commented 4 years ago

Here under Python 3.8.1 on 64-bit Windows:

>>> import decimal
>>> c = decimal.getcontext()
>>> c.prec = decimal.MAX_PREC
>>> i = decimal.Decimal(4)
>>> i / 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Of course the result is exactly 2. Which I have enough RAM to hold ;-)

The implicit conversion is irrelevant:

>>> i / decimal.Decimal(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Floor division instead works fine:

>>> i // 2
Decimal('2')

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 4 years ago

Of course the result is exactly 2. Which I have enough RAM to hold ;-)

You might think so, but if you write it as 2.00...0 with

>>> decimal.MAX_PREC
999999999999999999

zeros, I think you're overestimating your RAM capacity. :-P

Now, what is the exact significance of MAXPREC, I don't know. But I guess some hard limits were needed for standards compliance, and Python didn't want to limit you there. Of course, same as with recursionlimit, it might be better to actually have some reasonable _lower limit that we know is actually safe.

mdickinson commented 4 years ago

Agreed that this seems surprising.

@Vedran: Trailing zeros in a Decimal object are significant, so Decimal("2.0") and Decimal("2.00") are different (equal, but different). The rules about the "ideal exponent" of the result of an arithmetic operation are well-specified. In this case, the spec is clear that the answer should be Decimal("2"), and not Decimal("2.0") or Decimal("2.00"), or ....

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 4 years ago

Yeah, I should have said "represent" instead of "write". :-)

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

With _pydecimal the memory also grows very slowly (I didn't have the patience to wait for MemoryError).

I'm pretty sure decNumber also does the same, it's just easier to implement and does not slow down division for small numbers.

libmpdec always favors precisions from roughly 9-34 digits whenever there's a potential performance issue.

The only thing I could imagine is to special-case, say, prec > 10000.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

MAX_PREC is chosen so that 5*MAX_PREC does not overflow 32-bit or 64-bit signed integers. This eliminates many overflow checks for the exponent.

Updating the exponent is (perhaps surprisingly) quite performance sensitive, that's why the 32-bit build does not use a 64-bit exponent.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

The feature would be nice to have; however, if you choose the precision to match the amount of available RAM things work (I have 8GB here, one word in the coefficient has 19 digits for the 4 bit version):

>>> from decimal import *
>>> c = getcontext()
>>> c.prec = 8 * 2**30 // 64 * 19
>>> c.prec
2550136832
>>> i = Decimal(4)
>>> i / 2
Decimal('2')

So I wonder if we really need to do something here.

tim-one commented 4 years ago

Vedran, as Mark said, the result is defined to have no trailing zeroes. In general the module strives to return results "as if" infinite precision were used internally, not to actually _use_ infinite precision internally ;-) Given the same setup, e.g.,

>>> i * decimal.Decimal(0.5)
Decimal('2.0')

works fine.

This isn't purely academic. The decimal docs, at the end:

""" Q. Is the CPython implementation fast for large numbers?

A. Yes. ... However, to realize this performance gain, the context needs to be set for unrounded calculations.

>>> c = getcontext()
>>> c.prec = MAX_PREC
>>> c.Emax = MAX_EMAX
>>> c.Emin = MIN_EMIN
"""

I suggested this approach to someone on Stackoverflow, who was trying to compute and write out the result of a multi-hundred-million-digit integer exponentiation. Which worked fine, and was enormously faster than using CPython's bigints.

But then I noticed "trivial" calculations - like the one here - blowing up with MemoryError too. Which made sense for, e.g., 1/7, but not for 1/2.

I haven't looked at the implementation. I assume it's trying in advance to reserve space for a result with MAX_PREC digits.

It's not limited to division; e.g.,

    >>> c.sqrt(decimal.Decimal(4))
    ...
    MemoryError

is also surprising.

Perhaps the only thing to be done is to add words to the part of the docs _recommending_ MAX_PREC, warning about some "unintended consequences" of doing so.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

This isn't purely academic. The decimal docs, at the end:

Yes, that is a good point. I think for libmpdec I'll just do a trial divmod if prec > BIGNUM_THRESHOLD.

Perhaps the only thing to be done is to add words to the part of the docs _recommending_ MAX_PREC, warning about some "unintended consequences" of doing so.

This, too. Probably some formula based on the amount of available RAM would do.

tim-one commented 4 years ago

Formulas based on physical RAM probably work well on Linux, but not so well on Windows: Windows has no "overcommit". Whether a virtual memory request succeeds on Windows depends on how much RAM (+ swap space, if any) has already been requested across all processes. It doesn't matter whether pages have actually been created, merely the total number of pages that _may_ be created.

I never bumped into this issue before, because I never used MAX_PREC before ;-) When I intended to do exact arithmetic in decimal, I did this instead:

guesstimated the max number of decimal digits a computation would need
set prec to be comfortably - but not massively - larger than that
enabled the Inexact trap, so if I guessed too low I'd get an exception

Maybe the docs could suggest that instead?

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

Fortunately libmpdec raises MemoryError almost instantaneously, so the PR retries the affected operations with estimated upper bounds for exact results without slowing down the common case.

The docs still need updating because people will still wonder why 1 / Decimal(3) freezes the Linux machine at MAX_PREC. ;)

The Python implementation cannot use this approach because memory grows slowly.
I'm not sure though if anyone _would_ run _pydecimal at MAX_PREC.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

BTW, this PR implements the invariant:

"If there exists an exact result at a lower precision, this result should also be returned at MAX_PREC (without MemoryError)".

So non-integer powers are left out, since _decimal has no notion of exact non-integer powers yet.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

New changeset 90930e65455f60216f09d175586139242dbba260 by Stefan Krah in branch 'master': bpo-39576: Prevent memory error for overly optimistic precisions (GH-18581) https://github.com/python/cpython/commit/90930e65455f60216f09d175586139242dbba260

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

New changeset c6f95543b4832c3f0170179da39bcf99b40a7aa8 by Miss Islington (bot) in branch '3.7': bpo-39576: Prevent memory error for overly optimistic precisions (GH-18581) (bpo-18585) https://github.com/python/cpython/commit/c6f95543b4832c3f0170179da39bcf99b40a7aa8

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

New changeset b6271025c640c228505dc9f194362a0c2ab81c61 by Miss Islington (bot) in branch '3.8': bpo-39576: Prevent memory error for overly optimistic precisions (GH-18581) (bpo-18584) https://github.com/python/cpython/commit/b6271025c640c228505dc9f194362a0c2ab81c61

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 4 years ago

Hm... is "exact result" a technical term that's defined somewhere? Because to me it seems that this

"If there exists an exact result at a lower precision, this result should also be returned at MAX_PREC (without MemoryError)".

is a mathematical statement, and surely for Decimal('0.07776') ** Decimal('0.2') _there exists an exact result at a lower precision. Maybe this should be reworded as "If there exists ... _and an implementation knows about it, it should also be returned ..."

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

Vedran, msg362365 is meant to say:

"This PR implements $SOMEWHAT_MATHEMATICAL_SPEC except for inexact power."

Had I put the caveat inside the statement as well, the message would have been:

"This PR implements $SOMEWHAT_MATHEMATICAL_SPEC_EXCEPT_FOR_INEXACT_POWER except for inexact power."

Bur GNU is not UNIX and all that... ;)

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 4 years ago

Well, it all depends on how do you intend to read it. To me, the closing quote and "So non-integer powers are left out" after it suggested that the non-integer powers being left out is somehow a consequence of the above paragraph. When in fact instead of "so" I think "except" or "alas":-) should have been.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

"So non-integer powers are left out" in isolation would indeed be wrong, but actual sentence is unambiguously qualified with:

"... since _decimal has no notion of exact non-integer powers yet.", which clearly states that exact non-integer powers exist and _decimal does not recognize them (it usually produces exact results in the respective cases but still sets Inexact).

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

Updated docs:

https://github.com/python/cpython/pull/18594

The PR uses some of Tim's suggestions while also explaining how to calculate the amount of memory used in a single large decimal.

Hopefully it isn't too much information.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

New changeset a025d4ca99fb4c652465368e0b4eb03cf4b316b9 by Stefan Krah in branch 'master': bpo-39576: docs: set context for decimal arbitrary precision arithmetic (bpo-18594) https://github.com/python/cpython/commit/a025d4ca99fb4c652465368e0b4eb03cf4b316b9

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

New changeset 00e45877e33d32bb61aa13a2033e3bba370bda4d by Miss Islington (bot) in branch '3.7': bpo-39576: docs: set context for decimal arbitrary precision arithmetic (GH-18594) (bpo-18596) https://github.com/python/cpython/commit/00e45877e33d32bb61aa13a2033e3bba370bda4d

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

New changeset d6965ff026f35498e554bc964ef2be8f4d80eb7f by Miss Islington (bot) in branch '3.8': bpo-39576: docs: set context for decimal arbitrary precision arithmetic (GH-18594) (bpo-18597) https://github.com/python/cpython/commit/d6965ff026f35498e554bc964ef2be8f4d80eb7f

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

libmpdec and the docs are done, the question remains what to do with decimal.py.

It has the same behavior, but I don't think users are running decimal.py with very large precisions.

Anyway, unassigning myself in case anyone else wants to work on a patch.

tim-one commented 4 years ago

Thanks, Stefan! This turned out better than I expected ;-)

I'm closing the report, under the likely assumption that nobody cares enough about obscure inelegancies in the Python version to bother.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

New changeset b76518d43fb82ed9e5d27025d18c90a23d525c90 by Stefan Krah in branch 'master': bpo-39576: Clarify the word size for the 32-bit build. (bpo-18616) https://github.com/python/cpython/commit/b76518d43fb82ed9e5d27025d18c90a23d525c90

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

New changeset 24c570bbb82a7cb70576c253a73390accfa7ed78 by Miss Islington (bot) in branch '3.7': bpo-39576: Clarify the word size for the 32-bit build. (GH-18616) (bpo-18617) https://github.com/python/cpython/commit/24c570bbb82a7cb70576c253a73390accfa7ed78

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

New changeset c6ecd9c14081a787959e13df33e250102a658154 by Miss Islington (bot) in branch '3.8': bpo-39576: Clarify the word size for the 32-bit build. (GH-18616) (bpo-18618) https://github.com/python/cpython/commit/c6ecd9c14081a787959e13df33e250102a658154

pablogsal commented 4 years ago

This code has introduced a regression in AIX in Python 3.7.7 as the new "test_maxcontext_exact_arith" test hangs indefinitely or just segfaults.

pablogsal commented 4 years ago

This looks like a "new feature/improvement". Why was this code backported to a stable version?

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

This looks like a "new feature/improvement". Why was this code backported to a stable version?

Thanks for the lecture. This is an esoteric case between bugfix and feature that only occurs with very large context precisions.

If Bloomberg isn't happy with _decimal, they can stop using it. I'm not paid by them.

Which AIX buildbot fails? There are many green ones.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

This code has introduced a regression in AIX in Python 3.7.7

Also this is a rather bold statement since probably no one has ever run _decimal on AIX with MAX_PREC.

pablogsal commented 4 years ago

Thanks for the lecture. This is an esoteric case between bugfix and feature that only occurs with very large context precisions.

Nobody is lecturing anyone. I am just asking why this was backported.

If Bloomberg isn't happy with _decimal, they can stop using it. I'm not paid by them.

Stefan, nobody has mentioned Bloomberg in this issue so please, stop.

Also this is a rather bold statement since probably no one has ever run _decimal on AIX with MAX_PREC.

Well, I just did and I can confirm that reverting the 3.7 backport fixes the problem.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

Well, I just did and I can confirm that reverting the 3.7 backport fixes the problem.

If you are fortunate enough to have access to an AIX system, I guess you have to find out why POWER6 AIX 3.8 and PPC64 AIX 3.8 apparently work on https://buildbot.python.org/ but your 3.7 does not.

It seems rather strange to me. Are you compiling --with-system-libmpdec?

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

Hi Michael, in case you have built 3.7.7 on AIX, have you observed any problems with test_decimal?

pablogsal commented 4 years ago

If you are fortunate enough to have access to an AIX system, I guess you have to find out why POWER6 AIX 3.8 and PPC64 AIX 3.8 apparently work on https://buildbot.python.org/ but your 3.7 does not.

I am working on trying to debug where the problem comes from, but meanwhile, I wanted to report here that this introduced a difference between Python 3.7.6 and Python 3.7.7

I still did not test Python 3.8 but I suppose that it has the same problem.

Are you compiling --with-system-libmpdec?

No, I am compiling with:

CFLAGS+= -qmaxmem=-1 OBJECT_MODE=64 CONFIGURE_ARGS=--enable-shared --with-system-ffi --without-computed-gotos ac_cv_rshift_extends_sign=no

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

These flags worked for xlc when snakebite was still up:

./configure CC=xlc_r AR="ar -X64" CFLAGS="-q64 -qmaxmem=70000" LDFLAGS="-q64"

-qmaxmem was always finicky, I remember segfaults too (AIX problem, not libmpdec problem).

pablogsal commented 4 years ago

Btw, this is AIX 7.1.0.0 with xlc in case that is relevant.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

BTW, if you are compiling with xlc and there"s no xlc buildbot, perhaps a company-that-shall-not-be-named should provide one.

I mean, it's not okay to complain about a regression and then mention xlc about 10 mails later.

pablogsal commented 4 years ago

I mean, it's not okay to complain about a regression and then mention xlc about 10 mails later.

How is this related? Or is not ok to report a behaviour change in a stable release even if the platform is "best-effort"?

A regression by definition is a behaviour change between two versions. AIX is a "best-effort" platform, not a fully-supported one and I am well aware. I have not asked for a revert in anycase, precisely because of that (that decision should be up the release manager anyway), I am just communicating the regression. And doing so is perfectly "ok", even If I am using xlc (which is the native compiler on AIX).

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

A focused issue report would include at least the following:

1) Acknowledge that gcc builds work on the AIX buildbots (a fact that has been entirely ignored so far).

2) State the exact regression: msg364373 (which was also ignored, are you using xlc or xlc_r?) says that there have always been problems with xlc unless configured in a very specific manner.

 Then obviously we need the exact Python code that worked in
 3.7.6 but not in 3.7.7.

In short, given the flakiness of the xlc toolchain I'm not even sure if anything can be classified as a regression. Most xlc users I've talked to have switched to gcc.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

It is also instructive how Granlund views xlc:

https://gmplib.org/list-archives/gmp-bugs/2010-November/002119.html

pablogsal commented 4 years ago

Acknowledge that gcc builds work on the AIX buildbots (a fact that has been entirely ignored so far).

I do acknowledge that. I am not saying that I am sure there is a bug in the code. For what I know at this point it may be something with xlc, with the OS itself, with the libc version or something misconfigured on the machine.

The only thing I know is that before I could run the test suite without any problem on 3.7.6 and now I cannot in 3.7.7, and because 3.7 a maintenance branch I though this is a possible regression, so I reported it.

are you using xlc or xlc_r?

I am using xlc_r. I am saying xlc because that is the name of the compiler. The _r suffix activates thread-safe compilation (which I am doing). Apologies for the confusion.

State the exact regression

The **exact** regression is that I could run the test suite without any crash or freeze on this AIX system with 3.7.6 and I cannot in 3.7.7. At the very least this is due to the fact that there is a new test that crashes/hangs. Why this happens I still have no idea as I am currently investigating. Debugging on AIX is not a pleasant task.

n short, given the flakiness of the xlc toolchain I'm not even sure if anything can be classified as a regression. Most xlc users I've talked to have switched to gcc.

I notified the behaviour different here so we can discuss about this as a team. If we decide that this is xlc's fault or is not worth to investigate or that because AIX is not really fully supported we do not want to spend resources I can totally understand it. I thought this was important enough because this change was in 3.7, which is supposed to be stable across minor releases (the same applies for 3.8 if the problem also appears there).

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

"The **exact** regression is that I could run the test suite without any crash or freeze on this AIX system with 3.7.6 and I cannot in 3.7.7. At the very least this is due to the fact that there is a new test that crashes/hangs."

In other words, contrary to your earlier dismissal, you did NOT run _decimal on AIX with MAX_PREC but just ran the 3.7.6 tests that do not include any tests with MAX_PREC.

pablogsal commented 4 years ago

In other words, contrary to your earlier dismissal, you did NOT run _decimal on AIX with MAX_PREC but just ran the 3.7.6 tests that do not include any tests with MAX_PREC.

I did, and it raises MemoryError:

❯ uname -a AIX 1 7 powerpc 00CCAD974C00 AIX

❯ python3.7 --version Python 3.7.6

❯ python3.7 Lib/test/test_decimal.py

...

\====================================================================== ERROR: test_maxcontext_exact_arith (main.CWhitebox) ----------------------------------------------------------------------

Traceback (most recent call last):
  File "Lib/test/test_decimal.py", line 5506, in test_maxcontext_exact_arith
    self.assertEqual(Decimal(4).sqrt(), 2)
MemoryError

pablogsal commented 4 years ago

Just to be clear: I am saying that the *exact* regression is manifested with the new test because that is the behavioural difference that I experienced and how I found this problem.

but just ran the 3.7.6 tests that do not include any tests with MAX_PREC.

Independently on other considerations, If the test suite fails in 3.7.7 and succeeds in 3.7.6 that is a regression.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

Sorry, I'm reacting like Granlund now and close. These discussions lead nowhere.

pablogsal commented 4 years ago

After some debugging, I discovered that the new test succeeds if I configure and compile CPython without 'OBJECT_MODE=64' set.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 4 years ago

Since xlc has elementary bugs like

https://github.com/openssl/openssl/issues/10624

it may be worth checking out if this is an optimizer bug with -q64.

pablogsal commented 4 years ago

I will try to investigate that but it think this is because without OBJECT_MODE=64 the value of decimal.MAX_PREC is much lower (425000000 instead 999999999999999999) so whatever is happening downstream with the higher does not happen with the lower one.

I will update If I found something useful on what is going on.

python / cpython

Surprising MemoryError in `decimal` with MAX_PREC #83757