python / cpython

The Python programming language
https://www.python.org

Merge C version of decimal into py3k. #51901

Closed mdickinson closed 12 years ago

mdickinson commented 14 years ago
BPO 7652
Nosy @malemburg, @rhettinger, @amauryfa, @mdickinson, @pitrou, @vstinner, @ericvsmith, @benjaminp, @cedk, @asvetlov, @skrah, @ericsnowcurrently, @jimjjewett
Files
  • 49433f35a5f8.diff
  • bba956250186.diff
  • be8a59fcba49.diff: fixed all comments except the ones in ISSUES.txt
  • ppro-mulmod.txt: Proof for x87 FPU modular multiplication
  • 40917e4b51aa.diff
  • 9b3b1f5c4072.diff
  • api-demo.c
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = 'https://github.com/skrah'
    closed_at =
    created_at =
    labels = ['extension-modules', 'library', 'performance']
    title = 'Merge C version of decimal into py3k.'
    updated_at =
    user = 'https://github.com/mdickinson'
    ```

    bugs.python.org fields:
    ```python
    activity =
    actor = 'skrah'
    assignee = 'skrah'
    closed = True
    closed_date =
    closer = 'skrah'
    components = ['Extension Modules', 'Library (Lib)']
    creation =
    creator = 'mark.dickinson'
    dependencies = []
    files = ['22404', '23822', '23959', '23966', '24615', '24776', '25049']
    hgrepos = ['25']
    issue_num = 7652
    keywords = ['patch']
    message_count = 83.0
    messages = ['97347', '97348', '97355', '97372', '97373', '97377', '97382', '97411', '97412', '97719', '98161', '120305', '120674', '121452', '138029', '138583', '148650', '148652', '148669', '148677', '148678', '148680', '148682', '148687', '148688', '148689', '148690', '148691', '148721', '149556', '149557', '149559', '149567', '149600', '153447', '153506', '153507', '154074', '154988', '154992', '155031', '155034', '155036', '155045', '155046', '155047', '155050', '155059', '155070', '155071', '155079', '155082', '155083', '155085', '155086', '155087', '155088', '155093', '155095', '155107', '155135', '155331', '155359', '155381', '155419', '155644', '155649', '155743', '155744', '156402', '156480', '156500', '156671', '156890', '156952', '164470', '165329', '165330', '165339', '165341', '168034', '168035', '169692']
    nosy_count = 19.0
    nosy_names = ['lemburg', 'rhettinger', 'amaury.forgeotdarc', 'mark.dickinson', 'pitrou', 'vstinner', 'casevh', 'eric.smith', 'benjamin.peterson', 'jjconti', 'Arfrever', 'ced', 'asvetlov', 'skrah', "Amaury.Forgeot.d'Arc", 'python-dev', 'eric.snow', 'Ramchandra Apte', 'Jim.Jewett']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue7652'
    versions = ['Python 3.3']
    ```

    mdickinson commented 14 years ago

    I've created this issue to keep track of progress in merging Stefan Krah's work on decimal in c into py3k.

    We've created a branch py3k-cdecimal (with merge tracking from py3k) for this work. When the branch is fully working and tested we'll consult python-dev about next steps.

    ericvsmith commented 14 years ago

    Is the intention to write Decimal.__format__ in C, too? That would be quite a bit of work, and I'm not sure I could recommend it. But I'm not sure if your plan is to get rid of all Python code or not.

    If your plan is to rewrite absolutely everything in C, I could help out by exposing the methods that parse format specifiers and do some of the low level formatting. They're used internally by the int, float, and str formatting code. Let me know.

    mdickinson commented 14 years ago

    Just to clarify, no decision has yet been made on *whether* the cdecimal work should be integrated into py3k; we'll consult python-dev on this once we've got a working branch and performance information.

    mdickinson commented 14 years ago

    To answer Eric's question: Decimal.__format__ is already implemented in Stefan's work---it looks like most of the code is in

    http://svn.python.org/projects/python/branches/py3k-cdecimal/Modules/cdecimal/io.c

    (Stefan, is this right?)

    mdickinson commented 14 years ago

    So the new branch looks great---thanks, Stefan! I'm only just beginning to look at the code properly, though.

    A couple of things:

    (1) Could we unify test_decimal and test_cdecimal somehow? This would avoid them getting out of sync when new tests are added, and would make it clear what the differences between them are. It looks like there's currently a lot of duplicate code.

    (2) At some point we'll need some documentation. Even if all it says is: the cdecimal module operates identically to the decimal module, with the following exceptions... (notes on threading differences, exponent limits, correct rounding of pow, etc.)

    briancurtin commented 14 years ago

    mark> (1) Could we unify test_decimal and test_cdecimal somehow?
    mark> This would avoid them getting out of sync when new tests are
    mark> added, and would make it clear what the differences between
    mark> them are. It looks like there's currently a lot of duplicate code.

    An approach similar to the one taken in test_warnings.py might work: write common test code as a base class, then subclass it to be run against both the C and Py versions of the module.
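A minimal sketch of that base-class pattern, for illustration only. It assumes a CPython where the two implementations are importable as `_pydecimal` and `_decimal` (the layout that eventually shipped, not the `cdecimal` naming used in the branch at this point in the thread); the test names are hypothetical.

```python
import unittest

import _pydecimal as P  # pure-Python implementation (assumed module name)

try:
    import _decimal as C  # C implementation; may be missing on some builds
except ImportError:
    C = None


class ArithmeticTests:
    """Tests written once; each subclass supplies the module under test."""

    decimal = None

    def test_exact_addition(self):
        D = self.decimal.Decimal
        self.assertEqual(D("1.1") + D("2.2"), D("3.3"))

    def test_str_keeps_trailing_zero(self):
        D = self.decimal.Decimal
        self.assertEqual(str(D("1.230")), "1.230")


class PyArithmeticTests(ArithmeticTests, unittest.TestCase):
    decimal = P


@unittest.skipIf(C is None, "requires the C implementation")
class CArithmeticTests(ArithmeticTests, unittest.TestCase):
    decimal = C


def run_shared_tests():
    """Run the same tests against both implementations; return the result."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(PyArithmeticTests))
    suite.addTests(loader.loadTestsFromTestCase(CArithmeticTests))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the shared class does not inherit from `unittest.TestCase`, the loader only collects the two concrete subclasses, so every test runs exactly once per implementation.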

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Yes, formatting is completely implemented in io.c, together with quite a comprehensive test suite. I like the new Python format strings, so I wanted them in the C library, too.
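For readers unfamiliar with the format strings being discussed, here is a small illustration using the standard `decimal` module. The helper name `format_amount` is made up for this example.

```python
from decimal import Decimal


def format_amount(value):
    """Format with thousands separators and two fractional digits."""
    return format(value, ",.2f")


def format_percent(value):
    """Format a ratio as a percentage with one fractional digit."""
    return format(value, ".1%")
```

Both calls go through `Decimal.__format__`, which is the method Eric asked about.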

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Unify test_decimal and test_cdecimal:

    Yes, quite possible. The diff is currently 400 lines, but it should be easy to get that down to below 100 without any loss of functionality.

    I'll look into that when I'm done with the 64 bit ANSI path.

    Documentation:

    Anything is welcome, even a patch that just creates a stub so I don't have to figure out where to put it.

    The differences are listed at the bottom of:

    http://www.bytereef.org/cdecimal.html

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    Just an update. Rev.77358 should compile and run stable on the buildbot platforms except Alpha and ia64. I'm working on a default ANSI path for 64-bit.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    As a first step in unifying test_decimal.py and test_cdecimal.py I would like to patch test_decimal.py in trunk and py3k. This is to minimize differences between py3k and py3k-cdecimal.

    (1) Remove test that Decimal(x) generates a copy.

    (2) Add test case to formatting test.

    (3) Extend threading test.

    (4) Use Emax of 425000000 instead of 999999999 where possible. (The 32-bit version of cdecimal has the official limit of 425000000, even though 999999999 works in almost all cases.)
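As an illustration of item (4): in today's `decimal` module the platform limits are exposed as the constants `MAX_EMAX` and `MIN_EMIN`, so a test can stay inside the 32-bit range explicitly. This is a sketch, not the patch referred to above; `portable_context` is a hypothetical helper.

```python
import decimal


def exponent_limits():
    """Return the (MAX_EMAX, MIN_EMIN) limits of the running build."""
    return decimal.MAX_EMAX, decimal.MIN_EMIN


def portable_context(prec=28):
    """Build a context whose exponent range is valid on 32-bit builds too."""
    return decimal.Context(prec=prec, Emax=425000000, Emin=-425000000)
```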

    If I get an OK for the two patches, I can commit them to py3k and trunk. If you don't want to apply (1), I'll make new patches.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    All outstanding issues mentioned here have been solved in Rev. 77696:

    (1) New ANSI path for unknown 64bit platforms (ia64 and Alpha build without problems now).

    (2) Unified tests for decimal and cdecimal.

    (3) Documentation for cdecimal.

    Other improvements:

    (4) Added comprehensive test suite for testing the library directly.

    (5) Fixed warnings in Visual Studio.

    (6) Code formatting.

    3807ddda-7a46-46c2-9a7e-24172362f2cc commented 14 years ago

    Has the cdecimal branch kept up with the hash value changes in 3.2?

    Is there still a chance that cdecimal could be merged into 3.2?

    mdickinson commented 14 years ago

    On Wed, Nov 3, 2010 at 3:28 AM, Case Van Horsen <report@bugs.python.org> wrote:

    Has the cdecimal branch kept up with the hash value changes in 3.2?

    Not sure; that's a question for Stefan.

    Is there still a chance that cdecimal could be merged into 3.2?

    A chance, yes; the major need is for someone with time to do a full review. (I'm afraid that's not me, right now.)

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 14 years ago

    An update on the progress:

    All development currently happens in my private mpdecimal repository. The next version of mpdecimal (2.0) is finished, stable and will be released once all tests have completed successfully. Running the whole test suite can take several weeks, since Valgrind is involved and all Python versions from 2.5 to 3.2 are tested.

    In py3k-cdecimal, r86497 is an exact copy of mpdecimal-2.0. All buildbots pass the short tests.

    Major improvements:

    o Full compatibility with decimal.py 3.2, including the new hash functions and float operations.

    o With the new FloatOperation signal, accidental float operations can be detected.

    o The underlying library - libmpdec - now has 100% code coverage together with Makefile targets for creating coverage reports. In particular, every possible allocation failure during an operation can be tested in brute force style.

    o The module has 85% code coverage. All lines except failures of Python C-API functions are tested.

    o Several minor bug fixes, most of them deal with allocation failures under extreme bignum conditions.
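The FloatOperation signal mentioned in the list above can be exercised as follows. This is a sketch against the `decimal` module as it exists in current Python releases; the function name is hypothetical.

```python
import decimal


def float_mixing_detected(trap_enabled):
    """Construct a Decimal from a float and report what the context saw.

    Returns (raised, flag_set): whether FloatOperation was raised, and
    whether the FloatOperation flag ended up set in the context.
    """
    ctx = decimal.Context()
    ctx.traps[decimal.FloatOperation] = trap_enabled
    raised = False
    with decimal.localcontext(ctx) as active:
        try:
            decimal.Decimal(1.5)  # float input: the "accidental" operation
        except decimal.FloatOperation:
            raised = True
        flag_set = bool(active.flags[decimal.FloatOperation])
    return raised, flag_set
```

With the trap disabled the conversion succeeds silently and only the flag records it; with the trap enabled the same call raises.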

    Potential reviewers:

    I'll be happy to answer questions here or privately. IMO the best way to get acquainted with the module is to do the regular build and tests first, then explore Lib/test/mpdecimal, reading LIBTEST.txt and PYTEST.txt.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 13 years ago

    Just a couple of remarks about the diff I created:

    The changes to decimal.py are exploratory (i.e. done quite hastily) and serve the purpose to fulfill PEP-399.

    library/cdecimal.rst is completely out of date.

    The rest should be very stable.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 13 years ago

    The latest patch is based on a relatively stable revision of 3.3. To my knowledge, _decimal.c and decimal.py are now fully compatible in the sense of PEP-399.

    libmpdec
    ========

    o New test suite with comprehensive tests against decNumber.
    
    o Full support for 32-bit compilers (tested with CompCert).

    PEP-399
    =======

    o cdecimal.c is now called _decimal.c. Instead of importing cdecimal,
      _decimal is transparently imported as decimal (if the C version is
      available).
    
    o Unified unit tests with 100% code coverage for both decimal.py and
      _decimal.c. For _decimal.c, the tests include all failures of
      Python API functions (requires special patch for testing).
    
    o deccheck.py now also tests arbitrary input and makes sure that both
      modules raise the same exceptions.
    
    o Both modules produce the same pickle output for Decimal and Context.
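The "transparent import" described in the PEP-399 list is the usual accelerator fallback. A minimal sketch, assuming the module names `_decimal` and `_pydecimal` used by current CPython (the patch under discussion kept the Python code in `decimal.py` itself); `load_decimal_impl` is a hypothetical helper.

```python
def load_decimal_impl():
    """Return the C implementation if available, else the Python one."""
    try:
        import _decimal as impl
    except ImportError:
        import _pydecimal as impl
    return impl


def sum_as_text(a, b):
    """Add two decimal strings with whichever implementation was loaded."""
    impl = load_decimal_impl()
    return str(impl.Decimal(a) + impl.Decimal(b))
```

Callers never see which implementation they got, which is why both must raise the same exceptions and produce the same results.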

    _decimal.c
    ==========

    o Speed up int/Decimal conversion for integers that fit into a
      single word of a PyLongObject (performance gain is around 15%).
    
    o real(), imag(), conjugate(), __complex__() support.
    
    o Fraction and complex comparison support.
    
    o Decimal constructor now accepts lists as well as tuples.
    
    o DecimalTuple support.
    
    o General cleanup and refactoring. The functions for conversions 
      between Decimal and other numeric types are much cleaner now 
      and could be used for a PyDec_* API.
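A short illustration of the numeric-tower features listed above (real/imag/conjugate, `complex()`, Fraction comparison, DecimalTuple), written against the current `decimal` module rather than the patch itself. The helper name is made up.

```python
from decimal import Decimal
from fractions import Fraction


def numeric_views(d):
    """Collect the complex-style and tuple views of a Decimal."""
    return {
        "real": d.real,
        "imag": d.imag,
        "conjugate": d.conjugate(),
        "complex": complex(d),
        "tuple": d.as_tuple(),
    }


def equals_fraction(d, numerator, denominator):
    """Compare a Decimal with an exact rational value."""
    return d == Fraction(numerator, denominator)
```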

    vstinner commented 12 years ago

    Just to clarify, no decision has yet been made on *whether* the cdecimal work should be integrated into py3k; we'll consult python-dev on this once we've got a working branch and performance information.

    So, what is the status today?

    _decimal looks to be huge. Does Python really need yet another multiprecision library? There is already gmpy and bigfloat, based on the heavily optimized GMP library, for example. Is it a license issue? Can't we reuse GMP/MPFR to offer a Decimal API?

    _decimal should maybe first be distributed as a third party library until it is really well tested and its API is really stable, until we can decide to integrate it. The patch adds __setattr__ to the Decimal class.

    mdickinson commented 12 years ago

    Does Python really need yet another multiprecision library?

    It's not really another library: it's a reimplementation of the existing decimal library in C. The decimal library is *hugely* valuable to the financial world, but its slowness is a major concern. _decimal would help to address that concern.

    Can't we reuse GMP/MPFR to offer a Decimal API?

    Nope: those are for binary floating-point. Shoehorning decimal semantics on top of a binary floating-point library is a really bad idea. (Actually, that's a part of why decimal.py is slow---it's using Python's *binary* integers to store *decimal* coefficients, so that even simple addition is now a quadratic operation, thanks to the binary <-> decimal conversions involved.)
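To make the parenthetical concrete: the sketch below is not decimal.py's code, just an illustration of the pattern it describes, where coefficients live as decimal digit strings and every addition goes through Python's binary `int`. The function name is hypothetical.

```python
def add_coefficients(a_digits, b_digits):
    """Add two decimal coefficients given as digit strings.

    The addition itself is linear in the number of digits, but the
    str -> int and int -> str conversions around it are not, which is
    where the extra cost for long coefficients comes from.
    """
    a = int(a_digits)   # decimal text -> binary integer
    b = int(b_digits)   # decimal text -> binary integer
    return str(a + b)   # binary integer -> decimal text
```

A library that stores the coefficient in a power-of-ten base internally can skip both conversions.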

    _decimal should maybe first be distributed as a third party library until it is really well tested and its API is really stable.

    My take is that this has already happened.

    The only problem from my perspective is getting someone to find time to review such a massive patch. I've been wondering whether we could get away with some kind of 'statistical' review: do a large-scale review, and then instead of having someone go through every line of C code, pick a few representative sections at random and review those. If those code portions make it through the review unscathed, declare the code good and merge it in.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Binary versus decimal
    ---------------------

    There is already gmpy and bigfloat, based on the heavily optimized GMP library, for example. Is it a license issue? Can't we reuse GMP/MPFR to offer a Decimal API?

    _decimal is a PEP-399 compliant C implementation of decimal.py. The underlying standard is Cowlishaw/IBM's "General Decimal Arithmetic Specification".

    decimal.py is used for standard-conforming financial calculations. There is no way to implement this in a reasonable manner using a binary floating point library.

    Additionally, _decimal is also heavily optimized. In fact, for small precisions the module has the same speed as gmpy!
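The standard small example of why binary floating point does not fit decimal bookkeeping, shown with plain `float` and the `decimal` module:

```python
from decimal import Decimal


def float_sum_is_exact():
    """0.1 and 0.2 have no exact binary representation, so this is False."""
    return 0.1 + 0.2 == 0.3


def decimal_sum_is_exact():
    """The same sum on decimal values is exact."""
    return Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
```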

    Soundness and code size
    -----------------------

    _decimal should maybe first be distributed as a third party library until it is really well tested and its API is really stable, until we can decide to integrate it.

    Except for a different directory structure, the cdecimal module is identical to this patch. cdecimal has been distributed for almost two years now and has been on pypi for a year.

    There have been many downloads from financial institutions, stock exchanges and also research institutes. I know for a fact from a private email correspondence that libmpdec is used in a billing application of a large national NIC.

    cdecimal *appears* to be huge because it has a test suite that actually provides 100% code coverage. Indeed this means that even every possible malloc failure is simulated together with an assertion that the result of the function is (NaN, Malloc_error).

    The test suite now tests against both decimal.py and decNumber. It has found several small issues in decimal.py, a bug in netlib's dtoa.c, a bug in gmp and a bug in CompCert.

    The latest tests against decNumber have found 18 issues in decNumber (that I haven't reported yet).

    In the past 8 months, regression tests for cdecimal-2.3 have been running trillions of test cases both with and without Valgrind.

    Review
    ------

    The patch could be audited by focusing on basearith.c, cdecimal.c and mpdecimal.c. cdecimal.c is a long but simple wrapper around libmpdec. mpdecimal.c contains all functions of the specification. I contend that for a C programmer mpdecimal.c is not significantly harder to read than decimal.py.

    The tricky algorithms (newtondiv, invroot, sqrt-via-invroot and ln) have mechanical proofs in ACL2.

    An initial audit could certainly disregard convolute.c, crt.c, difradix2.c, fnt.c, numbertheory.c, transpose.c and umodarith.h.

    These are only needed for the number theoretic transform that kicks in at around 22000 digits.

    Context type safety
    -------------------

    The patch adds __setattr__ to the Decimal class.

    Making the context more strictly typed has instantly found a bug in one of decimal.py's docstring tests:

    # This doctest has always passed:
    >>> c = Context(ExtendedContext)
    
    # But the operation is meaningless:
    >>> c
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/decimal.py", line 3708, in __repr__
        % vars(self))
    TypeError: %d format: a number is required, not Context
    >>>

    What is the concern about __setattr__? For *setting* contexts, speed is not so important (for reading contexts it is).
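For comparison, the `decimal` module in current Python releases rejects the meaningless call from the doctest above at construction time. A sketch of that check (the function name is hypothetical):

```python
import decimal


def context_rejects_non_integer_prec():
    """Return True if passing a Context as `prec` raises TypeError."""
    try:
        # The first positional argument of Context is `prec`.
        decimal.Context(decimal.ExtendedContext)
    except TypeError:
        return True
    return False
```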

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Mark Dickinson <report@bugs.python.org> wrote:

    The only problem from my perspective is getting someone to find time to review such a massive patch. I've been wondering whether we could get away with some kind of 'statistical' review: do a large-scale review, and then instead of having someone go through every line of C code, pick a few representative sections at random and review those. If those code portions make it through the review unscathed, declare the code good and merge it in.

    The regex module is in a somewhat similar situation. If I'm interpreting this

    http://mail.python.org/pipermail/python-dev/2011-August/113240.html

    dialogue correctly, a complete audit down to the last line isn't always necessary.

    pitrou commented 12 years ago

    If I'm interpreting this http://mail.python.org/pipermail/python-dev/2011-August/113240.html dialogue correctly, a complete audit down to the last line isn't always necessary.

    It is also helped by the fact you are a core developer and we trust you to be here to do maintenance :) I'd add that decimal is a fairly specialized module and the implications of a new engine are not as wide-reaching as a new regex engine. Especially if the pure Python version is still there.

    I think it's still probably a good idea to probe python-dev, if that hasn't already happened.

    rhettinger commented 12 years ago

    We've been wanting this for a long time.

    Strong +1 from me.

    amauryfa commented 12 years ago

    I can help with the review. Is http://bugs.python.org/review/7652/show a good starting point? I already have some comments.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Antoine Pitrou <report@bugs.python.org> wrote:

    It is also helped by the fact you are a core developer and we trust you to be here to do maintenance :)

    Sure. The specification doesn't really change, so the work will hopefully be limited. :)

    I think it's still probably a good idea to probe python-dev, if that hasn't already happened.

    Yes, I'm planning to do that.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Raymond Hettinger <report@bugs.python.org> wrote:

    We've been wanting this for a long time.

    Strong +1 from me.

    Thank you, Raymond!

    vstinner commented 12 years ago

    (Actually, that's a part of why decimal.py is slow---it's using Python's *binary* integers to store *decimal* coefficients, so that even simple addition is now a quadratic operation, thanks to the binary <-> decimal conversions involved.)

    Oh, I forgot this minor detail (base 2 vs base 10).

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Amaury Forgeot d'Arc <report@bugs.python.org> wrote:

    I can help with the review. Is http://bugs.python.org/review/7652/show a good starting point? I already have some comments.

    Yes, that would be great. Apart from two or three changes that I still need to push, patch set 4 is the latest version.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Stefan Krah <report@bugs.python.org> wrote:

    Yes, that would be great. Apart from two or three changes that I still need to push, patch set 4 is the latest version.

    Hmm, no. I'll create a slightly newer patch from Oct. 1st.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    [Amaury]

    Overall, I think that the "mpd" C library should be better separated from the _decimal module (a bit like _ctypes, with the libffi library): its own configure & makefile, its own test suite... which are not necessarily related to Python.

    Except for its own directory libmpdec has all that (See LIBTEST.txt). Library tests are in the tests/ directory, python tests in the python/ directory.

    Are you suggesting to build a static library and then use that to build the module? I remember this didn't work on Windows (not to mention AIX and such).

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Amaury has asked for more comments (and I agree). However, I'm not sure what level of detail would be appropriate. As an example, I've posted the full proof of the x87 modular multiplication in umodarith.h.

    Even with the Coq parts stripped, this would still be a massive comment.

    Would you prefer that level of detail or should I just post the core of the algorithm?

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Stefan Krah <report@bugs.python.org> wrote:

    Would you prefer that level of detail or should I just post the core of the algorithm?

    Argh. s/post/add to comments in umodarith.h/

    pitrou commented 12 years ago

    Amaury has asked for more comments (and I agree). However, I'm not sure what level of detail would be appropriate. As an example, I've posted the full proof of the x87 modular multiplication in umodarith.h.

    Even with the Coq parts stripped, this would still be a massive comment.

    You could ship it as a separate .txt file (like we have e.g. Objects/dict_notes.txt).

    9a91b5d9-3571-4515-baf6-e38227828e99 commented 12 years ago

    2011/12/15 Stefan Krah <report@bugs.python.org>

    Stefan Krah <stefan-usenet@bytereef.org> added the comment:

    Amaury has asked for more comments (and I agree). However, I'm not sure what level of detail would be appropriate. As an example, I've posted the full proof of the x87 modular multiplication in umodarith.h.

    Even with the Coq parts stripped, this would still be a massive comment.

    Would you prefer that level of detail or should I just post the core of the algorithm?

    For my part, a two-line description of the purpose of each file is enough. Something like "Routines for the reverse transmogrification of randomized digits, used in multiplication of numbers above 2**32 bits". Or something else that makes sense :-)

    At least something that makes it clear that I don't have to read further if I'm only interested in the definition of Python classes for example.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    For my part, a two-line description of the purpose of each file is enough.

    OK, I'll go for small comments in the files themselves and put big ones in separate files.

    vstinner commented 12 years ago

    I tried my timestamp patch with _decimal: it fails because the decimal and _decimal APIs are not exactly the same.

    decimal.Decimal.__truediv__() has an optional context argument, whereas _decimal defines PyNumberMethods.

    decimal.Decimal.quantize() second argument is optional and its default value is None, but if I pass None to _decimal.Decimal.quantize(), I get a TypeError because _decimal expects an integer.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    STINNER Victor <report@bugs.python.org> wrote:

    decimal.Decimal.__truediv__() has an optional context argument, whereas _decimal defines PyNumberMethods.

    Regarding the special methods: decimal.py uses the optional context arguments for convenience so that these methods can be re-used in other places.

    I wouldn't consider this context argument as part of the API.

    decimal.Decimal.quantize() second argument is optional and its default value is None, but if I pass None to _decimal.Decimal.quantize(), I get a TypeError because _decimal expects an integer.

    About this I'm not sure. I think type errors are a courtesy to the user. Look what is possible now in decimal.py:

    Decimal('9')

    But here the argument might well be made for accepting None (and only None apart from rounding modes). - I hope Mark and Raymond will give their opinions, too.
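For reference, the documented signature in current Python is `quantize(exp, rounding=None, context=None)`, so `None` means "use the context's rounding mode". A small sketch of that behaviour; `to_integer` is a hypothetical helper and the explicit context is only there to make the result independent of the caller's settings.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def to_integer(value, rounding=None):
    """Quantize to an integer; rounding=None defers to the context."""
    ctx = Context(rounding=ROUND_HALF_EVEN)
    return value.quantize(Decimal(1), rounding=rounding, context=ctx)
```

Passing `ROUND_HALF_UP` explicitly overrides the context for that one call.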

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    I walked into the Roundup trap again:

    >>> Decimal(9).quantize(1, "?!?!?")
    Decimal('9')

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Over the last two months I've done a full review of all files except _decimal.c and mpdecimal.c.

    I now have additional ACL2 proofs for a couple of functions in basearith.c (some are partial proofs), a full proof for the special case (d = 10**19) of Granlund/Montgomery's "Divide double word by constant" algorithm and a full proof for the Chinese Remainder Theorem in crt.c.

    I didn't find anything important. I found a couple of useless variable initializations, missing comments etc.

    There was one (deliberate) incompatibility in the format function: If rounding led to an Overflow, _decimal printed 'Infinity'.

    _decimal can't read back numbers that are out of bounds, so this was done to ensure that a roundtrip is always possible.

    But I changed the function so _decimal now uses exactly the same algorithm for formatting as decimal.py.

    vstinner commented 12 years ago

    How can I help to integrate this module into CPython?

    The test suite passes in debug and release mode without any failure on my Linux box (64 bits, running Ubuntu 11.10).

    rhettinger commented 12 years ago

    Victor, yes, the decimal module needs a C implementation. Without it, the pure Python code is abysmally slow. Other MP implementations don't fill the need or come close to implementing the decimal arithmetic spec.

    74c4563b-ab1c-43d8-9219-30c4eca796bc commented 12 years ago

    (1) I think this module would benefit greatly from a map explaining what each file does, and perhaps from some reorganization.

    As best I can yet tell, there are about ~130 files, over a dozen directories, but the only ones that directly affect the implementation are a subset (~33) of the *.c and *.h files in Modules/_decimal/ (and not subdirectories).

    Even files that do affect the implementation, such as mpdecimal.c, also seem to have functions thrown in just for testing small pieces of functionality, such as Newton Division.

    There may also be some code that really isn't needed, except possibly for backwards compatibility, and could be #ifdef'ed or at least commented. For example, the comments above io.c function _mpd_strneq(const char *s, const char *l, const char *u, size_t n) mention working around the Turkish un/dotted-i problem when lowercasing -- but why is a decimal library even worried about casing?

    (2) Is assembly allowed? If not, please make it clear that vcdiv64.asm is just an optional speedup, and that the code doesn't rely upon it.

    (3) Are there parts of this library that provide functionality NOT in the current decimal library? If so, this should be at least documented, and perhaps either removed or exposed.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Jim, thanks for taking a look at this.

    Jim Jewett <report@bugs.python.org> wrote:

    (1) I think this module would benefit greatly from a map explaining what each file does, and perhaps from some reorganization.

    Just MAP.txt in the top level directory? Amaury suggested moving the library into a subdirectory. I'm not sure about that. The library would be out of sight, but is that a good thing?

    As best I can yet tell, there are about ~130 files, over a dozen directories, but the only ones that directly affect the implementation are a subset (~33) of the *.c and *.h files in Modules/_decimal/ (and not subdirectories).

    Indeed the top level directory contains _decimal.c and all files from libmpdec. Almost all files are required.

    The three subdirectories contain:

    tests/ -> library tests
    python/ -> extended module tests
    literature/ -> pointers to papers and explanations for trickier algorithms.

    Even files that do affect the implementation, such as mpdecimal.c, also seem to have functions thrown in just for testing small pieces of functionality, such as Newton Division.

    That is correct. They were deliberately added in that place because they rely on a couple of inline functions and I have a policy of testing the exact code that the original function relies on.

    The alternative is to extract all functions needed, move them to the test directory and hope that the code doesn't get stale. But if you have a better idea, I'd be glad to hear it: I don't like the test functions in that place either.

    The reason that Newton Division is tested for ridiculously small precisions like prec=1 is that it should pass IBM's test suite just like the regular division function. (Also, small precisions are most likely to expose problems).

    There may also be some code that really isn't needed, except possibly for backwards compatibility, and could be #ifdef'ed or at least commented.

    I'm not aware of any except for whole files:

    mpsignal.c -> signaling wrappers for the mpdecimal.c functions, not needed for _decimal.c but part of libmpdec.

    mptest.h -> header for running the tests.

    bench.c -> library benchmark.

    Turkish un/dotted-i problem when lowercasing -- but why is a decimal library even worried about casing?

    "Infinity", "InFinItY", "iNF" are all allowed by the specification.

    (2) Is assembly allowed?

    I was under the assumption that it is allowed:

    Python/pymath.c:23: __asm __volatile ("fnstcw %0" : "=m" (cw));
    Python/pymath.c:28: __asm __volatile ("fldcw %0" : : "m" (cw));
    Python/ceval.c:43: asm volatile ("mftbu %0" : "=r" (tbu) );
    Python/ceval.c:44: asm volatile ("mftb %0" : "=r" (tb) );
    Python/ceval.c:45: asm volatile ("mftbu %0" : "=r" (tbu2));
    Python/ceval.c:59: __asm __volatile("rdtsc" : "=A" (val))
    Python/ceval.c:69: __asm __volatile("rdtsc" : \

    If not, please make it clear that vcdiv64.asm is just an optional speedup, and that the code doesn't rely upon it.

    No code relies on asm. Assembly is only used for the double word mul/divmod primitives in typearith.h and the Pentium PRO modular multiplication in umodarith.h, and there are ANSI versions for everything.
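To show what a portable double-word multiplication can look like, here is a sketch of the schoolbook method on 32-bit halves, written in Python for readability. It is not taken from typearith.h; the name `mul_64x64` and the exact structure are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def mul_64x64(a, b):
    """Return (hi, lo), the two 64-bit words of the 128-bit product a*b.

    Every intermediate value fits in 64 bits, which is what a C version
    without a 128-bit type or inline assembly has to guarantee.
    """
    a_hi, a_lo = a >> 32, a & MASK32
    b_hi, b_lo = b >> 32, b & MASK32

    lo_lo = a_lo * b_lo
    hi_lo = a_hi * b_lo
    lo_hi = a_lo * b_hi
    hi_hi = a_hi * b_hi

    # Sum the middle column; the maximum is 2**64 - 1, so no overflow.
    cross = (lo_lo >> 32) + (hi_lo & MASK32) + lo_hi

    hi = hi_hi + (hi_lo >> 32) + (cross >> 32)
    lo = ((cross & MASK32) << 32) | (lo_lo & MASK32)
    return hi, lo
```

An assembly version replaces all of this with a single widening multiply instruction, which is the kind of speedup the optional asm paths provide.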

    The library really compiles with any compiler I have tested, including compilers without uint64_t like CompCert (CompCert does not compile Python for example, but for other reasons).

    (3) Are there parts of this library that provide functionality NOT in the current decimal library? If so, this should be at least documented, and perhaps either removed or exposed.

    Apart from mpsignal.c (see above), there are probably a couple of things in the header files like mpd_invroot(). _mpd_qinvroot() from mpdecimal.c *is* needed because the square root is calculated in terms of the inverse square root.
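The relationship between the square root and the inverse square root can be sketched with a Newton iteration on Decimal values. This is an illustration of the general technique only, not the algorithm in mpdecimal.c; the function name, the guard digits and the fixed iteration count are all assumptions.

```python
import math
from decimal import Decimal, localcontext


def sqrt_via_invroot(x, prec=30):
    """Approximate sqrt(x) as x * (1/sqrt(x)) using Newton's method."""
    x = Decimal(x)
    if x <= 0:
        raise ValueError("this sketch only handles positive x")
    with localcontext() as ctx:
        ctx.prec = prec + 10  # guard digits for the intermediate steps
        # Float estimate of 1/sqrt(x): roughly 16 correct digits.
        y = Decimal(1 / math.sqrt(float(x)))
        for _ in range(8):
            # Newton step for f(y) = 1/y**2 - x; uses no division by y.
            y = y * (3 - x * y * y) / 2
        root = x * y
        ctx.prec = prec
        return +root  # unary plus rounds to the requested precision
```

Each step roughly doubles the number of correct digits, so the fixed count of eight iterations is far more than this precision needs.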

    Are these (probably) minor instances of additional functionality a big problem for you? Because for me it would be a hassle to maintain diverging versions of libmpdec and the Python version of libmpdec.

    This is also related to testing: The complete test suite (all tests against decNumber and decimal.py) under Valgrind takes 8 months to run.

    My question is: Where should I document these things and in what detail?

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    STINNER Victor <report@bugs.python.org> wrote:

    How can I help to integrate this module into CPython?

    It would be fantastic if you could take a look at _decimal.c, for example to find some incompatibilities between _decimal.c and decimal.py.

    mpdecimal.c could also always profit from another audit.

    That's the only file that still needs to go through my second self-audit round.

    benjaminp commented 12 years ago

    The scripts for generating code would preferably go in a Tools/decimal directory.

    74c4563b-ab1c-43d8-9219-30c4eca796bc commented 12 years ago

    On Tue, Mar 6, 2012 at 3:07 PM, Stefan Krah

    Jim Jewett <report@bugs.python.org> wrote:

    > (1) I think this module would benefit greatly from a map explaining what each file does, and perhaps from some reorganization.

    Just MAP.txt in the top level directory?

    That should work. There may be better names, but I can't think of any just now.

    Amaury suggested moving the library into a subdirectory. I'm not sure about that. The library would be out of sight, but is that a good thing?

    cdecimal certainly needs a subdirectory similar to those for _io, _ctypes, _multiprocessing, and _sqlite.

    It *may* make sense to move some of the subdirectories around. (On the other hand, it may not; if the tests in Lib/test/ end up delegating back, that is probably OK.)

    I believe it would be helpful to move non-code (project files, etc) to separate directories.

    Whether you need *additional* subdirectories within _cdecimal to subcategorize the .c and .h files, I'm not sure -- because I didn't get in deep enough to know what they should be. If the categorization let people focus on the core, that would be helpful, but it wasn't clear to me which files were part of the exported API and which were implementation details. Are there clear distinctions (type info/python bindings/basic arithmetic/advanced algorithms/internal-use-only/???)?

    > As best I can yet tell, there are about 130 files, over a dozen directories, but the only ones that directly affect the implementation are a subset (~33) of the *.c and *.h files in Modules/_decimal/ (and not subdirectories).

    Indeed the top level directory contains _decimal.c and all files from libmpdec. Almost all files are required.

    Would it make sense to integrate only cdecimal, and to treat libmpdec as an external dependency that (usually) gets updated with each Python feature release, the way that sqlite is?

    The three subdirectories contain:

     tests/       ->  library tests
     python/      ->  extended module tests

    I would really expect that to still be under tests, and I would expect a directory called python to contain code written in python, or at least python bindings.

     literature/  ->  pointers to papers and explanations for trickier algorithms.

    > Even files that do affect the implementation, such as mpdecimal.c, also seem to have functions thrown in just for testing small pieces of functionality, such as Newton Division.

    That is correct. They were deliberately added in that place because they rely on a couple of inline functions and I have a policy of testing the exact code that the original function relies on.

    How important is it that these functions be inline?

    Would it at least be OK to wrap them in stubs for exporting, so that the test logic could be placed with the other tests? (I worry that some tests may stop getting run if someone else modifies the build process and doesn't notice the unusual location.)

    The alternative is to extract all functions needed, move them to the test directory and hope that the code doesn't get stale.

    I agree that copying is bad.

    I'll trust your judgement on the need for inline. But given:

    ALWAYS_INLINE int
    mpd_word_digits(mpd_uint_t word)

    I don't see anything wrong with exporting:

        int
        _testhelp_mpd_word_digits(mpd_uint_t word) {
            return mpd_word_digits(word);
        }

    > Turkish un/dotted-i problem when lowercasing -- but why is a decimal library even worried about casing?

    "Infinity", "InFinItY", "iNF" are all allowed by the specification.

    OK; so is io.c part of the library, or part of the python binding?

    Given that this is targeted at 3.3 or later, would it make sense to either use casefolding, or check the kind? (If it isn't ASCII, it can't be the word "INF".)

    Are there only a certain number of strings that will ever matter, such as INF, NAN, and INFINITY, so that a case statement would work? Or tolower() with an extra check for the Turkish undotted lowercase i? What you have may well be the best compromise, but it bothers me to see text processing tools redone in a numeric type -- particularly without knowing why they are needed.

    > (2)  Is assembly allowed?

    I was under the assumption that it is allowed:

    I'm honestly not sure, but I think that was one of the reasons stackless was never integrated.

    > If not, please make it clear that vcdiv64.asm is just an optional speedup, and that the code doesn't rely upon it.

    No code relies on asm. Assembly is only used for the double word mul/divmod primitives in typearith.h and the Pentium PRO modular multiplication in umodarith.h, and there are ANSI versions for everything.

    Good enough, though I would rather see that as a comment near the assembly.

    > (3) Are there parts of this library that provide functionality NOT in the current decimal library? If so, this should be at least documented, and perhaps either removed or exposed.

    Apart from mpsignal.c (see above), there are probably a couple of things in the header files like mpd_invroot(). _mpd_qinvroot() from mpdecimal.c *is* needed because the square root is calculated in terms of the inverse square root.

    Are these (probably) minor instances of additional functionality a big problem for you? Because for me it would be a hassle to maintain diverging versions of libmpdec and the Python version of libmpdec.

    I'm not worried about the header files. I am worried about what is exposed to python, but just documenting it (docstrings and the module .rst) may be OK.

    But I'm also worried that there may be fair amounts of code that are effectively dead after the "remove any names not in decimal.py" importing trick. If so, I would at least like that in some sort of #ifdef, so that people don't spend too much time trying to make sense of it.

    That said, if you plan to maintain an external libmpdec regardless of what happens, then it makes even more sense to integrate (at most) the bindings, and to treat libmpdec as an external dependency.

    benjaminp commented 12 years ago

    Speaking of inline, the "inline" keyword will have to go because it's not C89.

    918f67d7-4fec-4a8d-93e3-6530aeb1e57e commented 12 years ago

    But we could check if the compiler supports the inline keyword and use it if available.

    3807ddda-7a46-46c2-9a7e-24172362f2cc commented 12 years ago

    I've found some differences between decimal and cdecimal.

    cdecimal 2.3 does not support the __ceil__ and __floor__ methods that exist in decimal. math.ceil converts a cdecimal.Decimal instance into a float before finding the ceiling. This can generate incorrect results.

    >>> import decimal
    >>> import math
    >>> math.ceil(decimal.Decimal("12345678901234567890.1"))
    12345678901234567168

    The decimal module in previous versions returns the correct answer 12345678901234567891
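
    The loss of precision can be reproduced without cdecimal. The following is a minimal sketch, assuming only the standard decimal and math modules, that forces the float conversion explicitly and compares it with a ceiling computed in decimal arithmetic. The variable names are illustrative.

```python
import math
from decimal import Decimal, ROUND_CEILING

d = Decimal("12345678901234567890.1")

# Going through a C double rounds to the nearest representable value first
# (doubles of this magnitude are spaced 2048 apart), so the fractional part
# and the low-order digits are already gone before ceil() runs.
via_float = math.ceil(float(d))

# Rounding in decimal arithmetic keeps every digit.
exact = int(d.to_integral_value(rounding=ROUND_CEILING))

print(via_float)  # 12345678901234567168
print(exact)      # 12345678901234567891
```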

    cdecimal.Decimal instances do not emulate the various single-underscore methods of a decimal.Decimal instance. In gmpy2, I use _int, _exp, _sign, and _is_special to convert a decimal.Decimal into an exact fraction. I realize the issue is with gmpy2 and I will fix gmpy2, but there may be other code that uses those methods.
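
    One way to avoid the single-underscore attributes is the public as_tuple() method. The following is a minimal sketch, not the gmpy2 code; the function name decimal_to_fraction is an assumption for illustration.

```python
from decimal import Decimal
from fractions import Fraction


def decimal_to_fraction(d):
    """Convert a finite Decimal to an exact Fraction using only public API.

    Hypothetical helper: as_tuple() yields (sign, digits, exponent), which
    carries the same information as the private _sign, _int and _exp.
    """
    if not d.is_finite():
        raise ValueError("cannot convert NaN or Infinity to a fraction")
    sign, digits, exponent = d.as_tuple()
    coefficient = int("".join(map(str, digits)))
    if sign:
        coefficient = -coefficient
    # Fraction(10) ** exponent stays exact for negative exponents too.
    return coefficient * Fraction(10) ** exponent


print(decimal_to_fraction(Decimal("-1.25")))  # -5/4
```

    Fraction.from_decimal() in the standard library covers the same conversion; the explicit version only shows which public pieces replace the private attributes.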

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

    Jim Jewett <report@bugs.python.org> wrote:

    Whether you need *additional* subdirectories within _cdecimal to subcategorize the .c and .h files, I'm not sure -- because I didn't get in deep enough to know what they should be. If the categorization let people focus on the core, that would be helpful, but it wasn't clear to me which files were part of the exported API and which were implementation details. Are there clear distinctions (type info/python bindings/basic arithmetic/advanced algorithms/internal-use-only/???)?

    OK, as a basis for discussion I've added:

    http://hg.python.org/features/cdecimal/file/8b75c2825508/Modules/_decimal/FILEMAP.txt

    I didn't mention the main reason why _decimal.c and libmpdec are in a flat directory: Building the library first and then the module from the library led to problems on at least Windows and AIX. That's why I started to treat all libmpdec files as part of the module, list them as dependencies in setup.py and let distutils figure everything out. Distutils also can figure out automatically if a Mac OS build happens to be a "universal" build and things like that.

    The build process is very well tested by now and it took quite a while to figure everything out, so I'd be reluctant to change the flat hierarchy.

    > python/      ->  extended module tests

    I would really expect that to still be under tests, and I would expect a directory called python to contain code written in python, or at least python bindings.

    Could you explain? The python/ directory contains deccheck.py, formathelper.py etc.

    Would it at least be OK to wrap them in stubs for exporting, so that the test logic could be placed with the other tests? (I worry that some tests may stop getting run if someone else modifies the build process and doesn't notice the unusual location.)

    tests/runtest.c won't compile then. I'll look into the stub and also the _testhelp suggestions.

    > "Infinity", "InFinItY", "iNF" are all allowed by the specification.

    OK; so is io.c part of the library, or part of the python binding?

    I see a potential source of confusion: io.c is firmly part of the library. All PEP-3101 formatting is part of libmpdec, because I like the mini language. io.c only understands ASCII and UTF-8 fill characters.

    It is the *library* tests that would fail under the Turkish locale (if not for _mpd_strneq).
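
    The idea behind a locale-independent comparison can be sketched as follows, in Python for brevity. This is not the libmpdec source: the names ascii_strneq and is_infinity_literal and the argument order are assumptions for illustration. Each character is compared against fixed ASCII lower-case and upper-case spellings, so no case-conversion routine is ever called.

```python
def ascii_strneq(s, lower, upper, n):
    """True if each of the first n characters of s equals the character at
    the same position in either `lower` or `upper` (hypothetical helper)."""
    if len(s) < n:
        return False
    return all(s[i] == lower[i] or s[i] == upper[i] for i in range(n))


def is_infinity_literal(s):
    """Accept any ASCII case mix of 'inf' or 'infinity' and nothing else."""
    for lower, upper in (("inf", "INF"), ("infinity", "INFINITY")):
        if len(s) == len(lower) and ascii_strneq(s, lower, upper, len(lower)):
            return True
    return False


print(is_infinity_literal("InFinItY"))   # True
print(is_infinity_literal("iNF"))        # True

# U+0131 (dotless i) upper-cases to ASCII 'I', so a comparison based on
# case conversion would accept a string the specification does not allow.
print("\u0131nf".upper() == "INF")       # True
print(is_infinity_literal("\u0131nf"))   # False
```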

    Good enough, though I would rather see that as a comment near the assembly.

    Comments on how to enforce an ANSI build (much slower!) are in LIBTEST.txt and now also in FILEMAP.txt.

    I'm not worried about the header files. I am worried about what is exposed to python, but just documenting it (docstrings and the module .rst) may be OK.

    But I'm also worried that there may be fair amounts of code that are effectively dead after the "remove any names not in decimal.py" importing trick. If so, I would at least like that in some sort of #ifdef, so that people don't spend too much time trying to make sense of it.

    It's the opposite: names from decimal.py starting with an underscore that are not in _decimal are removed. If I don't use that trick, I end up with about 50 additional symbols from decimal.py:

    import decimal # the C version
    dir(decimal)

    ... '_ContextManager', '_Infinity', '_Log10Memoize', ...
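
    The effect of the trick as described above can be sketched with stand-in objects. This is an illustration under stated assumptions, not the code in decimal.py: the helper name drop_private_names_missing_from, the fake module and the sample namespace are all hypothetical.

```python
import types


def drop_private_names_missing_from(c_module, namespace):
    """Delete single-underscore names from `namespace` that the C module
    does not provide, and return the removed names (hypothetical helper)."""
    removed = []
    for name in list(namespace):
        is_private = name.startswith("_") and not name.startswith("__")
        if is_private and not hasattr(c_module, name):
            del namespace[name]
            removed.append(name)
    return sorted(removed)


# Stand-in for the C extension: it only provides the public name.
fake_c = types.ModuleType("fake_c_decimal")
fake_c.Decimal = object

# Stand-in for the globals of the pure-Python module.
py_namespace = {
    "__name__": "decimal",
    "Decimal": object,
    "_Infinity": object(),
    "_Log10Memoize": object(),
}

removed = drop_private_names_missing_from(fake_c, py_namespace)
print(removed)               # ['_Infinity', '_Log10Memoize']
print(sorted(py_namespace))  # ['Decimal', '__name__']
```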

    malemburg commented 12 years ago

    Does the C version have a C API importable as capsule ? If not, could you add one and a decimal.h to go with it ?

    This makes integration in 3rd party modules a lot easier.

    Thanks, -- Marc-Andre Lemburg eGenix.com

