python / cpython

The Python programming language
https://www.python.org/
Other
60.02k stars 29.05k forks source link

Fix for 110609: Allow large long integers in formatting #33135

Closed 61337411-43fc-4a9c-b8d5-4060aede66d0 closed 23 years ago

61337411-43fc-4a9c-b8d5-4060aede66d0 commented 23 years ago
BPO 401544
Nosy @gvanrossum, @tim-one, @loewis
Files
  • None: None
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core'] title = 'Fix for 110609: Allow large long integers in formatting' updated_at = user = 'https://github.com/loewis' ``` bugs.python.org fields: ```python activity = actor = 'tim.peters' assignee = 'none' closed = True closed_date = None closer = None components = ['Interpreter Core'] creation = creator = 'loewis' dependencies = [] files = ['2815'] hgrepos = [] issue_num = 401544 keywords = ['patch'] message_count = 7.0 messages = ['34372', '34373', '34374', '34375', '34376', '34377', '34378'] nosy_count = 3.0 nosy_names = ['gvanrossum', 'tim.peters', 'loewis'] pr_nums = [] priority = 'normal' resolution = 'rejected' stage = None status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue401544' versions = [] ```

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 23 years ago
    gvanrossum commented 23 years ago

    For Tim, because he likes numerical tweakage. (Tim, you may also give this to Marc-Andre, since it affects Unicode.)

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 23 years ago

    It appears that I forgot the snippet

    diff -u -r2.23 stringobject.h
    --- stringobject.h      2000/09/19 20:58:38     2.23
    +++ stringobject.h      2000/09/20 07:32:28
    @@ -68,6 +68,10 @@
     #define PyString_InternFromString(cp) PyString_FromString(cp)
     #endif
    
    +/* For use in unicodeobject only. */
    +PyObject* Py_formatlong(PyObject *val, int flags, int prec, int type, 
    +                       char**pbuf, int *plen);
    +
     /* Macro, trading safety for speed */
     #define PyString_AS_STRING(op) (((PyStringObject *)(op))->ob_sval)
     #define PyString_GET_SIZE(op)  (((PyStringObject *)(op))->ob_size)

    IOW, Py_formatlong is not intended as an external API. It must have a prefixed name, though, as it is intended to be used across translation units.

    As for 0 vs NULL, ISO C99 says "An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant." It says that NULL is an implementation-defined null pointer constant, it does not say that either is better than the other. So I feel usage of 0 is perfectly fine in C, with the same rationale that encourages it in C++ over usage of NULL.

    I'm sorry for the other problems that patch caused.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 23 years ago

    It also appears that the last chunk of test_format is incorrect, 10 octal is 12, and 10 decimal is 10...

    tim-one commented 23 years ago

    I'm afraid there are a lot of low-level problems with this patch. Have spent about 90 minutes fixing problems so far, and think it will take another 90 before it's done. Since I'm in the middle of it now, I'd rather finish it than start over from scratch with another stab (and I *do want to see this get in -- it's a frequent complaint on c.l.py, and is sure *perceived as a bug today).

    Examples include:

    + There was no prototype in scope for stringobject.c's new Py_formatlong when it was referenced in unicodeobject.c. This caused the compiler to assume it returned int instead of PyObject*, and that caused bad code generation.

    + "Py_formatlong" isn't a proper name for an internal function; if it was meant to be part of the public API, it needed docs.

    + The new formatlong in unicodeobject.c should have been declared static.

    + Ubiquitous use of 0 instead of NULL for null pointers; but this is C, not C++.

    + Non-Guido-like code formatting in small ways.

    + The logic for "skip" and "sign" is hosed in some cases. For example,

    >>> "%20x" % 2L**31
    '            80000000'
    >>> "%.20x" % 2L**31
    '8000000000000000000'
    >>>

    This because the first "prec > len" loop copies skip+sign characters from the start of buf regardless of whether F_ALT is set (in the 2nd case above, F_ALT is not set, so it was incorrect to copy 2 characters before zero-filling).

    + The changed test_format failed for me; haven't tracked down why yet.

    + Not your doing: stringobject.c's existing formatint uses a too-small buffer, under the bad assumption that a C long is at most 2**31. (Speaking of which, why do people constantly try to get a temp buffer down to as few bytes as possible?! The difference between 20 and (say) 256 in a leaf routine is meaningless.)

    That last one was just to reassure you that I didn't spend the *whole* 90 minutes so far just picking on this patch \<wink>.

    tim-one commented 23 years ago

    Martin, these aren't deep issues, This is simply a matter of playing along with the code all around you. The convention for private interface names is to begin them with an underscore, as is plain just from looking at stringobject.c alone. So I renamed the function to _PyString_FormatLong, so it looks like everything else. Same bit with NULL vs 0: when the code you're changing uses NULL *everywhere*, using 0 instead for your own little pieces is a bad idea, and for exactly the same reasons a mixture of indentation or brace styles is a bad idea. There's nothing that can be argued about here.

    As for the rest, I think you underestimated the intricacy of C's format codes. As other examples here, the blank, plus and zero flags are getting ignored now when formatting a long. This isn't trivial to fix, as the code you built on believed to the depths of its soul that x, X and o conversions can't yield a string with a sign character.

    So I'm fixing that stuff too. It's all doable, just tedious (I admire your courage in daring to touch any of this code at all \<0.9 wink>).

    I agree with your analysis of the test_format failures, and have fixed them. No offense intended, but it's more evidence that you were in an un-Martin-like careless mood when you created this patch (test_format could not have worked for you either).

    tim-one commented 23 years ago

    Rejected this specific patch for the reasons given, but am checking in a derivative that I hope gets all the endcases right; also added dozens of non-trivial new test cases to test_format.py.