python / cpython

The Python programming language
https://www.python.org
Other
62.42k stars 29.97k forks source link

str.center inconsistent with format "^" #67812

Closed fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 closed 9 years ago

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 9 years ago
BPO 23624
Nosy @rhettinger, @ericvsmith, @bitdancer, @serhiy-storchaka, @vedgar

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at = created_at = labels = ['type-bug'] title = 'str.center inconsistent with format "^"' updated_at = user = 'https://github.com/vedgar' ``` bugs.python.org fields: ```python activity = actor = 'eric.smith' assignee = 'rhettinger' closed = True closed_date = closer = 'rhettinger' components = [] creation = creator = 'veky' dependencies = [] files = [] hgrepos = [] issue_num = 23624 keywords = [] message_count = 14.0 messages = ['237729', '237732', '237734', '237736', '237738', '237739', '237748', '237752', '238160', '238204', '238213', '238263', '238431', '238444'] nosy_count = 5.0 nosy_names = ['rhettinger', 'eric.smith', 'r.david.murray', 'serhiy.storchaka', 'veky'] pr_nums = [] priority = 'normal' resolution = 'wont fix' stage = None status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue23624' versions = ['Python 3.5'] ```

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 9 years ago

Look at this:

    >>> format("ab", "^3"), "ab".center(3)
    ('ab ', ' ab')
    >>> format("abc", "^4"), "abc".center(4)
    ('abc ', 'abc ')

I'm sure you agree this is inconsistent. As far as I can see, format relies on // (flooring) behaviour when dividing remaining space by 2, while .center relies on round (ROUND_HALF_EVEN) behaviour. One of them should be changed to match the other.

As far as I can see, documentation (in neither case) says nothing about how _exactly_ strings are centered when there is an odd number of fill characters, so I would interpret this as: we can change either. (My preference is for .center to be changed, but this is not so important. Consistency is important.)

rhettinger commented 9 years ago

FWIW, I prefer the approach which always puts the odd pad character on the right. I believe this is what most typists would do. Ideally, the behavior should be the same as in Python 2 (before ROOUND-HALF-EVEN) so that templates don't change their appearance when moving to Python 3.

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 9 years ago

If I see right, it seems that that (putting odd pad character on the right) is exactly what format does (I don't have Py2 here to check how exactly it behaves). So, the current proposition to change the .center to match format stays.

(I think I can see how the implementor of .center reasoned: same as people who invented banker's rounding, they wanted to "balance out" odd padding chars among the lines, so that the ammortized number of them on the left and right is the same. But I think consistency matters more here. Noone is summing the spaces on both sides and balancing these numbers, center is primarily for visual purposes, not arithmetical.)

ericvsmith commented 9 years ago

I agree it would be nice to be consistent, and that str.__format__'s approach is likely the desirable one.

But is it worth the breakage? Surely someone cares about the current behavior. Does this cause any practical, real-world problem?

serhiy-storchaka commented 9 years ago

The behavior of Python 2.7 is the same as Python 3.x.

str.center() is very old method and changing it will change formatted reports generated by third-party software in all world. At least this will break many tests. And this will add yet one incompatibility between Python 2 and Python 3.

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 9 years ago

By that line of reasoning, we would probably never change anything. :-( I searched the documentation, and the exact behaviour isn't documented anywhere. In fact all examples are of the kind where there is an even number of fill chars. I thought it was ok to "break" undocumented things.

As for a practical problem, here is how I encountered it. I had a format that centered something inside a constant-width column. format("^79", title) it was, I think. Then, as it usually happens, things were generalized and the width was customizable. Of course, I could have used "{:^{}}".format(title, width), but title.center(width) really seemed like a better option.

Imagine my horror when a test failed. At first I thought, ok, format puts extra space on one side, .center at the other, no big deal. But then I changed it, and it failed again. Many time and lost nerves later (not something I usually associate with Python), I realized that those two use different _methods_ of calculating the distribution of spaces.

At that moment, I really thought it must be a bug. But I went to the docs, and they never said anything about where an extra fill char is put. So yes, it is _theoretically_ possible that this is exactly the intended behaviour. But I really think it's not the case.

serhiy-storchaka commented 9 years ago

The problem is that it is not *obvious* what method is more correct.

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 9 years ago

We can make a poll, I think it is quite obvious. But that's not really the point.

I would accept the solution enabling _both methods of rounding with _both methods of formatting (it isn't even very hard to do: .center can take additional argument, mandatory keyword if necessary, and format can use "=" for .center centering, or if you're worried about confusion with numeric types, invent another symbol - or even use "^" with alternative format "#").

But I can't accept that "there is no obviously correct method" be a reason for .center and format doing it in two different ways. I'm convinced nobody switches from format to .center _because_ they want another method of distributing fills. They switch, like I did, because a constand-width field has become variable-width, and nested formats are ugly - and then have to deal with things breaking, writing 4 lines where previously there was only 1, checking remainders by 2 here and there, and having a hard time generally. And never being sure if it's correct, since the real behaviour is undocumented. :-(

rhettinger commented 9 years ago

[Serhiy]

The behavior of Python 2.7 is the same as Python 3.x.

str.center() is very old method and changing it will change formatted reports generated by third-party software in all world. At least this will break many tests. And this will add yet one incompatibility between Python 2 and Python 3.

I concur with Serhiy. Fixing this nuisance inconsistency would likely cause greater harm than it would relieve. This ship has sailed.

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 9 years ago

Ok, let's say I see your point about breaking existing code (though, I reiterate, it's _undocumented_ and even examples in the docs all cover only the obvious case).

Is it _at least_ possible to give .center an (keyword-only if needed) argument that specifies the rounding method? Default will of course be "ROUND_HALF_EVEN", but it should be possible at least to specify "ROUND_FLOOR" to be consistent with format (maybe other rounding method should be supported, but it's not important).

This will not break anything, and it will be a tremendous help to those who want to switch from one method to the other (for example, once the field width becomes variable)? I suggest changing .center instead of format because a) it's easier and more Pythonic (keyword-only arguments are accepted method of dealing exactly with that kind of problem), b) it's more likely wanted to be changed, since format method is more "obvious" to many people, c) I guess more people switch in that direction, and d) format doesn't really have a nice hook in the format string to customize the rounding.

I sincerely hope you'll agree at least to this. Thank you.

bitdancer commented 9 years ago

Backward compatibility does not mean "matches the docs", it means "if we change this, someone's existing code is likely to break". That does not prevent us from making at-the-margin changes in a feature release, but it is still a bar to be overcome in making such a change, and the consensus here is that it doesn't meet that bar.

Your primary argument seems to be the desire to make it easy to switch to center if the width of the field becomes a variable. But you note yourself that you do not need to do this; format is quite capable of handing a variable width field.

So, there also does not seem to be sufficient motivation to add this feature. If you wish to pursue this further, posting your proposal for a keyword for center and the motivation to python-ideas would be the way to proceed. If you gather a consensus of support there, you can propose a new enhancement request issue with a link to the python-ideas discussion. (Or, in the unlikely case that the consensus is that center's algorithm should change, we can reopen this issue.)

fe5a23f9-4d47-49f8-9fb5-d6fbad5d9e38 commented 9 years ago

Only one detail to resolve: you say "format is quite capable of handing a variable width field".

It does, but not in a really nice way. Honestly, would you really rather see

"{:^{}}".format(title, width)

than

    title.center(width)

? Nested formats are ugly, I think that's a consensus. :-) Not to mention that .center is more explicit about what's going on - that "^" is almost lost in that brace-forest.

bitdancer commented 9 years ago

I would write it:

"{:^{width}}".format(title, width=width)

By the way, in case it isn't clear, we acknowledge that in a perfect world center would work "right", but our judgment is that the pain of fixing it outweighs the benefit. Our collective consensus on that could be shifted, but someone would have to be willing to argue it convincingly to the community (ie: not on the bug tracker).

ericvsmith commented 9 years ago

I agree with David: It's unfortunate, but too late to change.

FWIW, since the goal is to replace just str.center, I'd write this as:

def my_center(s, width):
    return format(s, '^' + str(width))

As soon as you're mixing in other formatting, then str.center couldn't have been used, anyway, so no sense bringing all of str.format to bear on the problem.