Open c5d4bb9b-17bb-47bb-a3cc-ef774deba4a9 opened 8 years ago
This code:
import textwrap
textwrap.wrap("123 123 1234567", width=5)
currently* produces this output:
['123', '123 1', '23456', '7']
I would expect the textwrap module to only break words when absolutely necessary. That is, I would have expected it to produce one break less:
['123', '123', '12345', '67']
This is of course a matter of taste - the current implementation produces more efficiently filled lines.
(* I only have access to Python 2.7 and 3.4)
You can do this already with the break_long_words arg of textwrap.wrap:
>>> import itertools, textwrap
>>> wr = textwrap.wrap
>>> list(itertools.chain(*(wr(x, 5) for x in wr("123 123 1234567", width=5, break_long_words=False))))
['123', '123', '12345', '67']
The code with nested wraps is awesome, but it does not work well when more text follows the long word:
>>> list(itertools.chain(*(wr(x, 5) for x in wr("123 123 1234567 12", width=5, break_long_words=False))))
['123', '123', '12345', '67', '12']
I would expect '67' and '12' to end up on the same line: '67 12'.
One more wrap:
>>> wr(' '.join(itertools.chain(*(wr(x, 5) for x in wr("123 123 1234567 12", width=5, break_long_words=False)))), 5)
['123', '123', '12345', '67 12']
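To clarify, this solution is a linear-time greedy one, with three passes: wrap without breaking long words, wrap each resulting line again to split the long words, then join the pieces and wrap once more.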
This minimizes the number of breaks within words. It doesn't minimize the number of output lines (you'd need a dynamic programming algo for that - O(n^2)). So for this input:
wr("123 12 123456 1234", 5)
you will get ['123', '12', '12345', '6', '1234']
where you may (or may not) have preferred:
['123', '12 1', '23456', '1234']
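For reference, the three nested wr() calls above can be written out as one small helper. This is only a sketch: wrap_min_breaks is a hypothetical name, not part of textwrap, and it assumes the default whitespace handling of textwrap.wrap (runs of whitespace collapse to single spaces).

```python
import textwrap


def wrap_min_breaks(text, width):
    """Hypothetical helper: greedy wrap that breaks words only when they
    cannot fit on a line by themselves."""
    # Pass 1: wrap normally, but leave over-long words intact on their own line.
    lines = textwrap.wrap(text, width=width, break_long_words=False)
    # Pass 2: re-wrap each line; only the over-long words get split here.
    pieces = [piece for line in lines
              for piece in textwrap.wrap(line, width=width)]
    # Pass 3: join and wrap again so the tail of a split word can share
    # its line with the words that follow it.
    return textwrap.wrap(" ".join(pieces), width=width)


print(wrap_min_breaks("123 123 1234567 12", 5))
# ['123', '123', '12345', '67 12']
print(wrap_min_breaks("123 12 123456 1234", 5))
# ['123', '12', '12345', '6', '1234']
```

As noted above, this reduces breaks inside words, not the number of output lines.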
It may be worth fixing wrap() to do the nicer style of wrapping for long words. If we decide to do that, it should be done via a new parameter, because the same logic (the TextWrapper class) is used for shorten(), and in that case it may be preferable to keep a chunk of the long word rather than cut it out entirely.
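Until such a parameter exists, the behaviour can be approximated in a subclass. The sketch below is an assumption-heavy illustration, not a proposed patch: MinBreakWrapper is a hypothetical name, and it overrides _handle_long_word, a private method of CPython's TextWrapper that may change between versions.

```python
import textwrap


class MinBreakWrapper(textwrap.TextWrapper):
    """Hypothetical subclass: an over-long word always starts on a fresh
    line, so it is never split just to fill up the current line."""

    def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
        # Assumption: this private hook is called when the next chunk is
        # longer than a whole line (true for current CPython versions).
        if any(chunk.strip() for chunk in cur_line):
            # The current line already holds text. Leave the long word
            # untouched; the line is flushed and the word is handled again
            # at the start of the next line.
            return
        # Empty (or whitespace-only) line: fall back to the stock
        # behaviour and break the word at the available width.
        super()._handle_long_word(reversed_chunks, cur_line, cur_len, width)


wrapper = MinBreakWrapper(width=5)
print(wrapper.wrap("123 123 1234567 12"))
# ['123', '123', '12345', '67 12']
```

Because only the subclass changes behaviour, textwrap.wrap() and textwrap.shorten() keep working as they do today.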
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', 'library', '3.9', '3.10', '3.11']
title = 'textwrap should minimize number of breaks in extra long words'
updated_at =
user = 'https://bugs.python.org/TuomasSalo'
```
bugs.python.org fields:
```python
activity =
actor = 'andrei.avk'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Tuomas Salo'
dependencies = []
files = []
hgrepos = []
issue_num = 26214
keywords = []
message_count = 6.0
messages = ['258999', '376368', '376387', '376388', '376395', '409241']
nosy_count = 6.0
nosy_names = ['steven.daprano', 'serhiy.storchaka', 'Tuomas Salo', 'xtreak', 'iritkatriel', 'andrei.avk']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue26214'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']
```