Fixes of ReadStream.readline() in UTF-16 and -LE codecs

50eff062-408a-4098-b1b2-8222303b9d0c commented 23 years ago

BPO	401477
Nosy	@malemburg, @gvanrossum, @freddrake
Files	None: None

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = 'https://github.com/malemburg'
closed_at =
created_at =
labels = ['library']
title = 'Fixes of ReadStream.readline() in UTF-16 and -LE codecs'
updated_at =
user = 'https://bugs.python.org/anonymous'
```

bugs.python.org fields:

```python
activity =
actor = 'gvanrossum'
assignee = 'lemburg'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'anonymous'
dependencies = []
files = ['2792']
hgrepos = []
issue_num = 401477
keywords = ['patch']
message_count = 5.0
messages = ['34282', '34283', '34284', '34285', '34286']
nosy_count = 3.0
nosy_names = ['lemburg', 'gvanrossum', 'fdrake']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue401477'
versions = []
```

3772858d-27d8-44b0-a664-d68674859f36 commented 23 years ago

freddrake commented 23 years ago

Marc-Andre, please review this & decide what should happen next.

gvanrossum commented 23 years ago

This version of the patch is clearly bogus. In UTF-16 encodings, the byte 0x0A (\n) can occur whenever the low or high byte of a Unicode character is 0x0A. I don't know whether Unicode is designed to avoid all such code positions, but I can hardly believe it.

A correct readline() method would have to read 2 bytes at a time and check for u"\u000A". (I don't care about the other Unicode line-breaking characters; those presumably belong to a different application level.)
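To make the point above concrete, here is a minimal sketch in modern Python 3 (not the patch under review, and not the actual codecs implementation). It first shows that the byte 0x0A appears inside ordinary characters once they are encoded as UTF-16, then reads one 2-byte code unit at a time and compares it with the encoded form of U+000A. The function name `readline_utf16` and its `encoding` parameter are hypothetical, and the sketch assumes a BOM-less encoding name (`"utf-16-le"` or `"utf-16-be"`).

```python
import io

# U+010A has low byte 0x0A and U+0A05 has high byte 0x0A, so a byte-level
# search for b"\n" sees far more "newlines" than the text really contains.
data = "\u010a\u0a05\n".encode("utf-16-le")
false_hits = data.count(b"\n")  # counts every 0x0A byte, not just the real LF


def readline_utf16(stream, encoding="utf-16-le"):
    """Read one line by comparing whole 16-bit code units against U+000A.

    Hypothetical helper: `encoding` must be "utf-16-le" or "utf-16-be",
    because plain "utf-16" would prepend a BOM to the encoded newline.
    """
    newline = "\n".encode(encoding)  # exactly 2 bytes for the LE/BE variants
    units = []
    while True:
        unit = stream.read(2)
        if len(unit) < 2:
            # End of stream (a trailing odd byte, if any, is dropped here).
            break
        units.append(unit)
        if unit == newline:
            break
    return b"".join(units).decode(encoding)
```

For example, `readline_utf16(io.BytesIO(data))` returns the whole three-character line instead of stopping at the first 0x0A byte. Only LF is treated as a line end here, matching the comment above; reading two bytes per call is simple but slow, which is one reason a buffered approach is attractive.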

malemburg commented 23 years ago

I'm not sure whether this is the right fix: Unicode defines many more line break characters than just LF and the patch will only work correctly on Unix (also note that UTF-16 can be BE and LE -- your fix assumes LE).

A true fix would have to also touch the .read() method and implement a true read-ahead buffer strategy to get this done right.
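One way to picture the read-ahead strategy described above is the following sketch in modern Python 3. It is an illustration under stated assumptions, not the fix that was eventually applied to the codecs module: the class name `BufferedUTF16Reader`, its `chunk_size` parameter, and the choice of `str.splitlines()` as the definition of a line break are all hypothetical. Bytes are decoded incrementally into a shared text buffer, so `.read()` and `.readline()` stay consistent with each other and the byte order is handled by the decoder rather than by the line-splitting logic.

```python
import codecs
import io


class BufferedUTF16Reader:
    """Hypothetical read-ahead reader; not codecs.StreamReader itself."""

    def __init__(self, stream, encoding="utf-16-le", chunk_size=64):
        self.stream = stream
        # The incremental decoder keeps partial code units between chunks.
        self.decoder = codecs.getincrementaldecoder(encoding)()
        self.chunk_size = chunk_size
        self.buffer = ""  # decoded text read ahead but not yet returned
        self.eof = False

    def _fill(self):
        data = self.stream.read(self.chunk_size)
        if data:
            self.buffer += self.decoder.decode(data)
        else:
            self.eof = True
            self.buffer += self.decoder.decode(b"", final=True)

    def read(self, size=-1):
        # Serve from the shared buffer so read() and readline() can be mixed.
        while not self.eof and (size < 0 or len(self.buffer) < size):
            self._fill()
        if size < 0:
            size = len(self.buffer)
        result, self.buffer = self.buffer[:size], self.buffer[size:]
        return result

    def readline(self):
        while True:
            lines = self.buffer.splitlines(keepends=True)
            # Return the first line only once more text follows it (or at
            # EOF), so a "\r\n" split across two chunks is not cut in half.
            if len(lines) > 1 or self.eof:
                line = lines[0] if lines else ""
                self.buffer = self.buffer[len(line):]
                return line
            self._fill()
```

A reader built as `BufferedUTF16Reader(io.BytesIO(raw), encoding="utf-16-be")` would then split on CR, LF, CRLF, U+2028 and the other characters `str.splitlines()` recognizes. The trade-off of this particular design is that a line is only returned after some following data (or end of file) has been seen, which would be unsuitable for interactive streams.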

malemburg commented 23 years ago

Postponed until after the Python 2.0b2 release.

python / cpython

Fixes of ReadStream.readline() in UTF-16 and -LE codecs #33086