Open 7be85f3c-f464-4458-a888-de7b31a18e30 opened 2 years ago
iso-8859-6-i, iso-8859-6-e, iso-8859-8-i and iso-8859-8-e are all IANA-recognized character sets per https://www.iana.org/assignments/character-sets/character-sets.xhtml. None of them is recognized by codecs.lookup().
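A quick way to check the report on a given interpreter is to probe each name with `codecs.lookup()`. This is only a sketch: the helper name `probe` is made up for illustration, and the result depends on the Python version in use, so no particular output is claimed here.

```python
import codecs

# The four BIDI-annotated charset names registered with IANA (RFC 1556).
NAMES = ["iso-8859-6-i", "iso-8859-6-e", "iso-8859-8-i", "iso-8859-8-e"]


def probe(names):
    """Return {name: True/False} depending on whether codecs.lookup() knows it."""
    results = {}
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            # Raised for any encoding name the codec registry cannot resolve.
            results[name] = False
        else:
            results[name] = True
    return results


for name, known in probe(NAMES).items():
    print(name, "recognized" if known else "unknown encoding")
```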
Even though these are IANA-recognized encodings, we need to apply the same logic as we do for all new encodings, which essentially boils down to: are these encodings in widespread use today?
Reading through RFC 1556, it seems that the added -i or -e suffixes are just indications for applications on how to interpret BIDI information: either implicitly, by looking at the order of characters in the stream, or explicitly, via control characters embedded in the stream. They are not new encodings with new mappings.
If that's a correct interpretation, we could add those as aliases for the non-annotated encodings.
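Until such aliases exist in the standard library, an application can approximate the idea itself with `codecs.register()`. The sketch below assumes the interpretation above is correct, namely that the -i and -e variants share the byte mappings of iso-8859-6 and iso-8859-8. The alias table and the function name `bidi_alias_search` are hypothetical and are not how CPython would implement the change.

```python
import codecs

# Hypothetical alias table (not part of the standard library). Keys are in
# normalized form: lower case, with hyphens and spaces turned into underscores.
_BIDI_ALIASES = {
    "iso_8859_6_i": "iso8859_6",
    "iso_8859_6_e": "iso8859_6",
    "iso_8859_8_i": "iso8859_8",
    "iso_8859_8_e": "iso8859_8",
}


def bidi_alias_search(name):
    """Codec search function mapping the annotated names to the base codecs."""
    # Normalize defensively: the exact form passed to search functions
    # differs slightly between Python versions.
    key = name.lower().replace("-", "_").replace(" ", "_")
    target = _BIDI_ALIASES.get(key)
    if target is None:
        return None  # Not ours; let other search functions handle it.
    return codecs.lookup(target)


# Search functions are consulted in registration order, so the built-in
# lookup still wins if a future Python release learns these names itself.
codecs.register(bidi_alias_search)
```

After registration, `b"\xe0".decode("iso-8859-8-i")` decodes with the same table as `iso-8859-8`. The BIDI hint carried by the suffix is dropped, so any reordering for display remains the application's job.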
After more than 20 years with Unicode support in Python and the world moving towards UTF-8, I have become fairly reluctant towards adding more encoding support to Python.
If people are still using unsupported encodings, it's probably better to point them to other dedicated tools for converting text to UTF-8, e.g. iconv, than extending the pretty extensive support we already have in Python.
The Mailman-users@python.org list received a post whose From: header contained a Hebrew display name, RFC 2047-encoded with the iso-8859-8-i charset. Processing it threw a LookupError: unknown encoding: iso-8859-8-i exception, and the message was shunted. The message body also declared its charset as iso-8859-8-i, although it contained only ASCII. Unfortunately, I don't have the original message, so I can't say which MUA created it or how common this usage is.
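The failure mode can be reproduced without the original message, since `email.header.decode_header()` only splits out the bytes and the charset label, and the `LookupError` surfaces when those bytes are decoded. The sketch below is an application-side workaround, not what Mailman does: `decode_display_name` and `_decode_part` are hypothetical helpers, the sample header is constructed for illustration, and retrying without the -i/-e suffix assumes the two variants share the base mapping.

```python
from email.header import decode_header


def _decode_part(payload, charset):
    """Decode one RFC 2047 chunk, tolerating unknown BIDI-annotated charsets."""
    charset = (charset or "ascii").lower()
    candidates = [charset]
    if charset.endswith(("-i", "-e")):
        # Assumption: the suffix is only a BIDI hint, so retry the base name.
        candidates.append(charset[:-2])
    for name in candidates:
        try:
            return payload.decode(name, errors="replace")
        except LookupError:
            continue  # Unknown encoding name; try the next candidate.
    # Last resort: keep the message flowing instead of raising.
    return payload.decode("ascii", errors="replace")


def decode_display_name(raw_header):
    """Return the header value as text without raising on unknown charsets."""
    parts = []
    for payload, charset in decode_header(raw_header):
        if isinstance(payload, str):
            parts.append(payload)  # Header had no encoded words at all.
        else:
            parts.append(_decode_part(payload, charset))
    return "".join(parts)


# Constructed example: three Hebrew letters, base64-encoded, labeled -i.
name = decode_display_name("=?iso-8859-8-i?B?4OHi?=")
```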
I do think that just adding these as aliases for the non-annotated encodings is an appropriate response.
Reviving an old issue, but one that still affects some of us. Deferring to "if people are still using unsupported encodings..." is not entirely fair, IMHO. Granted, we all prefer to move with the times and constantly strive to do so. Regrettably, we still have to deal with the output of various legacy systems that remain in use, mostly Windows-based applications at archaic, large, and technologically inert organizations. Having our systems crash the moment they encounter these encodings cripples us, and there is no one on the side of the out-of-date systems we can ask to fix this.
Like @msapiro, I believe that adding these as aliases is a minimal update that would greatly increase the range of Python apps people can use within (and in the vicinity of) the Hebrew-speaking community.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', 'library', '3.10', '3.11']
title = "codecs module doesn't support iso-8859-6-i, iso-8859-6-e, iso-8859-8-i or iso-8859-8-i"
updated_at =
user = 'https://github.com/msapiro'
```
bugs.python.org fields:
```python
activity =
actor = 'msapiro'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'msapiro'
dependencies = []
files = []
hgrepos = []
issue_num = 45921
keywords = []
message_count = 3.0
messages = ['407240', '407262', '407305']
nosy_count = 2.0
nosy_names = ['lemburg', 'msapiro']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue45921'
versions = ['Python 3.10', 'Python 3.11']
```