python / cpython

The Python programming language
https://www.python.org/

IMAP library encoding enhancement #68218

Open 0fc4e5d9-5101-4089-b85b-34f3fc5ef334 opened 9 years ago

0fc4e5d9-5101-4089-b85b-34f3fc5ef334 commented 9 years ago
BPO 24030
Nosy @warsaw, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', 'library', 'expert-email']
title = 'IMAP library encoding enhancement'
updated_at =
user = 'https://bugs.python.org/pmoleri'
```

bugs.python.org fields:

```python
activity =
actor = 'pmoleri'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'email']
creation =
creator = 'pmoleri'
dependencies = []
files = []
hgrepos = []
issue_num = 24030
keywords = []
message_count = 1.0
messages = ['241821']
nosy_count = 3.0
nosy_names = ['barry', 'r.david.murray', 'pmoleri']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue24030'
versions = ['Python 3.6']
```

0fc4e5d9-5101-4089-b85b-34f3fc5ef334 commented 9 years ago

IMAP library doesn't encode parameters to the required charset. The library is useful, but when it comes to complex mailbox names, the user needs to convert strings to and from the custom imap_utf7 encoding. I think this conversion could be implemented in the library and applied transparently to all the arguments that need it.

Example: IMAP4.select(mailbox='INBOX', readonly=False): For the method to work, the mailbox argument needs to be encoded to imap_utf7 and if it has spaces it needs to be double quoted. All this hassle could be handled by the library.

The same applies to every function that uses a mailbox or directory argument.

When it comes to the mailbox argument I can identify the following cases:

  a. bytes: It should be treated as an already imap_utf7 encoded string. If necessary it can be converted to string using ascii charset.
  b. string:
    • b.1: It's a valid imap_utf7 string without '&' -> doesn't need encoding. Eg.: INBOX
    • b.2: An already encoded imap_utf7 string with '&' character -> doesn't need encoding. Eg.: Test&AOk-
    • b.3: Any other case (invalid imap_utf7 string) -> needs to be encoded

Proposal:

  1. Implement an imap_utf7_encode() method.
  2. Implement a strict imap_utf7_decode() method; it must raise an error if the input doesn't conform to imap_utf7 encoding.
  3. Implement a method to ensure arguments are in imap_utf7 encoding:
    • bytes -> arg.decode('ascii')
    • string that imap_utf7_decode(arg) accepts -> arg
    • otherwise -> imap_utf7_encode(arg)
    • In every case, if it has spaces, double quote the whole string
  4. In every method that receives a mailbox or directory argument call this new method to ensure it's imap_utf7 encoded.
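
The proposal above could be sketched roughly as follows. This is only an illustration, not code from imaplib: the function names come from the proposal (ensure_imap_utf7 is a hypothetical name for the method in step 3), the encoding rules assume the modified UTF-7 convention described in RFC 3501 section 5.1.3, the decoder does not check every canonical-form rule, and the quoting step does not escape embedded quotes or backslashes.

```python
import base64


def imap_utf7_encode(text: str) -> str:
    """Encode text using the modified UTF-7 convention (sketch)."""
    out = []
    pending = []  # characters waiting to be written as one base64 run

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified base64: no '=' padding, ',' instead of '/'.
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in text:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append("&-" if ch == "&" else ch)  # literal '&' becomes '&-'
        else:
            pending.append(ch)
    flush()
    return "".join(out)


def imap_utf7_decode(encoded: str) -> str:
    """Strictly decode modified UTF-7; raise ValueError on bad input."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError("non-ASCII character in encoded name")
        if ch != "&":
            out.append(ch)
            i += 1
            continue
        end = encoded.find("-", i + 1)
        if end == -1:
            raise ValueError("unterminated '&' sequence")
        chunk = encoded[i + 1:end]
        if chunk == "":
            out.append("&")  # '&-' stands for a literal '&'
        elif "/" in chunk:
            raise ValueError("'/' is not allowed in modified base64")
        else:
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)
            # binascii.Error and UnicodeDecodeError are both ValueErrors.
            raw = base64.b64decode(b64, validate=True)
            out.append(raw.decode("utf-16-be"))
        i = end + 1
    return "".join(out)


def ensure_imap_utf7(arg) -> str:
    """Hypothetical helper for step 3 of the proposal."""
    if isinstance(arg, bytes):
        name = arg.decode("ascii")  # assumed to be already encoded
    else:
        try:
            imap_utf7_decode(arg)  # already valid -> keep as is
            name = arg
        except ValueError:
            name = imap_utf7_encode(arg)
    if " " in name:
        name = '"' + name + '"'
    return name
```

With this sketch, ensure_imap_utf7('Testé') and ensure_imap_utf7('Test&AOk-') both give 'Test&AOk-', and ensure_imap_utf7(b'My Folder') gives '"My Folder"'.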

vadmium commented 2 months ago

I’m against this kind of encoding mechanism that is transparent for only a limited set of inputs. It doesn’t add any benefit if the caller needs to handle arbitrary unencoded mailbox names that might look like valid UTF-7 encodings.
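
A small sketch of the ambiguity, assuming the modified UTF-7 convention from RFC 3501 and a server that follows it. The helper decode_shift is hypothetical and only decodes the payload between '&' and '-'.

```python
import base64


def decode_shift(chunk: str) -> str:
    """Decode one '&...-' payload (modified base64 of UTF-16BE)."""
    b64 = chunk.replace(",", "/")
    b64 += "=" * (-len(b64) % 4)
    return base64.b64decode(b64).decode("utf-16-be")


# The caller really means a mailbox named with these nine ASCII characters.
literal_name = "Test&AOk-"

# A pass-through heuristic would send it unchanged, because it looks like
# a valid encoding. A server applying the convention would then read it as:
seen_by_server = "Test" + decode_shift("AOk")

# For an all-ASCII name, the unambiguous wire form escapes '&' as '&-'.
correct_wire_form = literal_name.replace("&", "&-")
```

Here seen_by_server is 'Testé' rather than the literal name, while correct_wire_form is 'Test&-AOk-', so the caller still has to know whether its string is already encoded.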

Is the UTF-7 encoding mandatory for IMAP? The RFCs just seem to claim it is a convention. Is it legal to have an unencoded ampersand (&) in an IMAP mailbox or directory name? If so, encoding it would change valid behaviour and break backwards compatibility.

See Issue #49555 proposing separate encoding and decoding functions, and Issue #92835 for restoring the documented behaviour that double-quotes arguments.