python / cpython

The Python programming language
https://www.python.org

empty local-part in addr_spec displayed incorrectly #82413

Open 4cc662bf-6f27-4ca1-992d-2453203a8f67 opened 5 years ago

4cc662bf-6f27-4ca1-992d-2453203a8f67 commented 5 years ago
BPO 38232
Nosy @warsaw, @bitdancer, @maxking, @andreitroiebbc
Files
  • example_parser.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['3.8', 'type-bug', '3.7', 'expert-email', '3.9']
    title = 'empty local-part in addr_spec displayed incorrectly'
    updated_at =
    user = 'https://github.com/andreitroiebbc'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'maxking'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['email']
    creation =
    creator = 'andreitroiebbc'
    dependencies = []
    files = ['48617']
    hgrepos = []
    issue_num = 38232
    keywords = []
    message_count = 3.0
    messages = ['352852', '353003', '353983']
    nosy_count = 4.0
    nosy_names = ['barry', 'r.david.murray', 'maxking', 'andreitroiebbc']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue38232'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9']
    ```

    4cc662bf-6f27-4ca1-992d-2453203a8f67 commented 5 years ago

    Given an (RFC-legal) email address whose local part is a quoted empty string (e.g. 'Nobody <""@example.org>'), when I access the 'addr_spec' property, the result no longer includes the quoted empty string (so, in this case, addr_spec returns '@example.org').
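
A minimal sketch of a reproduction that skips message parsing, assuming `email.headerregistry.Address` is constructed directly with an empty `username`. The expected form would be `""@example.org`; the output may differ between Python versions, so none is shown here.

```python
from email.headerregistry import Address

# Build the address directly. An empty username corresponds to the
# quoted empty local part in 'Nobody <""@example.org>'.
addr = Address(display_name="Nobody", username="", domain="example.org")

print(repr(addr.username))  # the local part as stored on the object
print(addr.addr_spec)       # affected versions drop the "" here
```

If the second line printed starts with `@`, the quoted empty local part has been lost.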

    4cc662bf-6f27-4ca1-992d-2453203a8f67 commented 5 years ago

    As far as I understand it, this is due to the following code in email.headerregistry.Address.addr_spec (in 3.8 and below):

    if len(nameset) > len(nameset-parser.DOT_ATOM_ENDS):
        lp = parser.quote_string(self.username)

    or, in the current version on master:

    lp = self.username
    if not parser.DOT_ATOM_ENDS.isdisjoint(lp):
        lp = parser.quote_string(lp)

    Neither of these tests works for the empty string: an empty string contains no characters, so it is disjoint from every set and never gets quoted.
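
To illustrate the point, here is a minimal sketch of one possible adjustment, not a proposed patch. `quote_local_part` is a hypothetical helper written for this example; it reuses `DOT_ATOM_ENDS` and `quote_string` from the private `email._header_value_parser` module (the one `headerregistry` imports as `parser`) and adds an explicit check for the empty string.

```python
# Private module; headerregistry imports it under the same alias.
from email import _header_value_parser as parser


def quote_local_part(username):
    """Hypothetical helper: quote an empty local part as well as one
    that contains characters not allowed in a dot-atom."""
    lp = username
    if not lp or not parser.DOT_ATOM_ENDS.isdisjoint(lp):
        lp = parser.quote_string(lp)
    return lp


# The empty string is disjoint from any set, which is why the
# existing check never quotes it.
print(parser.DOT_ATOM_ENDS.isdisjoint(""))  # True

print(quote_local_part(""))            # ""
print(quote_local_part("hello"))       # hello
print(quote_local_part("john smith"))  # "john smith"
```

Whether the empty case belongs in `addr_spec` itself or elsewhere is a design question this sketch does not settle.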

    maxking commented 5 years ago

    It is actually parsed correctly and serialized back when you try to convert it to a string representation:

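    from email.parser import BytesFeedParser
    import email.policy

    def main():
        eml_string = 'From: Nobody <""@example.org>'
        parser = BytesFeedParser(policy=email.policy.default)
        parser.feed(eml_string.encode())
        msg = parser.close()
        # addr_spec as computed by email.headerregistry.Address
        print(msg.get('From').addresses[0].addr_spec)
        # the parse tree built for the header value
        print(repr(msg.get('From')._parse_tree))
        # the whole message serialized back to text
        print(msg.as_string())

    if __name__ == '__main__':
        main()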

    Running this gives me:

    @example.org
    AddressList([Address([Mailbox([NameAddr([DisplayName([Atom([ValueTerminal('Nobody'), CFWSList([WhiteSpaceTerminal(' ')])])]), AngleAddr([ValueTerminal('<'), AddrSpec([LocalPart([QuotedString([BareQuotedString([ValueTerminal('')])])]), ValueTerminal('@'), Domain([DotAtom([DotAtomText([ValueTerminal('example'), ValueTerminal('.'), ValueTerminal('org')])])])]), ValueTerminal('>')])])])])])
    From: Nobody <""@example.org>

    Notice the quoted string in the parse tree: AddrSpec([LocalPart([QuotedString([BareQuotedString([ValueTerminal('')])])])

    print() converts the addr-spec into a string, which omits the quotes. This is true for any non-empty quoted string too:

    hello@example.org
    AddressList([Address([Mailbox([NameAddr([DisplayName([Atom([ValueTerminal('Nobody'), CFWSList([WhiteSpaceTerminal(' ')])])]), AngleAddr([ValueTerminal('<'), AddrSpec([LocalPart([QuotedString([BareQuotedString([ValueTerminal('hello')])])]), ValueTerminal('@'), Domain([DotAtom([DotAtomText([ValueTerminal('example'), ValueTerminal('.'), ValueTerminal('org')])])])]), ValueTerminal('>')])])])])])
    From: Nobody <"hello"@example.org>

    If you prefer the string representation of the header's parsed value, you can try:

    print(msg.get('From').fold(policy=email.policy.default))

    Which prints:

    From: Nobody <""@example.org>