python / cpython

The Python programming language
https://www.python.org
Other
63.32k stars 30.31k forks source link

email.feedparser: support RFC 6532 section 3.5 #88826

Open 43ebcf11-8e1c-4298-ad23-c8d33828e356 opened 3 years ago

43ebcf11-8e1c-4298-ad23-c8d33828e356 commented 3 years ago
BPO 44660
Nosy @warsaw, @terryjreedy, @bitdancer, @f18a14c09s
PRs
  • python/cpython#27208
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', 'expert-email', '3.11'] title = 'email.feedparser: support RFC 6532 section 3.5' updated_at = user = 'https://github.com/f18a14c09s' ``` bugs.python.org fields: ```python activity = actor = 'f18a14c09s' assignee = 'none' closed = False closed_date = None closer = None components = ['email'] creation = creator = 'f18a14c09s' dependencies = [] files = [] hgrepos = [] issue_num = 44660 keywords = ['patch'] message_count = 9.0 messages = ['397693', '398094', '398100', '398101', '398103', '398107', '398108', '398773', '416533'] nosy_count = 6.0 nosy_names = ['barry', 'terry.reedy', 'r.david.murray', 'python-dev', 'virusdetected1337', 'f18a14c09s'] pr_nums = ['27208'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue44660' versions = ['Python 3.11'] ```

    43ebcf11-8e1c-4298-ad23-c8d33828e356 commented 3 years ago

    Note that I have created a fix for this and am working through the Python Developer’s Guide to propose it.

    terryjreedy commented 3 years ago

    RFC 6532 Internationalized Email Headers https://datatracker.ietf.org/doc/html/rfc6532

    3.5. Changes to MIME Message Type Encoding Restrictions https://datatracker.ietf.org/doc/html/rfc6532#section-3.5

    don't obviously "message/global Emails with non-identity Content-Transfer-Encodings". Please clarify.

    terryjreedy commented 3 years ago

    dont obviously *match* ....

    43ebcf11-8e1c-4298-ad23-c8d33828e356 commented 3 years ago

    Try parsing an example such as this through the Python email library:

    Content-Type: message/global Content-Transfer-Encoding: quoted-printable

    Content-Type: text/plain; = charset=3D”us-ascii”

    Hello, World!

    You will find that the valid quoted-printable content is visited but not decoded prior to parsing; it is parsed as-is and treated as a bad message: the charset in the “text/plain” Content-Type improperly parsed, for example. I have a draft pull request for this, which might be easier to understand. Let me know if you need further clarification.

    092899a0-5c8a-4cdb-8b16-eac88d5f2d41 commented 3 years ago

    bpo-44713: subprocess.rst typo "shell=True" => shell=True

    43ebcf11-8e1c-4298-ad23-c8d33828e356 commented 3 years ago

    The first paragraph of section 3.5 states two positions that the RFC holds on Content-Transfer-Encodings: (1) “allows newly defined MIME types to permit content-transfer-encoding;” and (2) “allows content-transfer-encoding for message/global (see Section 3.7) [sic].”

    The second position refers to and combines with section 3.7 (namely the “Encoding considerations” paragraph) to support my interpretation that implementations should accurately parse "message/global Emails with non-identity Content-Transfer-Encodings” (how I have paraphrased it). For quick reference, “non-identity” refers to the “quoted-printable” and “base64” Content-Transfer-Encodings. The first position “suggests” a wider breadth, but I do not think its words “necessitate” it. For, the RFC lists only one “newly defined MIME type;” whereas a media/MIME type in general (in this case, all others) only affects the scope of RFCs that define/update it.

    So I think my interpretation is precise if one defers to section 3 of the RFC.

    Now, admittedly, two other sections in the RFC seem to contradict section 3.5 by broadening the scope (Abstract, p. 2; Introduction, s. 1, p. 3). I’m not super hung up on how to resolve the contradiction; but I think, at a minimum, the missing support for “message/global” should be added.

    bitdancer commented 3 years ago

    Having looked at the cited part of the RFC (but not tried to analyze it in detail), I think you are correct. I've also glanced at your PR, and I think your approach is correct in broad outline, but I haven't looked at the details. For full message/global support, however, it will also be necessary to look at the output side: given a message/global part, a transfer encoding should be applied when serializing with cte_type=7bit. Support for message/global should also be added to the contentmanager.

    I won't have an objection if this is accepted with only the feedparser support, but I would recommend that the remaining pieces of support for message/global be added before the feature is released.

    43ebcf11-8e1c-4298-ad23-c8d33828e356 commented 3 years ago

    As requested, I’ve started testing and writing changes for the generator and content manager. I’m assuming there is no rush since the scope is now widened. Will have some follow-on questions about the design.

    43ebcf11-8e1c-4298-ad23-c8d33828e356 commented 2 years ago

    Hello,

    I haven't had the time to complete @r.david.murrary's recommendation and unfortunately don't anticipate that I will for now. Any objection to me resubmitting the pull request as-is? Alone, it still delivers business value so to speak.

    Thanks