mbox From line wrongly detected #81538

Closed 6a8a08c7-8292-45e6-a476-33c00a9e4342 closed 5 years ago

6a8a08c7-8292-45e6-a476-33c00a9e4342 commented 5 years ago
BPO 37357
Nosy @warsaw, @bitdancer, @maxking

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['3.7', 'invalid', 'type-bug', 'library', 'expert-email']
title = 'mbox From line wrongly detected'
updated_at =
user = 'https://bugs.python.org/Andro'
```

bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'none'
closed = True
closed_date =
closer = 'eric.smith'
components = ['Library (Lib)', 'email']
creation =
creator = 'Andro'
dependencies = []
files = []
hgrepos = []
issue_num = 37357
keywords = []
message_count = 3.0
messages = ['346192', '346201', '347591']
nosy_count = 4.0
nosy_names = ['barry', 'r.david.murray', 'maxking', 'Andro']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue37357'
versions = ['Python 3.7']
```

6a8a08c7-8292-45e6-a476-33c00a9e4342 commented 5 years ago

When parsing an mbox file, the Python mailbox library is confused by lines starting with 'From ' in the body of a message. Each such line is treated as the start of a new message, so a spurious, fragmentary message item is created. The following sample code and input demonstrate this. Replacing 'From' in the message body with, say, ' From' results in correct parsing.

This defect corrupts the message items and so, as one example, prevents mbox files from being imported correctly into hyperkitty for GNU Mailman 3.

-- Python code
import sys
import mailbox

def main():
    print('mailbox read test')
    mbox = mailbox.mbox(sys.argv[1])
    for msg in mbox:
        print('~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~')
        print(msg)
        print('~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~')

if __name__ == "__main__":
    main()

--- sample mbox with one message

From Fred Nurk <fred.nurks@nowhere.org> Wed, 8 Dec 1999 14:45:02 -0400
Date: Wed, 8 Dec 1999 14:45:02 -0400
From: Fred Nurk <fred.nurk@inowhere.org>
Subject: Testing mbox in Python

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce semper tempus augue at consectetur. Morbi eu nunc magna. Nulla placerat, eros in mollis finibus, dui risus ultrices tortor, non tincidunt nibh odio at augue. Quisque quis mauris neque. Curabitur ac accumsan neque. Maecenas sed mauris non justo sagittis finibus vel vel ex. Maecenas quis rutrum libero. Curabitur ex ante, tincidunt in velit at, egestas lobortis quam. Praesent tempus at dui ut volutpat. Nullam in rhoncus massa, id malesuada tortor. Suspendisse at cursus ex. Phasellus vitae pulvinar eros. Ut euismod dapibus libero, ultricies tempor leo accumsan ac. Etiam vestibulum, urna eget interdum eleifend, nulla nulla eleifend lacus, at lacinia neque nisi non velit.

From sed vehicula venenatis dui at ultricies. Pellentesque vehicula vulputate nibh nec aliquet. Vestibulum pretium velit id libero porttitor, sed facilisis metus fermentum. Donec vestibulum, sapien non convallis sodales, justo libero volutpat dui, ut luctus odio nisi eget sapien. In viverra libero gravida arcu euismod, non sollicitudin massa auctor. Pellentesque vitae laoreet nisi. In eros massa, pretium at condimentum eu, molestie ut tortor. Suspendisse faucibus felis sem, et fringilla urna consectetur molestie. Integer suscipit, orci sed convallis maximus, velit purus tempus dui, id egestas tortor erat auctor dui. Nulla fermentum tellus ut odio elementum, vel bibendum mi imperdiet. Proin sed auctor purus. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nullam non arcu ex. Duis dapibus nunc in urna dapibus, sit amet interdum lectus tincidunt.

Fred

--
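For readers who want to reproduce the report without separate files, the following is a minimal, self-contained sketch. The shortened message text, the `example.org` address, and the helper name `count_messages` are illustrative assumptions, not part of the original report. It writes the sample to a temporary file and counts the messages that `mailbox.mbox` finds, once with the body line left as `From ...` and once with it escaped as `>From ...`.

```python
import mailbox
import os
import tempfile

# Shortened stand-in for the sample mbox above (assumed content).
RAW = (
    b"From fred@example.org Wed Dec  8 14:45:02 1999\n"
    b"From: Fred Nurk <fred@example.org>\n"
    b"Subject: Testing mbox in Python\n"
    b"\n"
    b"Lorem ipsum dolor sit amet.\n"
    b"\n"
    b"From sed vehicula venenatis dui at ultricies.\n"
    b"\n"
    b"Fred\n"
)


def count_messages(data: bytes) -> int:
    """Write data to a temporary mbox file and return how many messages are found."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "wb") as fh:
            fh.write(data)
        box = mailbox.mbox(path, create=False)
        try:
            return len(box)
        finally:
            box.close()


# Unescaped body line: the parser starts a new message at "From sed ...".
raw_count = count_messages(RAW)

# Escaping body lines as ">From " (the usual mbox convention) avoids the split.
# The "From:" header is untouched because it has no space after "From".
escaped_count = count_messages(RAW.replace(b"\nFrom ", b"\n>From "))

print("unescaped:", raw_count, "escaped:", escaped_count)
```

The unescaped file should be reported as two messages and the escaped one as a single message, which matches the behaviour described in the report.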

6a8a08c7-8292-45e6-a476-33c00a9e4342 commented 5 years ago

Not really a bug. This results from the looseness of the mbox format and its lack of a standard. There is nothing Python can do about it.

bitdancer commented 5 years ago

This problem is the whole reason "mangle_from_" exists in the email library...
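As a hedged illustration of that point, the sketch below serializes a message with `email.generator.Generator` and `mangle_from_=True`, so that body lines beginning with `From ` are written as `>From `. The message content and the `example.org` address are made up for the example. This shows only the writing side; it does not change how an already unescaped mbox file is read.

```python
import io
from email.generator import Generator
from email.message import EmailMessage

# Illustrative message whose body contains a line starting with "From ".
msg = EmailMessage()
msg["From"] = "Fred Nurk <fred@example.org>"
msg["Subject"] = "Testing mbox in Python"
msg.set_content("Lorem ipsum.\n\nFrom sed vehicula venenatis.\n\nFred\n")

# mangle_from_=True asks the generator to prefix such body lines with ">".
buf = io.StringIO()
Generator(buf, mangle_from_=True).flatten(msg)
mangled = buf.getvalue()

print(mangled)
```

Software that writes mbox files this way produces body lines that a reader such as `mailbox.mbox` no longer mistakes for message separators.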