hassanakbar4 / mailarchive-tickets


Corrupted messages #139

Open hassanakbar4 opened 9 years ago

hassanakbar4 commented 9 years ago

component_MailArchive: ArchiveContents type_cleanup | by rcross@amsl.com


During the initial archive migration and IMAP testing, instances of mbox file corruption were noted. Areas of the mbox files were corrupted in such a way that it looked as if two duplicate writes to the file had overlapped each other. In its simplest form:

{ start of message A

{ full message A }

end of message A }
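A minimal Python sketch of one way to spot the simplest pattern above. It assumes that each overlapped copy keeps its own `Message-ID` header line intact. If the same Message-ID appears as a header more than once inside what the parser treats as a single message, that message probably contains an embedded duplicate. The function name `duplicated_message_ids` and the sample messages are hypothetical and are not taken from the MailArchive code.

```python
import re
from collections import Counter

# Matches a Message-ID header at the start of a line and captures the ID.
# Assumption: each overlapped copy keeps its Message-ID header intact.
_MSGID_RE = re.compile(r"^Message-ID:\s*(<[^>\s]+>)", re.IGNORECASE | re.MULTILINE)


def duplicated_message_ids(raw: str) -> list[str]:
    """Return Message-IDs that occur as a header more than once in the text of
    a single parsed message, as a nested '{ start of A { full A } end of A }'
    overlap would produce (hypothetical helper)."""
    counts = Counter(_MSGID_RE.findall(raw))
    return [msg_id for msg_id, n in counts.items() if n > 1]


# Hypothetical example: the outer write of message A is interrupted by a
# complete second write of A, and then the outer write resumes.
overlapped = (
    "From: alice@example.org\n"
    "Message-ID: <a1@example.org>\n"
    "Subject: Draft review\n"
    "\n"
    "First half of the body\n"
    "From: alice@example.org\n"
    "Message-ID: <a1@example.org>\n"
    "Subject: Draft review\n"
    "\n"
    "First half of the body\n"
    "second half of the body\n"
    "second half of the body\n"
)
clean = (
    "From: alice@example.org\n"
    "Message-ID: <a2@example.org>\n"
    "Subject: Another message\n"
    "\n"
    "Body text\n"
)

print(duplicated_message_ids(overlapped))  # ['<a1@example.org>']
print(duplicated_message_ids(clean))       # []
```

This check only covers the simplest case. A fragment that lost its own headers will not carry a duplicated Message-ID and needs a different test.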

Often the corruption is more complex, involving more copies of the message or more messages. The files still parsed without error, so the archive contains corrupted messages. Common symptoms are a missing subject, or header lines appearing in the message body. I manually fixed known instances of this problem during the initial migration and again during IMAP testing in July 2015. However, it is clear that there are still many more instances of this issue. See this query for example:

https://mailarchive.ietf.org/arch/search/?qdr=c&start_date=1996-07-01&end_date=1996-07-10&email_list=ietf&q=&as=1#

In the results, note the messages with an empty subject. There are 2588 messages in the archive with no subject. Certainly some of these are valid or are spam, but many, especially from the 1990s, are likely instances of this corruption.
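A rough triage sketch in Python for the empty-subject candidates, using the standard-library `mailbox` module. It flags messages whose `Subject` is missing or empty. It also flags messages whose body contains lines that look like headers, which is the other symptom noted above. The helper name `suspicious_messages` and the list of header names are assumptions. A missing subject alone does not prove corruption, so anything it reports still needs manual review.

```python
import mailbox
import re

# Header names whose presence at the start of a body line suggests an
# embedded (overlapped) copy of a message. This list is an assumption.
_HEADER_IN_BODY_RE = re.compile(
    rb"^(Received|Message-ID|Date|From|To|Subject):", re.IGNORECASE | re.MULTILINE
)


def suspicious_messages(mbox_path: str) -> list[tuple[str, list[str]]]:
    """Return (key, reasons) for messages that may be overlap debris.

    Heuristic sketch only (hypothetical helper): valid messages and spam can
    also lack a subject, so the result is a list to review, not to fix.
    """
    box = mailbox.mbox(mbox_path, create=False)
    results = []
    try:
        for key in box.iterkeys():
            msg = box.get_message(key)
            # Raw bytes of the message, without the mbox "From " separator line.
            raw = box.get_bytes(key)
            _, _, body = raw.partition(b"\n\n")
            reasons = []
            # str() also handles email.header.Header values for odd encodings.
            if not str(msg.get("Subject") or "").strip():
                reasons.append("empty subject")
            if _HEADER_IN_BODY_RE.search(body):
                reasons.append("header lines in body")
            if reasons:
                results.append((str(key), reasons))
    finally:
        box.close()
    return results
```

Matching `From:` or `Subject:` at the start of a body line will also catch some forwarded messages. That is one reason the output should be treated as a review list rather than as a list of confirmed corruptions.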


Issue migrated from trac:1785 at 2021-09-22 16:46:48 +0500

hassanakbar4 commented 9 years ago

@hassanakbar4 edited the issue description

hassanakbar4 commented 5 years ago

@hassanakbar4 changed type from defect to cleanup

hassanakbar4 commented 4 years ago

@hassanakbar4 changed status from new to waiting