Open GoogleCodeExporter opened 9 years ago
I did some more digging and the mbox I'm importing is more complicated than I thought. Also, looking at the source code, I see the error presented is coming back from the server. I'll look at the exact headers sent by gmum to see how they are invalid (I suspect gmum is having to construct a "Date:" with insufficient or confusing information).
FWIW: Here's a truncated version of a failing message. Note the original headers are moved to a 'quotation' in the message body; I believe this was done by Apple's AOCE/PowerTalk mailer. It may turn out that I have to pre-process these mboxes before feeding them to gmum ... more later.
From PowerTalk 01 Jan 97 00:00:00 GMT
Date: Wed, 20 Mar 1996, 14:3:58
Subject: ACAP protocol outline
From: jgm+@CMU.EDU (John Gardiner Myers)
To: ietf-acap+@andrew.cmu.edu (IETF ACAP Mailing List)
ABNF extension:
a ::+ b
a ::+ c
a ::+ d
is equivalent to
a ::= (b) / (c) / (d)
Protocol overview
atom, quoted-string, literal
no-wait literal -- client doesn't need to wait for ready response
"{" number "+}" CRLF *OCTET
; NUL only allowed in value of attributes whose names end in
; ".bin"
Initial greeting
initial_greeting ::= "*" SP "OK" *(SP capability) CRLF
capability ::= atom [ "=" astring ]
------------------ RFC822 Header Follows ------------------
Received: from judgmentday.rs.itd.umich.edu by dagora.rs.itd.umich.edu (8.7.4/2.2)
id OAA02513; Wed, 20 Mar 1996 14:14:17 -0500 (EST)
Received: by judgmentday.rs.itd.umich.edu (8.7.4/2.2)
with X.500 id OAA27307; Wed, 20 Mar 1996 14:14:14 -0500 (EST)
Received: from po10.andrew.cmu.edu by judgmentday.rs.itd.umich.edu (8.7.4/2.2)
with ESMTP id OAA27302; Wed, 20 Mar 1996 14:14:13 -0500 (EST)
Received: (from postman@localhost) by po10.andrew.cmu.edu (8.7.5/8.7.1) id OAA00723; Wed, 20 Mar 1996 14:04:19 -0500
Received: via switchmail; Wed, 20 Mar 1996 14:04:18 -0500 (EST)
Received: from hogtown.andrew.cmu.edu via qmail
ID </afs/andrew.cmu.edu/service/mailqs/testq0/QF.klI5OYq00WBwB:xE53>;
Wed, 20 Mar 1996 14:04:05 -0500 (EST)
Received: from hogtown.andrew.cmu.edu via qmail
ID </afs/andrew.cmu.edu/usr7/jgm/.Outgoing/QF.clI5OSG00WBwA12PtP>;
Wed, 20 Mar 1996 14:03:58 -0500 (EST)
Received: from BatMail.robin.v2.14.CUILIB.3.45.SNAP.NOT.LINKED.hogtown.andrew.cmu.edu.sun4c.411
via MS.5.6.hogtown.andrew.cmu.edu.sun4c_411;
Wed, 20 Mar 1996 14:03:58 -0500 (EST)
Message-ID: <klI5OSC00WBwI12Pl0@andrew.cmu.edu>
Date: Wed, 20 Mar 1996 14:03:58 -0500 (EST)
X-UIDL: 827350024.003
From: John Gardiner Myers <jgm+@CMU.EDU>
To: IETF ACAP Mailing List <ietf-acap+@andrew.cmu.edu>
Subject: ACAP protocol outline
Status: U
From PowerTalk 01 Jan 97 00:00:00 GMT
Date: Wed, 20 Mar 1996, 13:59:41
Original comment by gavineadie
on 30 Jan 2010 at 4:11
Here's a patch to EmUpUtilities.m to fix my problem:
199a200,229
>     // ------------------------------------------------------------------------------
>     // Apple's old AOCE/PowerTalk mailer used to move the 'extra' headers to the end
>     // of the body of the message. If this is the case, we need to find them and add
>     // them into the header collection (overwriting any duplicates, because this IS
>     // the original header set).
>
>     NSString * body = [message substringFromIndex:[headers length]];
>     NSScanner * bodyScanner = [NSScanner scannerWithString:body];
>     [bodyScanner setCharactersToBeSkipped:[NSCharacterSet whitespaceCharacterSet]];
>     NSString * rfc822Divider = [NSString stringWithFormat:@"%@%@%@",
>         endOfLine, endOfLine, @"------------------ RFC822 Header Follows ------------------"];
>
>     if ([bodyScanner scanUpToString:rfc822Divider intoString:nil]) {
>         [bodyScanner scanUpToString:@"" intoString:&headers];
>
>         // advance past the divider line and then outdent <moreHeader> one space
>         headers = [headers substringFromIndex:[rfc822Divider length]];
>         if ([headers length] > 0) {
>             NSMutableString * mutable = [NSMutableString stringWithString:headers];
>             [mutable replaceOccurrencesOfString:@"\n "
>                                      withString:@"\n"
>                                         options:0
>                                           range:NSMakeRange(0, [mutable length])];
>
>             message = [[mutable substringFromIndex:1] stringByAppendingString:body];
>         }
>     }
>
>     // ------------------------------------------------------------------------------
>
219a250,251
>
>
Original comment by gavineadie
on 30 Jan 2010 at 8:48
Actually, I think I missed an encoding thing. The above works with a section of the original file copied/pasted into a test file, but not with the original. Thinking cap is back on!
Original comment by ramsayc...@gmail.com
on 31 Jan 2010 at 2:43
Found it. Misunderstanding of how the scanner works. Here's the replacement:
199a200,231
>     // ------------------------------------------------------------------------------
>     // Apple's old AOCE/PowerTalk mailer used to move the 'extra' headers to the end
>     // of the body of the message. If this is the case, we need to find them and add
>     // them into the header collection (overwriting any duplicates, because this IS
>     // the original header set).
>
>     NSString * body = [message substringFromIndex:[headers length]];
>     NSScanner * bodyScanner = [NSScanner scannerWithString:body];
>     [bodyScanner setCharactersToBeSkipped:[NSCharacterSet whitespaceCharacterSet]];
>     NSString * rfc822Divider = [NSString stringWithFormat:@"%@%@%@",
>         endOfLine, endOfLine, @"------------------ RFC822 Header Follows ------------------"];
>
>     if ([body rangeOfString:rfc822Divider].length > 0) {
>         [bodyScanner scanUpToString:rfc822Divider intoString:nil];
>         [bodyScanner scanUpToString:@"" intoString:&headers];
>
>         // advance past the divider line and then outdent <moreHeader> one space
>         headers = [headers substringFromIndex:[rfc822Divider length]];
>         if ([headers length] > 0) {
>             NSMutableString * mutable = [NSMutableString stringWithString:headers];
>             [mutable replaceOccurrencesOfString:[endOfLine stringByAppendingString:@" "]
>                                      withString:endOfLine
>                                         options:0
>                                           range:NSMakeRange(0, [mutable length])];
>
>             message = [[mutable substringFromIndex:1] stringByAppendingString:body];
>         }
>     }
>
>     NSLog(@"MESSAGE:\n%@", message);
>     // ------------------------------------------------------------------------------
>
Original comment by ramsayc...@gmail.com
on 31 Jan 2010 at 3:55
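For anyone who would rather pre-process the mbox than patch gmum, the same idea can be sketched as a standalone Python function. This is only a sketch under assumptions, not the patch itself: `promote_tail_headers` is a hypothetical name, the input is assumed to be a single message with "\n" line endings and without its mbox "From " line, and (unlike the patch) the divider and the quoted copy are dropped from the body. Like the patch, it uses only the tail headers and discards the top ones. It outdents by one space only when every quoted line is indented, so folded continuation lines survive in archives where the block is not indented.

```python
DIVIDER = "------------------ RFC822 Header Follows ------------------"


def promote_tail_headers(message: str) -> str:
    """Rebuild one message so the quoted tail headers become its headers.

    Messages that do not contain the divider line are returned unchanged.
    """
    before, found, tail = message.partition("\n" + DIVIDER + "\n")
    if not found:
        return message

    # Everything after the first blank line of `before` is the body text;
    # the top headers are discarded, as in the patch.
    _top_headers, _blank, body = before.partition("\n\n")

    lines = [line for line in tail.split("\n") if line.strip()]
    if not lines:
        return message

    # Outdent one space only if the whole quoted block was indented.
    if all(line.startswith(" ") for line in lines):
        lines = [line[1:] for line in lines]

    return "\n".join(lines) + "\n\n" + body.rstrip("\n") + "\n"
```

Splitting the mbox into individual messages, and putting the "From " separator lines back, is left to the caller.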
I'm seeing the same thing with old email that I've imported from Pine to Eudora to Mail.app. I am importing from Mail.app's .emlx files though.
Original comment by dfbi...@gmail.com
on 3 Feb 2010 at 1:53
The more I've looked at importing email to Google Apps, the more apparent it is that the Google RFC822 importer is overly strict about malformed headers. In this game the maxim is, "be generous accepting mistakes on import, and don't make any in what you export."
For example, the "Date: " header above:
Date: Wed, 20 Mar 1996 14:03:58 -0500 (EST)
is illegal because of the "(EST)". You either use "-0500" OR "EST" but not both, and that's why Google is rejecting that message. Another example:
Date: Wed, 20 Mar 1996, 14:03:58 -0500
That second comma is illegal and causes Google to reject messages that have that error.
This isn't something that the email-uploader can influence unless it's going to parse all the headers and correct the erroneous ones, which is not something for the faint-hearted -- I've done that once in my career, and once was enough! Just about every email program under the sun makes some kind of error in its generated headers ... ugh
Original comment by gavineadie
on 3 Feb 2010 at 3:15
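The two hand edits described in that comment can be sketched as a small Python function. This is a hedged illustration, not part of the uploader: `normalize_date` is a hypothetical name, it handles only the two patterns quoted above, and whether the importer accepts the result is taken from the comment rather than verified. RFC 2822 itself permits a trailing parenthesised comment, so the first rewrite is purely about what the importer reportedly accepted.

```python
import re


def normalize_date(value: str) -> str:
    """Apply the two rewrites from the comment to one Date: header value."""
    cleaned = value.strip()
    # Drop a trailing parenthesised comment such as "(EST)".
    cleaned = re.sub(r"\s*\([^()]*\)$", "", cleaned)
    # Drop a comma that directly follows the four-digit year.
    cleaned = re.sub(r"\b(\d{4}),(?=\s+\d{1,2}:)", r"\1", cleaned)
    return cleaned


print(normalize_date("Wed, 20 Mar 1996 14:03:58 -0500 (EST)"))
# Wed, 20 Mar 1996 14:03:58 -0500
print(normalize_date("Wed, 20 Mar 1996, 14:03:58 -0500"))
# Wed, 20 Mar 1996 14:03:58 -0500
```

Values that match neither pattern pass through unchanged, so the function is safe to run over every Date: header in a file.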
So does this mean we've reached a dead-end?
Original comment by dfbi...@gmail.com
on 3 Feb 2010 at 2:04
I went in and hand-edited all the offending headers till the file was imported successfully -- BBEdit and a few canned scripts made it fairly easy. FWIW, I've imported 106,000 messages this week!
Original comment by gavineadie
on 3 Feb 2010 at 2:54
I've filed a bug internally on the Email Migration API (2417545) with your comments about the parser strictness. You can also bring it up on the Email Migration API discussion group at http://tinyurl.com/yatasrx
Did you find the "RFC822 Header Follows" to be an unqualified win? I'd worry about message replies having extra copies of those headers, so they'd end up overwriting other headers inappropriately. While I find some examples of the "RFC822 Header Follows" in my old mail archives, too, they do not appear to be indented a space.
Original comment by gregrobbins
on 3 Feb 2010 at 10:52
Thanks for taking my comments over there, Greg.
I worried about having extra copies of the headers too and, in fact, the code snippet above doesn't do what the comment says (oops): it doesn't use the 'top headers' at all, it ignores them and only uses the 'tail headers.' And, yes, I also had many messages which did not have the indented headers. In some cases the 'top headers' were totally useless, so losing them was a benefit.
I've been importing batches of messages dated 1994-2000, during which time I used multiple versions of multiple mail clients and my messages had passed through a few different gateways .. the variations on the 'standard' are frightening!
I wrote my first SMTP gateway in 1982 (in a language called Plus, a derivative of Sue) and had to incorporate RFC822 parsing and generation, in addition to the expected RFC821 work, because the mainframe host system (MTS) maintained messages in a database and didn't utilize RFC822 formats. I've been reliving those days this last week and I'm of mixed emotions -- ~30-year-old reminiscences are rosy till you remember the details!!
Original comment by gavineadie
on 3 Feb 2010 at 11:43
The latest glitch is finding both \r AND \n used in converted AOCE messages. The \n is in the headers and so is assigned to <endOfLine>; the \r is the line separator in the body and so defeats the check for "RFC822 Header Follows". The temporary solution is to look at the character after that string and use it in the unindenting <replaceOccurrencesOfString> instead of <endOfLine>.
The permanent solution is to talk to the author of Emailchemy (the converter I use) and ask him to be sure all the end-of-line strings are the same throughout the file.
Original comment by gavineadie
on 4 Feb 2010 at 3:11
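Until the converter is fixed, mixed line endings can also be unified in a pre-processing pass. The following Python sketch assumes the file has already been decoded to text; `unify_line_endings` is a hypothetical helper, not something gmum or Emailchemy provides.

```python
def unify_line_endings(text: str, newline: str = "\n") -> str:
    """Rewrite every CRLF, lone CR and lone LF as `newline`."""
    # Collapse CRLF first so it is not counted as two line breaks.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified if newline == "\n" else unified.replace("\n", newline)
```

With a single end-of-line string throughout the message, the divider check no longer depends on which separator the body happens to use.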
This is a big issue for me as well - Mac OS X 10.6.2 and Apple Mail 4.2 (1077). Can't upload email for 40 users that I want to move to Google Apps .. trying to do this through IMAP instead, but sloooooooow :-)
If there's any info I can provide to help resolve this, I'm more than happy to help.
Original comment by nathan.m...@gmail.com
on 31 Mar 2010 at 4:29
I am also having issues with this. I am not sure how to handle this since I have a user that has about 10,000 emails we are trying to upload. Is there something I can do to get this to work, or a workaround? I would appreciate any suggestions since I do not know how to go forward.
Original comment by rwburke%...@gtempaccount.com
on 19 Apr 2010 at 3:55
The server team says there will be some improvements in date header parsing rolling out, hopefully this week.
Original comment by gregrobbins
on 19 Apr 2010 at 7:44
Any news? I continue to migrate thousands of emails via IMAP drag 'n' drop because the uploader is unhappy with extra quotes in the date header. Let me know if I can provide any help.
Original comment by nathan.m...@gmail.com
on 1 Jun 2010 at 4:20
The server team has made some changes already. If you still see uploading errors due to headers that look like they should be legal (or close enough), please paste example(s) here.
Original comment by gregrobbins
on 1 Jun 2010 at 11:18
The operation couldn’t be completed. (Date header "Thu Jan 05 16:15:01 2006" is invalid.)
Original comment by b.n.will...@gmail.com
on 2 Oct 2013 at 4:43
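A date in that ctime-like form carries no time zone, so any rewrite has to assume one. As a hedged Python sketch (the name `ctime_to_rfc2822` is hypothetical, English day and month names are assumed, and nothing here has been tested against the importer), the standard library can parse the value and re-emit it in RFC 2822 layout, using "-0000" to signal that the zone is unknown:

```python
from datetime import datetime
from email.utils import format_datetime


def ctime_to_rfc2822(value: str) -> str:
    """Convert e.g. "Thu Jan 05 16:15:01 2006" to RFC 2822 date layout."""
    parsed = datetime.strptime(value.strip(), "%a %b %d %H:%M:%S %Y")
    # A naive datetime is formatted with a "-0000" zone by format_datetime.
    return format_datetime(parsed)


print(ctime_to_rfc2822("Thu Jan 05 16:15:01 2006"))
# Thu, 05 Jan 2006 16:15:01 -0000
```

If the original zone is known from other headers, attaching it to the parsed value before formatting would be the better choice.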
Original issue reported on code.google.com by
gavineadie
on 30 Jan 2010 at 2:30