Closed GoogleCodeExporter closed 9 years ago
I just needed a quick way to dump the database and reload it in another
environment, so I made some changes to multipart.py to get past this UTF-8
problem. It worked.
However, I understand that other parts use multipart.py too, and my change
probably doesn't conform to the MIME standard. If I have time, I'll investigate
further and provide a patch that does satisfy the MIME standard.
Original comment by heshim...@gmail.com
on 14 May 2011 at 8:20
Attachments:
Confirmed. There is also an invalid test case for how the multipart module
handles unicode data: StringIO can handle mixed "str" and "unicode" values, but
files accept only "str".
Original comment by kxepal
on 14 May 2011 at 8:20
Sorry, I was wrong about the tests: StringIO confused me (: Don't rush, sit
down and think about it... yes (:
The multipart module doesn't need fixing, only the dump tool, because it passes
a unicode document id to the multipart writer. That is what dump-tool.patch
addresses. dump-tool-2.patch solves the same problem, but respects the
Content-Type header and its charset. I suppose that would be the more correct
solution.
Original comment by kxepal
on 14 May 2011 at 9:20
Attachments:
Ah, that's much smarter. Thanks!
Original comment by heshim...@gmail.com
on 14 May 2011 at 10:43
Hmm... another thing. I was under the impression that UTF-8 encoded strings
aren't valid ASCII. Currently, isn't multipart.py expecting strict ASCII
strings as headers?
Original comment by heshim...@gmail.com
on 14 May 2011 at 10:48
Actually, only the first 128 code points are encoded in UTF-8 as valid ASCII.
The problem was not which characters appear in the headers, but the type of
string multipart tries to write to the output stream. Files and streams don't
expect pure unicode strings; they favor the strings called "bytes" in Python 3
terminology, and the multipart module expects this behavior.
But there was a "hack" that adds the document id to the headers, which the
couchdb-load tool uses to create a document with the same id value. Since a
document id can be unicode, this "hack" breaks that expectation and makes
multipart crash.
You could try reverting the patch and changing the default value of the output
argument of the dump_db function in dump.py from sys.stdout to
StringIO.StringIO. The error wouldn't occur, because StringIO can handle both
str and unicode values.
Original comment by kxepal
on 14 May 2011 at 11:14
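The text-versus-bytes distinction described in the comment above can be
illustrated with a small sketch. The thread concerns Python 2; this sketch
uses Python 3, where the mismatch shows up as a TypeError on a binary stream.
The document id and the "Content-ID" header name are hypothetical values
chosen for illustration, not taken from the patches.

```python
import io

doc_id = "d\u00f6c"  # hypothetical non-ASCII document id
header_line = "Content-ID: " + doc_id + "\r\n"  # header name is an assumption

# A text buffer accepts text directly, much like StringIO did in the thread.
text_out = io.StringIO()
text_out.write(header_line)

# A binary stream only accepts bytes, so writing text is rejected.
binary_out = io.BytesIO()
try:
    binary_out.write(header_line)
    rejected = False
except TypeError:
    rejected = True

# Encoding explicitly before writing avoids the type mismatch.
binary_out.write(header_line.encode("utf-8"))
```

Encoding at the point of writing keeps the stream happy, but it still puts
raw non-ASCII bytes into a header, which is the concern raised later in the
thread.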
IMO the correct way to have non-ASCII strings in MIME headers would be to use
RFC 2047 encoding for any non-ascii header values.
Original comment by djc.ochtman
on 14 May 2011 at 12:24
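As a rough illustration of the RFC 2047 suggestion above, the following
Python 3 sketch uses the standard library's email.header module. The helper
name encode_header_value and the sample values are hypothetical; this is not
the code from any patch in this issue.

```python
from email.header import Header, decode_header


def encode_header_value(value):
    """Return value unchanged if it is pure ASCII, else RFC 2047 encoded."""
    try:
        value.encode("ascii")
        return value
    except UnicodeEncodeError:
        # Header(...).encode() produces an ASCII-only encoded-word.
        return Header(value, "utf-8").encode()


plain = encode_header_value("doc-42")     # hypothetical ASCII id, left as-is
encoded = encode_header_value("d\u00f6c")  # hypothetical non-ASCII id

# A reader can recover the original text from the encoded-word.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)
```

Leaving ASCII values untouched keeps existing output unchanged, so only
headers that actually carry non-ASCII text change shape.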
Correct, but it looks like overhead in this case, because it would apply to
only one header while the others would still follow RFC 822. Wouldn't it be
better to use base64 encoding?
Original comment by kxepal
on 14 May 2011 at 12:50
Hmm... I'd like to make a note here that kxepal's dump-tool-2.patch actually
generated some invalid multipart boundaries.
Original comment by heshim...@gmail.com
on 2 Jun 2011 at 6:47
Original comment by djc.ochtman
on 21 Sep 2012 at 8:32
Original comment by wickedg...@gmail.com
on 22 Sep 2012 at 12:44
Any progress on this?
Original comment by djc.ochtman
on 22 Oct 2012 at 11:26
Yes, I will submit a patch with tests this week. I agree with you about the
RFC 2047 specification, so I'm diving into it.
Original comment by kxepal
on 22 Oct 2012 at 11:33
Patch attached. Non-ASCII headers are now encoded following RFC 2047. Actually,
I feel like rewriting the multipart module on top of the email package, but
that would probably be a separate issue: it needs workarounds for some
email-specific features to keep backward compatibility.
Original comment by kxepal
on 24 Apr 2013 at 5:20
Attachments:
Sorry, I forgot to clean up the testing prints. Reattached.
Original comment by kxepal
on 24 Apr 2013 at 5:25
Attachments:
Pushed a slightly changed patch as rce40fd77ae8d, thanks!
Original comment by djc.ochtman
on 25 Apr 2013 at 10:09
Original comment by djc.ochtman
on 25 Apr 2013 at 11:16
Original issue reported on code.google.com by
heshim...@gmail.com
on 14 May 2011 at 7:58