hypermail-project / hypermail

Hypermail is a free (GPL) program to convert email from Unix mbox format to html.
http://www.hypermail-project.org/
GNU General Public License v2.0

ISO-2022-JP support is incomplete and incorrect #36

Closed: bshannon closed this issue 4 years ago

bshannon commented 7 years ago

Setting the iso2022jp configuration variable is supposed to allow hypermail to handle messages that might use the ISO-2022-JP character set. Unfortunately, this support is incomplete and incorrect.

(Even more unfortunately, the i18n_body configuration variable isn't implemented at all. If it were implemented, messages would be transformed to utf-8, and all these problems would be avoided.)

hypermail attempts to transform anything that looks like an email address by replacing the at sign (and optionally the domain name) with a string (default "at") to confuse screen-scraping programs. But if there's an "@" character in the middle of a string of iso-2022-jp encoded Japanese characters, hypermail still converts it, breaking the Japanese character.
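A minimal sketch of the state tracking that would avoid this, written in Python for brevity rather than in hypermail's C. It assumes the escape sequences listed in RFC 1468 and, unlike hypermail, replaces every "@" instead of only those that look like part of an email address. The function name `obfuscate_at` and the `replacement` parameter are hypothetical, not hypermail identifiers.

```python
ESC = "\x1b"

# Escape sequences defined for ISO-2022-JP in RFC 1468.
TO_SINGLE_BYTE = (ESC + "(B", ESC + "(J")  # ASCII, JIS X 0201 Roman
TO_DOUBLE_BYTE = (ESC + "$@", ESC + "$B")  # JIS C 6226-1978, JIS X 0208-1983


def obfuscate_at(text, replacement="_at_"):
    """Replace '@' only while the text is in a single-byte character set.

    Hypothetical helper: it copies escape sequences through untouched
    (note that ESC $ @ itself contains an '@') and leaves every byte of a
    double-byte run alone.
    """
    out = []
    in_double_byte = False
    i = 0
    while i < len(text):
        seq = text[i:i + 3]
        if seq in TO_DOUBLE_BYTE:
            in_double_byte = True
            out.append(seq)
            i += 3
        elif seq in TO_SINGLE_BYTE:
            in_double_byte = False
            out.append(seq)
            i += 3
        elif text[i] == "@" and not in_double_byte:
            out.append(replacement)
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this approach an "@" that is one half of a two-byte Japanese character is passed through unchanged, while an "@" in the ASCII portions of the same line is still rewritten.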

When converting a utf-8 string to iso-2022-jp, hypermail always adds the "return to ASCII" escape sequence, even if the string ends with ASCII.
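One way to turn this into a test case is to scan the converted output for a "return to ASCII" escape that is emitted while the stream is already in ASCII. The sketch below is in Python rather than hypermail's C, assumes the RFC 1468 escape sequences, and assumes the text starts in ASCII as that RFC requires. The name `has_redundant_reset` is hypothetical.

```python
ESC_ASCII = b"\x1b(B"                 # return to ASCII
ESC_ROMAN = b"\x1b(J"                 # JIS X 0201 Roman
ESC_KANJI = (b"\x1b$@", b"\x1b$B")    # JIS C 6226-1978, JIS X 0208-1983


def has_redundant_reset(data: bytes) -> bool:
    """Return True if ESC ( B appears while the stream is already in ASCII."""
    state = "ascii"  # assumption: ISO-2022-JP text begins in ASCII
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in ESC_KANJI:
            state = "kanji"
            i += 3
        elif seq == ESC_ROMAN:
            state = "roman"
            i += 3
        elif seq == ESC_ASCII:
            if state == "ascii":
                return True  # nothing to return from: the escape is redundant
            state = "ascii"
            i += 3
        else:
            # Ordinary data byte; ESC (0x1b) never occurs inside a
            # double-byte character, so stepping one byte at a time is safe.
            i += 1
    return False
```

A string that ends with ASCII after a Japanese run should pass this check, and a string with an extra trailing escape should fail it.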

jkbzh commented 6 years ago

@bshannon Could you send me a file (or files) that showcases both errors? I need them as test cases, and to see whether the version I'm currently working on already fixes them (it's on applemail_hack). We have not been able to reproduce these errors, even with the 2.3.0 one. Thanks!

bshannon commented 6 years ago

Here's an example (I think), stripped down to the essentials:


Content-Type: text/plain; charset=ISO-2022-JP

$B1qED$G$9!#(B

$B@h$K<ALd$7$?7o$O!"(BJRE$B$r(BJDK$B$KJQ99$9$k$3$H$GL5;v2r7h$7$?$N$G$9$,!"(B $B$$$^K\Bj$N(BMS Access$B$N(B*.mdb$B%U%!%$%k$r%;%C%7%g%s%S!<%s7PM3$G!"(Bweb$B$K(B $BI=<($9$k%F%9%H$r$7$F$$$^$9!#(B

$B$G!"$I$&$b$&$^$/$$$+$:!"(BSJSAS$B$N%m%0$r8+$F$_$k$H!"(BSEVERE$B$,0l7o(B $BH/@8$7$F$$$^$9!#(B

$B$d$m$&$H$7$F$$$k$3$H$r4JC1$K=q$/$H!"%5!<%P!&%/%i%$%"%s%H$N9=@.$G!"(B $B%5!<%P$K(B*.mdb$B%U%!%$%k$,$"$j!"$3$N(Bmdb$B$N$J$+$N0l$D$N%F!<%V%k$N(B $BFbMF$r%/%i%$%"%s%HB&$N(Bweb$B%V%i%&%6$K0lMwI=<($9$k!"$H$$$&$b$N$G$9!#(B $B$H$j$"$($:(Bweb$B%V%i%&%6$d!"%/%i%$%"%s%H%=%U%H$+$i$3$N(Bmdb$B%U%!%$%k$K(B $B%G!<%?$rDI2C$7$?$j!"99?7$7$?$j$9$kM=Dj$OL5$/!"C1$K8=:_$N%G!<%?$,(B $BI=<($G$-$l$P$=$l$G(Bok$B$G$9!#(B $B!J99?7=hM}$J$I$r$d$m$&$H$9$k$H!"(Bjdbc.odbc$B7PM3$@$H(BJTA$B$K$A$c$s$HBP1~$7$F$J$$(B $B$+$i$+$$$m$$$m$HLdBj$,$"$k;]$N>pJs$O%M%C%H$G8+$D$1$^$7$?!#:#2s$O1\Mw$@$1(B $B$J$N$G!"LdBj$O$J$$$N$G$O$J$$$+$H9M$($F$$$^$9$,!K(B

SJSAS$B$N4N?4$N(BSEVERE$B%m%0$O%a!<%k$NKvHx$K$D$1$k$H$7$F!"8=:_$N(B $B@_Dj$r2DG=$J8B$j0J2<$K5-$7$?$$$H;W$$$^$9!#(B

$B"(0lIt!"%/%i%9L>$d%Z!<%8L>$J$I$r<B:]$N$b$N$H<c430c$&$b$N$K=q$-D>$7$F(B $B!!$$$^$9$N$G!"=q$-D>$7$NH4$1$d%_%9$,$"$C$F0l4S@-$NL5$$ItJ,$,$"$k$+$b(B $B!!$7$l$^$;$s!#(B

$B!J#0!K4D6-(B $B!!%5!<%P!'(B $B!!!!!!(BWindows Server 2005 $B!!!!!!(BJDK1.6_06 $B!!!!!!(BSJSAS9.1_01 $B!!%/%i%$%"%s%H!'(B $B!!!!!!(BWindows XP sp2 $B!!!!!!(BJDK1.606 $B!!!!!!(BNB6.1$B1Q8lHG!JF|K\8l%Q%C%A$"$F:Q$!K(B

$B!J#1!K%5!<%P$N%G!<%?%=!<%9@_Dj(B Win$B$N(B $B!!!!!V%3%s%H%m!<%k%Q%M%k!?4IM}%D!<%k!?%G!<%?%=!<%9(B(ODBC)$B!W(B $B$K$F!"(B $B!!!!L>>N!'(BGDMDB $B!!!!%I%i%$%P!'(BDriver do Microsoft Access (*.mdb) $B$GEPO?$7$F$$$^$9!#(B

$B!J#2!K(BSJSAS$B$N4IM}%3%s%=!<%k@Dj(B $B!!(BSJSAS$B$N#j4IM}%3%s%=!<%k$N(B $B!!!!!V%j%=!<%9!?(BJDBC$B!?@\B3%W!<%k!W(B $B$K$F(B $B!!!!L>A0!'(BGDMDBPool $B!!!!%G!<%?%=!<%9%/%i%9!'(Bsun.jdbc.odbc.JdbcOdbcDriver $B!!!!%j%=!<%9%/%i%9!'(Bjavax.sql.DataSource $B!!!!(B[$BDI2C%W%m%Q%F%#(B]$B$K$F(B $B!!!!(BUrl: jdbc:odbc:GDMDB $B!!!!(BUser: $B!!!!(BPassword: $B!!!!$$^$1$K0J2<$N%W%m%Q%F%#$b2C$($F$$$^$9!#(B $B!!!!(Btoplink.jdbc.url$B!!(B: jdbc:odbc:GDMDB $B!!!!(Btoplink.jdbc.user : $B!!!!(Btoplink.jdbc.password$B!!(B: $B!!!!(Btoplink.platform.class.name : oracle.toplink.essentials.platform.database.AccessPlatform $B!!!!(Btoplink.ddl-generation$B!!(B: none $B!J4{B8(B.mdb$B$N1\Mw$N$$J$N$G!K(B $B$r@_Dj$7$F$$$^$9!#(B

$B!J#3!K(BSJSAS$B$N4IM}%3%s%=!<%k@_Dj(B $B!!(BSJSAS$B$N#j4IM}%3%s%=!<%k$N(B $B!!!!!V%j%=!<%9!?(BJDBC$B!?(BJDBC$B%j%=!<%9!W(B $B$K$F!"(B $B!!!!(BJNDI$BL>!'(Bjdbc/GDMDBDS $B!JE,Ev$K$D$1$F$$$^$9!K(B $B!!!!%W!<%kL>!'(BGDMDBPool $B!J>e5-!J#2!K$GDI2C$7$?%W!<%k$r;XDj$7$F$$$^$9!K(B

$B$H$j$"$($:(BSJSAS$BB&$N@_Dj$H$7$F$O$3$l$@$1$G$9!#(B

$B$D$.$K(BNB$BB&$N%=!<%9$NJ}$G$9$,!"0J2<$N$H$*$j$G$9!#(B

EJB$B%b%8%e!<%k%W%m%8%'%/%H(B:GDMTest1-ejb$B$r:n@.$7(B

$B!J#4!K%(%s%F%#%F%#%/%i%9$r#2$D:n@.(B

$B!!!!(BGDMData.java (@Entity$B%"%N%F!<%7%g%s$N$"$k$b$N!K(B $B!!!!(BGDMDataKey.java ($BJ#9g%-!<MQ$K(B@Embeddable$B$7$F$$$k$b$N!K(B

There are lots of "@" characters that appear after the shift into iso-2022-jp (and some after the shift back to ASCII).

If that doesn't reproduce the problem, let me know and I'll look further.