verdammelt / tnef

GNU General Public License v2.0

Chinese name issue #28

Closed witwall closed 6 years ago

witwall commented 6 years ago

If attachments are named in Chinese, the extracted files do not get the right names.

Here are the output file names when I run tnef winmail.dat:

C%C0%E0%CE%EF%C1%CF1.xls  
C%C0%E0%CE%EF%C1%CF2.xlsx 
D%C0%E0%CE%EF%C1%CF.xls

And here are the right names when I run tnef -t winmail.dat

By the way, I am using Mac OS X.

verdammelt commented 6 years ago

@witwall thanks for the report. To investigate I'll need a datafile that illustrates this problem. Can you attach one here or email one to me?

Also your report shows how things are wrong with tnef -t winmail.dat but then you say it has "right name" with the same command? Please clarify.

witwall commented 6 years ago

Sorry, my typo: the names are wrong when extracting, but right with -t (which only lists the names).

I will send you a sample.

jidanni commented 6 years ago

Here's one: winmail.dat.gz. Apparently the filenames are in Big5, which creates gobbledygook filenames when extracted on a UTF-8 system.
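As an illustration of the mismatch described above, here is a minimal Python sketch. The filename is made up for the example and is not taken from the attached sample. It shows that bytes produced by the Big5 encoding are rejected by a strict UTF-8 decoder, and that the name is only recovered when the bytes are decoded with the encoding they were written in.

```python
# Hypothetical filename, chosen only for illustration (not from the sample file).
name = "測試.xls"

# The bytes as a Big5 sender would store them.
raw = name.encode("big5")

# A UTF-8 system that interprets these bytes strictly cannot decode them.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# With lenient decoding, the undecodable bytes become U+FFFD replacement marks.
shown = raw.decode("utf-8", errors="replace")

# Decoding with the original encoding recovers the intended name.
recovered = raw.decode("big5")

print(valid_utf8)
print(recovered == name)
```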

verdammelt commented 6 years ago

I haven't looked at the data file, but any ideas how one would know what encoding the filenames are in?

witwall commented 6 years ago

./tnef -t winmail.dat>list.txt

Open list.txt with vcode; it should be Big5, but it still has wrong characters.


list.txt

witwall commented 6 years ago

Sorry, for privacy reasons I had to remove it.

Here is my GBK example:

#with correct encoding
./tnef -t winmail.dat>list.txt  


Maybe we can get the right encoding/code page through this attribute:

attOEMCODEPAGE                  0x9007  OEM Codepage

For example, in this GBK sample the code page shows up as rcpg936a, which means GBK (OEM code page 936).

In the Big5 sample it is rcpg95 (should be 950?).
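The idea above can be sketched in Python, under two assumptions that the thread does not confirm: that the `%XX` sequences in the extracted names are percent-escaped raw bytes of the original filename, and that the code page number has already been obtained from the file somehow. The helper names `codec_for_codepage` and `decode_escaped_name` are hypothetical and are not part of tnef.

```python
from urllib.parse import unquote_to_bytes


def codec_for_codepage(codepage: int) -> str:
    """Map a Windows code page number to a Python codec name.

    Python ships codecs named cp936 (an alias for GBK) and cp950 (Big5).
    """
    return f"cp{codepage}"


def decode_escaped_name(escaped: str, codepage: int) -> str:
    """Undo the %XX escaping, then decode the bytes with the given code page.

    Assumption: each %XX stands for one raw byte of the original name.
    """
    raw = unquote_to_bytes(escaped)
    return raw.decode(codec_for_codepage(codepage))


# The three names from the report, decoded as code page 936 (GBK).
reported = [
    "C%C0%E0%CE%EF%C1%CF1.xls",
    "C%C0%E0%CE%EF%C1%CF2.xlsx",
    "D%C0%E0%CE%EF%C1%CF.xls",
]
decoded = [decode_escaped_name(n, 936) for n in reported]
for name in decoded:
    print(name)
```

Whether the code page can be read reliably from every winmail.dat is left open here; the sketch only shows the decoding step once a code page is known.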

jidanni commented 6 years ago

Also consider that many users will be extracting on UTF-8 systems, and it would be nice to convert the filenames by default, leaving them raw only if an option is given.
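A minimal Python sketch of that behaviour follows, assuming the source encoding of the name is already known. The function `output_name` and its `keep_raw` parameter are hypothetical names for illustration, not existing tnef options. The sketch falls back to the raw bytes when conversion is impossible, which is one possible design choice among several.

```python
def output_name(raw: bytes, source_encoding: str, keep_raw: bool = False) -> bytes:
    """Return the bytes to use as the output filename.

    By default the name is converted from source_encoding to UTF-8.
    With keep_raw=True, or if the conversion fails, the bytes are
    returned unchanged.
    """
    if keep_raw:
        return raw
    try:
        return raw.decode(source_encoding).encode("utf-8")
    except (UnicodeDecodeError, LookupError):
        # Bytes not valid in that encoding, or codec name unknown: leave as-is.
        return raw


# Hypothetical GBK-encoded name, for illustration only.
sample = "报告.xls".encode("gbk")
converted = output_name(sample, "gbk")
untouched = output_name(sample, "gbk", keep_raw=True)
print(converted.decode("utf-8"))
```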

verdammelt commented 6 years ago

Thanks for all the info. Full disclosure: I am not going to take much, if any, time to look into this until the very end of the year when I have some time off. No guarantees that I will even release anything to fix this issue.

jidanni commented 6 years ago

OK. I don't get such files often. By the way, I notice wget has --local-encoding=encoding and --remote-encoding=encoding.

verdammelt commented 6 years ago

Regretfully, I am not going to fix this issue and am closing it.

I do not feel I have the time & energy to properly handle code pages in TNEF. I am open to reviewing any patches to add such features.

I will be adding a little special debugging output so that CodePage data is easier for anyone to identify, and I will update the README & man page to make it clear that TNEF makes an assumption about the data being in some Unicode encoding.

Sorry @witwall & @jidanni. Thanks for submitting this issue and the work you've put into it.