henck / rtf-html-php

RTF to HTML converter in PHP
http://www.independent-software.com
GNU General Public License v2.0

Does this support Chinese Unicode? #22

Open twjjack opened 7 years ago

twjjack commented 7 years ago

Hi all,

Thanks! It works perfectly for English text, but it does not work when the RTF contains Chinese characters stored as Unicode.

Here is my code:

    $rtf = '{\rtf1\ansi\ansicpg1252\uc0\deff0{\fonttbl {\f0\fswiss\fcharset0\fprq2 Arial;} {\f1\fnil\fcharset0\fprq2 SimSun;} {\f2\froman\fcharset2\fprq2 Symbol;}} {\colortbl;\red0\green0\blue0;\red255\green255\blue255;} {\stylesheet{\s0\itap0\f0\fs24 [Normal];}{\*\cs10\additive Default Paragraph Font;}} {\*\generator TX_RTF32 11.0.401.501;} \deftab1134\paperw11907\paperh16443\margl567\margt567\margr567\margb567\pard\itap0\plain\f1\fs20\loch\f1\hich\f1\u20320\u22909\u21527\par }';

    // Parse the RTF into a document tree, then format that tree as HTML.
    $reader = new RtfReader();
    $result = $reader->Parse($rtf);
    $formatter = new RtfHtml();
    $test = $formatter->Format($reader->root);

It gives me this result:

    ◊u22909◊par

I expected to get \u20320\u22909\u21527, which I could then translate back into Chinese characters.

Has anyone here had a similar issue, and what was the solution?

Cheers, Jack
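For reference, the three escapes in the sample are decimal values: 20320 = U+4F60 (你), 22909 = U+597D (好) and 21527 = U+5417 (吗), i.e. 你好吗. Below is a minimal sketch of how \uN control words in a raw RTF string could be turned into UTF-8 text. decodeRtfUnicode is a hypothetical helper, not part of rtf-html-php, and it rests on stated assumptions: \uc0 is in effect (as in this sample, so no fallback characters follow each \uN), only characters up to U+FFFF are handled, and escaped backslashes are not special-cased. It needs no PHP extensions.

```php
<?php
// Hypothetical helper, not part of rtf-html-php: turn \uN control words into
// UTF-8 text. Sketch assumptions: \uc0 is in effect (no fallback characters
// follow each \uN), escaped backslashes (\\u...) are not special-cased, and
// only BMP characters (U+0000..U+FFFF) are produced.
function decodeRtfUnicode(string $rtf): string
{
    $decoded = preg_replace_callback(
        '/\\\\u(-?\d+) ?/', // \uN plus the optional space that ends the control word
        function (array $m): string {
            // \uN is a signed 16-bit decimal; masking maps -32768..-1 to 32768..65535.
            $cp = (int) $m[1] & 0xFFFF;
            if ($cp >= 0xD800 && $cp <= 0xDFFF) {
                $cp = 0xFFFD; // surrogate halves are not combined in this sketch
            }
            // Encode the code point as 1, 2 or 3 UTF-8 bytes.
            if ($cp < 0x80) {
                return chr($cp);
            }
            if ($cp < 0x800) {
                return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
            }
            return chr(0xE0 | ($cp >> 12))
                . chr(0x80 | (($cp >> 6) & 0x3F))
                . chr(0x80 | ($cp & 0x3F));
        },
        $rtf
    );
    return $decoded ?? $rtf; // preg_replace_callback returns null on error
}

$body = '\pard\plain\f1\fs20\u20320\u22909\u21527\par';
echo decodeRtfUnicode($body), PHP_EOL; // \pard\plain\f1\fs20你好吗\par
```

Masking with 0xFFFF covers both forms writers use: positive values such as 20320, and the negative form the RTF specification requires for values above 32767.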

sipryan commented 6 years ago

Support for Unicode characters was recently added. Please check whether the problem persists. Thanks!

humblecoder commented 6 years ago

I'm trying to parse mixed English/Cantonese documents, and the Cantonese text comes out garbled. I'm also receiving a great deal of output such as:

    ...
    WORD rtf (1)
    WORD adeflang (1025)
    WORD ansi (1)
    WORD ansicpg (1252)
    WORD uc (1)
    WORD adeff (0)
    WORD deff (0)
    ...
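The lines above appear to be a dump of the document's control words with their parameters (for example \ansicpg1252 and \uc1). \uc1 tells readers that each \uN is followed by one fallback character meant for non-Unicode readers. A Unicode-aware reader has to skip that fallback, or it shows up as stray characters next to the real text. Whether that is what garbles the Cantonese here is only a guess; the thread does not confirm it. The sketch below strips fallback characters. stripUnicodeFallback is a hypothetical helper, not part of rtf-html-php. It applies one \uc value to the whole string, whereas the RTF specification scopes \uc per group and defaults it to 1. It needs PHP 7.4+ for the arrow function.

```php
<?php
// Hypothetical helper, not part of rtf-html-php. \ucN says how many fallback
// characters follow each \uN; a Unicode-aware reader must skip them, or they
// leak into the output. Sketch assumption: one \uc value for the whole input
// (a real reader tracks it per group and defaults to 1). Per the RTF spec, a
// control word or control symbol (such as \'hh) counts as one character, the
// space ending \uN is not counted, and skipping stops at a brace.
function stripUnicodeFallback(string $rtf, int $uc = 1): string
{
    $unit = '(?:\\\\\'[0-9a-fA-F]{2}'      // \'hh hex escape
        . '|\\\\[a-z]{1,32}(?:-?\d+)? ?'    // control word with optional parameter
        . '|\\\\[^a-z]'                     // other control symbol, e.g. \~ or \{
        . '|[^\\\\{}])';                    // plain character (not a brace or backslash)
    $pattern = '/\\\\u(-?\d+) ?' . $unit . '{0,' . max(0, $uc) . '}/';
    $result = preg_replace_callback(
        $pattern,
        // Keep \uN and re-emit a space so a following digit cannot merge into N.
        fn (array $m): string => '\u' . $m[1] . ' ',
        $rtf
    );
    return $result ?? $rtf; // preg_replace_callback returns null on error
}

echo stripUnicodeFallback('\u20320?\u22909?', 1), PHP_EOL;
```

With \uc1, \u20320? becomes \u20320 followed by a space: the ? fallback is dropped, and the space keeps a following digit from being read as part of N.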
sipryan commented 6 years ago

Unfortunately, Far Eastern languages (UTF-16 and UTF-32) are not yet implemented! You can help us by uploading your RTF file. Thanks!
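On the UTF-16 point: in RTF each \uN holds a single 16-bit value, so a character above U+FFFF, such as one from CJK Unified Ideographs Extension B (where a number of characters used in written Cantonese are encoded), is expected to arrive as two consecutive \uN words forming a UTF-16 surrogate pair. The sketch below combines such pairs into code points. rtfUnitsToCodePoints is a hypothetical helper, not part of rtf-html-php, and it takes already-extracted N values rather than raw RTF. It needs PHP 7.4+ for the arrow function.

```php
<?php
// Hypothetical helper, not part of rtf-html-php. Turns a list of \uN values
// (UTF-16 code units written as signed 16-bit decimals) into Unicode code
// points, combining high+low surrogate pairs; unpaired surrogates become U+FFFD.
function rtfUnitsToCodePoints(array $values): array
{
    // Masking maps negative N (-32768..-1) onto 32768..65535.
    $units = array_map(fn (int $n): int => $n & 0xFFFF, $values);
    $codePoints = [];
    for ($i = 0, $count = count($units); $i < $count; $i++) {
        $unit = $units[$i];
        $next = $units[$i + 1] ?? 0;
        $isHigh = $unit >= 0xD800 && $unit <= 0xDBFF;
        $nextIsLow = $next >= 0xDC00 && $next <= 0xDFFF;
        if ($isHigh && $nextIsLow) {
            // Each surrogate carries 10 bits of the offset above U+10000.
            $codePoints[] = 0x10000 + (($unit - 0xD800) << 10) + ($next - 0xDC00);
            $i++; // the low surrogate has been consumed
        } elseif ($unit >= 0xD800 && $unit <= 0xDFFF) {
            $codePoints[] = 0xFFFD; // unpaired surrogate
        } else {
            $codePoints[] = $unit; // ordinary BMP character
        }
    }
    return $codePoints;
}

// A surrogate pair (U+20000) followed by a BMP character (U+4F60), printed in hex.
print_r(array_map('dechex', rtfUnitsToCodePoints([-10176, -9216, 20320])));
```

Here -10176 and -9216 are 0xD840 and 0xDC00, which combine to U+20000, the first Extension B character.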