joshy / striprtf

Stripping rtf to plain old text
http://striprtf.dev
BSD 3-Clause "New" or "Revised" License

cyrillic turned into chinese #29

Closed: bobert13 closed this issue 2 years ago

bobert13 commented 2 years ago

Hi

I have two files in Cyrillic. I can read both without issue in MS Word. The first seems to work fine with:

from striprtf.striprtf import rtf_to_text

# fullpath is the path to the .rtf file
with open(fullpath) as infile:
    content = infile.read()
text = rtf_to_text(content, errors='ignore')

The second (bad.zip) gets turned into Chinese characters.

good.zip bad.zip

sample output from the good one:

>>> tabtext =text.split("|||")
>>> print(tabtext[0])
Таблиця розподілу номерного ресурсу
Кіровоградська область|
Код зони - 52

sample output from the bad one:

>>> tabtext =text.split("|")
>>> print(tabtext[0])
亦犭桷 痤顼钿畴 眍戾痦钽 疱耋瘃
它獬怦赅 钺豚耱鼃
暑 珙龛 - 32

If I leave out the 'ignore' argument, I get: UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 6: illegal multibyte sequence

Any idea how I can work around this?
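
The garbled output and the error message above are consistent with Cyrillic bytes being decoded with the GBK codec. The following is a minimal sketch of that effect, not a diagnosis of bad.zip: it assumes the Cyrillic text is stored as Windows-1251 (cp1251) bytes, which this thread does not confirm, and it uses only Python's built-in codecs rather than striprtf.

```python
# Assumption (not confirmed in this thread): the Cyrillic text is stored
# as Windows-1251 (cp1251) bytes inside the RTF file.
original = "Таблиця"
raw = original.encode("cp1251")  # one byte per Cyrillic letter

# Strict decoding as GBK fails, because the last byte (0xff) is not a
# valid GBK lead byte.
try:
    raw.decode("gbk")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# With errors="ignore" the undecodable byte is silently dropped and each
# remaining pair of bytes is read as a single CJK character.
mangled = raw.decode("gbk", errors="ignore")

print(strict_error)
print(mangled)
```

If this is what happens to the bad file, passing 'ignore' hides the problem rather than fixing it: the call succeeds, but the text comes back in the wrong script and some bytes are lost.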

bobert13 commented 2 years ago

Hi,

I apologize in advance for my ignorance here. I'm pretty new to Python. Based on this email, I'm assuming you put in a commit to fix whatever caused this issue. Can I upgrade my current version of striprtf using pip to get the fix?

Thanks

On Wed, Jan 5, 2022 at 6:28 PM Joshy Cyriac wrote:

Closed #29 (https://github.com/joshy/striprtf/issues/29) via b2e88aa (https://github.com/joshy/striprtf/commit/b2e88aaff11aabb9d67f02cd72e98a0213fcd1b7).


joshy commented 2 years ago

Hi,

Yes, the issue is fixed, but until now no new version had been released. You can now upgrade striprtf to version 0.0.19, and it should work.

BR Joshy
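
For readers following along: the usual pip command for such an upgrade is `python -m pip install --upgrade striprtf`. The sketch below is one way to check afterwards whether the installed release is at least 0.0.19. The helper names (`parse_version`, `has_fix`, `installed_striprtf_version`) are hypothetical and not part of striprtf, and the version parsing assumes plain dotted numeric versions such as 0.0.19.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = "0.0.19"  # release named in the comment above


def parse_version(text):
    """Turn a plain dotted version such as '0.0.19' into a tuple of ints.

    Assumption: only numeric components, no suffixes like 'rc1'.
    """
    return tuple(int(part) for part in text.split("."))


def has_fix(installed, fixed=FIXED_VERSION):
    """Return True if the installed version is at least the fixed release."""
    return parse_version(installed) >= parse_version(fixed)


def installed_striprtf_version():
    """Return the installed striprtf version string, or None if not installed."""
    try:
        return version("striprtf")
    except PackageNotFoundError:
        return None


current = installed_striprtf_version()
if current is None:
    print("striprtf is not installed")
elif has_fix(current):
    print(f"striprtf {current} includes the fix")
else:
    print(f"striprtf {current} is older than {FIXED_VERSION}; upgrade with pip")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "0.0.9" sorting after "0.0.19".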

bobert13 commented 2 years ago

Awesome, thanks!
