joshy / striprtf

Stripping rtf to plain old text
http://striprtf.dev
BSD 3-Clause "New" or "Revised" License
94 stars, 27 forks

Add support for ansicpg encoding (e.g. windows-1250) #19

jan-swiecki closed 3 years ago

jan-swiecki commented 3 years ago

Hi!

Great work on this library @joshy!

I've had some problems decoding files encoded with Windows-1250 (CP1250), which were probably created by old versions of Word or similar software, so I've added support for such files. See the diffs: I've added a new test for Polish text encoded in Windows-1250. All tests pass on my computer.

From http://www.biblioscape.com/rtf15_spec.htm#Heading8:

\ansicpgN | This keyword represents the ANSI code page which is used to perform the Unicode to ANSI conversion when writing RTF text. N represents the code page in decimal. This is typically set to the default ANSI code page of the run-time environment (for example \ansicpg1252 for U.S. Windows). The reader can use the same ANSI code page to convert ANSI text back to Unicode. This keyword should be emitted in the RTF header section right after the \ansi, \mac, \pc or \pca keyword.
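
To illustrate what the spec describes, here is a minimal sketch of how a reader could use \ansicpgN to decode \'hh escapes. It is not the code from this PR and not striprtf's actual API: the names `detect_codec` and `decode_hex_escapes` are made up for the example, and it assumes that Python's `cpN` codec names correspond to the Windows code page numbers and that falling back to cp1252 is acceptable when the keyword is missing or unknown.

```python
import codecs
import re

# Hypothetical helpers for illustration only; not part of striprtf.
ANSICPG_RE = re.compile(r"\\ansicpg(\d+)")
# One or more consecutive \'hh escapes, so multi-byte sequences stay together.
HEX_ESCAPE_RE = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def detect_codec(rtf: str, default: str = "cp1252") -> str:
    """Return a Python codec name for the \\ansicpgN keyword, or the default."""
    match = ANSICPG_RE.search(rtf)
    if not match:
        return default
    name = "cp" + match.group(1)
    try:
        codecs.lookup(name)  # check that Python knows this code page
    except LookupError:
        return default
    return name


def decode_hex_escapes(rtf: str) -> str:
    """Replace runs of \\'hh escapes with text decoded via the declared code page."""
    codec = detect_codec(rtf)

    def repl(m: "re.Match[str]") -> str:
        raw = bytes.fromhex(m.group(0).replace("\\'", ""))
        return raw.decode(codec, errors="replace")

    return HEX_ESCAPE_RE.sub(repl, rtf)
```

For example, with this sketch `decode_hex_escapes(r"{\rtf1\ansi\ansicpg1250 Za\'bf\'f3\'b3\'e6}")` turns the escaped bytes into the Polish word "Zażółć" while leaving the control words untouched. A real parser would also need to handle \uN escapes and per-font charsets, which this example ignores.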

Tests:

> pytest tests
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: [redacted]/striprtf
collected 13 items

tests/test_ansicpg1250.py .                                              [  7%]
tests/test_calcium_score.py .                                            [ 15%]
tests/test_french.py .                                                   [ 23%]
tests/test_hello.py .                                                    [ 30%]
tests/test_line_breaks_google_docs.py .                                  [ 38%]
tests/test_line_breaks_textedit_mac.py .                                 [ 46%]
tests/test_nested_table.py .                                             [ 53%]
tests/test_nutridoc.py .                                                 [ 61%]
tests/test_sample_3.py .                                                 [ 69%]
tests/test_simple.py ..                                                  [ 84%]
tests/test_speiseplan.py .                                               [ 92%]
tests/test_unicode.py .                                                  [100%]

============================== 13 passed in 0.10s ==============================

joshy commented 3 years ago

Hi Jan, thanks a lot for your contribution. 👍🏾