tbluemel / rtf.js

Render RTF documents in HTML.
MIT License

Consider using fewer codepages? #41

Open blerner opened 6 years ago

blerner commented 6 years ago

According to the RTF spec (https://www.microsoft.com/en-us/download/details.aspx?id=10725), there are only a few codepages needed in RTF:

Code page | Name
-- | --
437 | United States IBM
708 | Arabic (ASMO 708)
709 | Arabic (ASMO 449+, BCON V4)
710 | Arabic (transparent Arabic)
711 | Arabic (Nafitha Enhanced)
720 | Arabic (transparent ASMO)
819 | Windows 3.1 (United States and Western Europe)
850 | IBM multilingual
852 | Eastern European
860 | Portuguese
862 | Hebrew
863 | French Canadian
864 | Arabic
865 | Norwegian
866 | Soviet Union
874 | Thai
932 | Japanese
936 | Simplified Chinese
949 | Korean
950 | Traditional Chinese
1250 | Eastern European
1251 | Cyrillic
1252 | Western European
1253 | Greek
1254 | Turkish
1255 | Hebrew
1256 | Arabic
1257 | Baltic
1258 | Vietnamese
1361 | Johab
10000 | MAC Roman
10001 | MAC Japan
10004 | MAC Arabic
10005 | MAC Hebrew
10006 | MAC Greek
10007 | MAC Cyrillic
10029 | MAC Latin2
10081 | MAC Turkish
57002 | Devanagari
57003 | Bengali
57004 | Tamil
57005 | Telugu
57006 | Assamese
57007 | Oriya
57008 | Kannada
57009 | Malayalam
57010 | Gujarati
57011 | Punjabi

As far as I can tell, rtf.js supports 145 code pages (found by searching for `cptable[###] =` in the RTFJS.bundle.js file). Eliminating the ones that aren't necessary could cut the bundle file size substantially.
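The count above can be reproduced with a small script. This is only a sketch: it assumes the bundle really does contain assignments of the form `cptable[NNN] = ...`, and the helper names `findBundledCodepages` and `partitionBySpec` are made up for illustration. The list of spec code pages is copied from the table in this issue, not from the spec itself.

```javascript
// Code pages listed in the table above (copied from this issue).
const SPEC_CODEPAGES = new Set([
  437, 708, 709, 710, 711, 720, 819, 850, 852, 860, 862, 863, 864, 865, 866,
  874, 932, 936, 949, 950, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257,
  1258, 1361, 10000, 10001, 10004, 10005, 10006, 10007, 10029, 10081,
  57002, 57003, 57004, 57005, 57006, 57007, 57008, 57009, 57010, 57011,
]);

// Collect the distinct code page numbers assigned as `cptable[NNN] = ...`
// in the given source text (hypothetical helper, not part of rtf.js).
function findBundledCodepages(bundleSource) {
  const found = new Set();
  // `=(?!=)` skips comparisons such as `cptable[437] == x`.
  for (const match of bundleSource.matchAll(/cptable\[(\d+)\]\s*=(?!=)/g)) {
    found.add(Number(match[1]));
  }
  return [...found].sort((a, b) => a - b);
}

// Split the bundled code pages into those in the table and the rest.
function partitionBySpec(codepages) {
  return {
    inSpec: codepages.filter((cp) => SPEC_CODEPAGES.has(cp)),
    extra: codepages.filter((cp) => !SPEC_CODEPAGES.has(cp)),
  };
}
```

In Node, the input could come from something like `require("fs").readFileSync("RTFJS.bundle.js", "utf8")`, where the path is whatever your build produces. The `extra` list then shows the candidates for removal.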

zoehneto commented 6 years ago

The spec says: "Possible values include those in the following table", so the list is not exhaustive. A quick Google search shows that there are RTF documents that use other codepages (for example, search for ansicpg10002). For maximum document compatibility, I want to keep the default as is. What I could do is load the codepages as an external module or additional script; that way you could supply your own cut-down cptable for scenarios where you know which codepages will be used.
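A cut-down cptable of that kind could be built roughly as follows. This is a sketch, not an rtf.js API: it assumes the full table is a plain object keyed by code page number, and `pickCodepages` is a hypothetical helper name.

```javascript
// Build a reduced table containing only the requested code pages.
// `fullTable` is assumed to be an object keyed by code page number.
function pickCodepages(fullTable, wanted) {
  const reduced = {};
  for (const cp of wanted) {
    // Fail early instead of silently shipping an incomplete table.
    if (!(cp in fullTable)) {
      throw new Error(`Code page ${cp} is not present in the full table`);
    }
    reduced[cp] = fullTable[cp];
  }
  return reduced;
}
```

The trade-off is the one described above: a document that declares a code page missing from the reduced table (ansicpg10002, say) can no longer be decoded with it.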

lounsbrough commented 1 year ago

@zoehneto - I am also interested in using this library but it currently would double the size of our deployment bundle. How hard would it be to do what you described above so that I could provide only the code page that I need?

zoehneto commented 1 year ago

In theory, you'd only have to add the library to the webpack externals, remove the include from the dev / prod config, and add a peer dependency to the package.json. That covers the rtf.js side; you'd still need to adapt your own config to provide codepagejs appropriately. I currently don't have time to look into it further, but I'd be happy to get a PR if you want to implement the feature.
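For illustration, the externals part might look something like the fragment below. The module name `codepage` and the global name `cptable` are assumptions about how the codepage library is imported and exposed; check them against the actual rtf.js webpack config before relying on this.

```javascript
// Sketch of the webpack `externals` entry described above.
// Assumption: the codepage tables are imported as "codepage" and are
// available at runtime as a global named `cptable`.
const externalsSketch = {
  externals: {
    // Do not bundle the module; resolve it to the runtime global instead.
    codepage: "cptable",
  },
};

// In webpack.config.js this would be merged into the exported config, e.g.
// module.exports = { ...existingConfig, ...externalsSketch };
```

With this in place, whoever consumes the bundle is responsible for loading a cptable (full or cut down) before rtf.js runs.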

lounsbrough commented 1 year ago

I will take a look and see if it's something I can do quickly or if I run into any roadblocks. 👍🏼

lounsbrough commented 1 year ago

I attempted to do this but ran into issues, mainly because this package was designed to be available in the browser, and the codepage tables are baked in. What I played with is on this branch: https://github.com/lounsbrough/rtf.js/tree/extract-codepage. I think someone more familiar with the app would need to address this or decide what the best path is.

mattiaskagstrom commented 1 year ago

Hi! The codepages currently take up the majority of our bundle size: [image]