kota7 / striprtf

R Package for Extracting Text from RTF (Rich Text Format) File

Handling fcharset correctly #24

Open kota7 opened 1 year ago

kota7 commented 1 year ago

Integer codes are actually decoded according to the \fcharsetN parameter defined in the \fonttbl section. Each section of the document is assigned a font spec via \fN, and that font is defined in \fonttbl. A font spec carries a \fcharsetN or \cpgN definition, which tells us which code mapping to apply.
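To make the lookup concrete, here is a minimal Python sketch (not the package's R code) that collects the \fN to \fcharsetN pairs from a font table. The function name `font_charsets` and the sample document are made up for illustration, and the regex is a simplification that assumes \fcharsetN appears in the same font entry as \fN; it ignores \cpgN and nested groups.

```python
import re

def font_charsets(rtf: str) -> dict:
    """Map each font number (the N in \\fN) to its \\fcharsetN value."""
    # Match "\fN", then any control words within the same font entry
    # (no ';', '{' or '}' in between), then "\fcharsetM".
    pattern = re.compile(r"\\f(\d+)[^;{}]*?\\fcharset(\d+)")
    return {int(font): int(charset) for font, charset in pattern.findall(rtf)}

# Hypothetical sample: font 0 is ANSI, font 1 is Shift JIS.
sample = (
    r"{\rtf1\ansi\ansicpg1252"
    r"{\fonttbl{\f0\fswiss\fcharset0 Arial;}{\f1\fnil\fcharset128 MS Mincho;}}"
    r"\f1 text}"
)

print(font_charsets(sample))
```

The \f1 in the body has no \fcharset in its own group, so it is not picked up as a definition; it is only a reference to the entry defined in \fonttbl.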

For example, \fcharset0 is ANSI, \fcharset128 is Shift JIS, and \fcharset134 is GB2312.
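As a rough illustration of what per-font decoding could look like, the Python sketch below maps the three charsets just mentioned to codec names and decodes raw bytes with the one that matches. `CHARSET_TO_CODEC` and `decode_bytes` are hypothetical names, the table is deliberately partial, and treating ANSI as cp1252 is an assumption.

```python
# Partial table covering only the three charsets mentioned above.
# Codec names are Python's; cp1252 for ANSI is an assumption.
CHARSET_TO_CODEC = {
    0: "cp1252",       # ANSI
    128: "shift_jis",  # Shift JIS
    134: "gb2312",     # GB2312
}

def decode_bytes(raw: bytes, fcharset: int, default: str = "cp1252") -> str:
    """Decode raw bytes using the codec that matches the font's charset."""
    codec = CHARSET_TO_CODEC.get(fcharset, default)
    # Replace undecodable bytes instead of raising, so one bad byte
    # does not abort the whole extraction.
    return raw.decode(codec, errors="replace")

print(decode_bytes(b"\x82\xa0", 128))  # hiragana "a" (U+3042)
print(decode_bytes(b"\xc4\xe3", 134))  # Chinese "ni" (U+4F60)
print(decode_bytes(b"\xe9", 0))        # e with acute accent (U+00E9)
```

The same byte values would produce different text under a single document-wide code page, which is why the charset has to be looked up per font.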

This is not handled correctly in the current version (v0.6.0), where we (perhaps wrongly) assumed that a file only uses the code mapping of the code page given by \ansicpgN.

Reference: