kota7 / striprtf

R Package for Extracting Text from RTF (Rich Text Format) File

Robustness against mismatched curly braces { } #18

Closed by kota7 3 years ago

kota7 commented 3 years ago

read_rtf crashes when the input RTF file contains mismatched curly braces. Make it robust against this type of input.

Example:

{\rtf1\ansi\deff3\adeflang1025
{\fonttbl{\f0\froman\fprq2\fcharset0 Times New Roman;}{\f1\froman\fprq2\fcharset2 Symbol;}{\f2\fswiss\fprq2\fcharset0 Arial;}{\f3\froman\fprq2\fcharset0 Liberation Serif{\*\falt Times New Roman};}{\f4\fswiss\fprq2\fcharset0 Liberation Sans{\*\falt Arial};}{\f5\fnil\fprq2\fcharset0 Noto Sans CJK SC;}{\f6\fnil\fprq0\fcharset128 Lohit Devanagari;}{\f7\fnil\fprq2\fcharset0 Lohit Devanagari;}}
{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}
{\stylesheet{\s0\snext0\hich\af3\dbch\af8\langfe2052\dbch\af7\afs24\alang1081\widctlpar\hyphpar0\ltrpar\cf0\loch\f3\fs24\lang1033\kerning1 Normal;}
{\s15\sbasedon0\snext16\dbch\af5\dbch\af7\afs28\sb240\sa120\keepn\loch\f4\fs28 Heading;}
{\s16\sbasedon0\snext16\sl276\slmult1\sb0\sa140 Text Body;}
{\s17\sbasedon16\snext17\dbch\af6\sl276\slmult1\sb0\sa140 List;}
{\s18\sbasedon0\snext18\dbch\af6\afs24\ai\sb120\sa120\noline\fs24\i Caption;}
{\s19\sbasedon0\snext19\dbch\af6\noline Index;}
}{\*\generator LibreOffice/6.4.7.2$Linux_X86_64 LibreOffice_project/40$Build-2}{\info{\creatim\yr2021\mo9\dy7\hr13\min19}{\revtim\yr2021\mo9\dy7\hr14\min8}{\printim\yr0\mo0\dy0\hr0\min0}}{\*\userprops}\deftab709
\hyphauto1\viewscale120
{\*\pgdsctbl
{\pgdsc0\pgdscuse451\pgwsxn11906\pghsxn16838\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\pgdscnxt0 Default Style;}}
\formshade\paperh16838\paperw11906\margl1134\margr1134\margt1134\margb1134\sectd\sbknone\sectunlocked1\pgndec\pgwsxn11906\pghsxn16838\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\ftnbj\ftnstart1\ftnrstcont\ftnnar\aenddoc\aftnrstcont\aftnstart1\aftnnrlc
{\*\ftnsep\chftnsep}\pgndec\pard\plain \s0\hich\af3\dbch\af8\langfe2052\dbch\af7\afs24\alang1081\widctlpar\hyphpar0\ltrpar\cf0\loch\f3\fs24\lang1033\kerning1\ql\ltrpar{\loch
This file contains mismatched curly braces.}
\par }
}
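
Counting groups in the example above, the final } appears to have no matching {, so the file contains one more closing brace than opening braces. As an illustration only, the sketch below shows one way a parser could detect this without failing. It is written in Python, is not striprtf's code, and does not describe how the fix in the package works. The function name scan_braces is hypothetical, and the sketch assumes the input has no \bin binary sections.

```python
def scan_braces(rtf: str) -> tuple[int, int]:
    """Count brace mismatches in an RTF string.

    Returns (stray_closers, unclosed_groups). Hypothetical helper for
    illustration; it is not part of striprtf.
    """
    depth = 0   # number of groups currently open
    stray = 0   # closing braces seen while no group was open
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\" and i + 1 < len(rtf):
            # A backslash escapes the next character, so \{ \} and \\
            # are literal text and must not change the depth.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                # Tolerate the extra closer instead of going negative.
                stray += 1
            else:
                depth -= 1
        i += 1
    return stray, depth


print(scan_braces(r"{\rtf1 text}}"))  # (1, 0)
```

A parser built along these lines could ignore stray closers and implicitly close any groups still open at end of input, rather than raising an error.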

kota7 commented 3 years ago

Solved in striprtf 0.5.3.