joshy / striprtf

Stripping rtf to plain old text
http://striprtf.dev
BSD 3-Clause "New" or "Revised" License
90 stars, 27 forks

problem with backslash in the rtf string #51

Closed aynuayex closed 8 months ago

aynuayex commented 8 months ago

As we all know, Python treats a backslash as an escape character, and RTF without backslashes does not exist. In a string literal this can be handled by prefixing the RTF string with the raw marker 'r' or 'R', which tells Python that the backslashes are literal characters, not escape sequences.

The problem I am facing: if the RTF is fetched from an MSSQL database and, let us say, I access it as an attribute of an object, how do I prefix the RTF string with the raw marker 'r' or 'R', or escape the backslashes by replacing each one with two backslashes?

Note that replacing them in the database is impractical.

stevengj commented 8 months ago

Backslashes are only treated as escapes in literal strings appearing in Python code. Backslashes appearing in the string data itself (from a database, file, or wherever) are not escapes.
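
The distinction above can be checked directly. The following is a minimal sketch, not taken from the thread: `chr(92)` (the backslash character) stands in for data that arrives at runtime from a file or database driver, so the string is never parsed as Python source. The variable names are illustrative only.

```python
# A normal string literal in source code: the parser turns \t into a TAB.
literal = "a\tb"

# A raw string literal: the backslash stays an ordinary character.
raw_literal = r"a\tb"

# Simulated runtime data (assumption: this stands in for a value returned
# by a database driver). It is built from characters, so no escape
# processing ever happens to it.
runtime = "a" + chr(92) + "tb"

print(len(literal))            # three characters: a, TAB, b
print(len(raw_literal))        # four characters: a, backslash, t, b
print(runtime == raw_literal)  # runtime data equals the raw literal
```

In other words, the `r` prefix only changes how source code is parsed; a string that already exists in memory has no "raw" or "non-raw" state to switch.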

aynuayex commented 8 months ago

@stevengj, hi sir, could you please look at the detailed question here: https://stackoverflow.com/questions/78159287/how-to-append-rowr-to-a-variable-which-is-rtf-string Thanks in advance.

joshy commented 8 months ago

@stevengj Thanks for answering the question! @aynuayex If it works when you copy the raw value from the db into a string, then the problem has to do with how you open the db connection and which encoding is used. Therefore closing the issue here.

aynuayex commented 8 months ago

No, @joshy, it does not work unless I specifically add r in front of the RTF string to mark it as raw; otherwise I get the error below. And if the RTF is held in a variable (for example, fetched from the database as an array of objects), how do I add r in front of it? So this issue should not be closed yet.

aynuayex commented 8 months ago

Here is the error I get:

D:\pythonCodePycharmProjects\carProject\venv\Scripts\python.exe D:\pythonCodePycharmProjects\carProject\main.py 

  File "D:\pythonCodePycharmProjects\carProject\main.py", line 3

    ab There is no evidence for mass or lymphadenopathy in the abdomen or pelvis. \par \'b7\tab There is no evidence for 
ascites. \par \'b7\tab The visualized lung bases are clear. \par \'b7\tab Osseous structures are normal. \par \pard\ltrpar \par \b
 IMPRESSION\b0 : \par Localized fat stranding 2* to ? IBD \par \pard\ltrpar No evidence of urolithiasis\f1\fs20 \par }"

          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 140-143: truncated \uXXXX escape

joshy commented 8 months ago

But it has nothing to do with the library; it has to do with how you load the data from the database (array of objects?). If you are viewing the database values in a tool, it could be that you are not seeing the raw string but an already converted string.
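
One way to see what the fetched value really contains is to inspect it before passing it to the library. The sketch below is an assumption-laden illustration: `describe_rtf_value` is a hypothetical helper, not part of striprtf, and `fetched` is a hand-built stand-in for a database value rather than real driver output.

```python
def describe_rtf_value(value: str) -> dict:
    """Hypothetical helper: summarize what a fetched string really contains."""
    return {
        # Intact RTF begins with an opening brace and a literal backslash.
        "starts_with_rtf_header": value.startswith("{" + chr(92) + "rtf"),
        "backslash_count": value.count(chr(92)),
        # Control characters suggest the text went through a source literal.
        "has_control_chars": any(ch in value for ch in "\r\t\x07\x08\x0c"),
        # repr() shows backslashes and control characters unambiguously.
        "repr_preview": repr(value[:40]),
    }


# Stand-in for a value returned by the database (assumption).
fetched = "{" + chr(92) + "rtf1" + chr(92) + "ansi Hello" + chr(92) + "par }"

# The same text pasted into a normal (non-raw) literal: \r became a
# carriage return, so the RTF header is no longer intact.
pasted = "{\rtf1 Hello}"

print(describe_rtf_value(fetched))
print(describe_rtf_value(pasted))
```

If the summary of the real database value looks like the first case, the string can be passed to `rtf_to_text` as it is.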

aynuayex commented 8 months ago

Here is the code that gives me errors:

from striprtf.striprtf import rtf_to_text

rtf = "{\rtf1\fbidis\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\froman\fprq2\fcharset0 Verdana;}{\f1\fnil\fcharset0 Trebuchet MS;}} \viewkind4\uc1\pard\ltrpar\b\f0\fs20 TECHNIQUE\b0 :- \par Non contrast of CT the abdomen and pelvis was performed without oral contrast. \par \b FINDINGS\b0 : \par \tab \par \pard\ltrpar\fi-360\li720\'b7\tab The liver, spleen and pancreas are normal. No focal calcification or mass lesion is seen in the pancreas. The main pancreatic duct is not dilated. The gall bladder has normal size and wall thickness. No radiopaque stone. There is no evidence for biliary duct dilation. \par \'b7\tab The kidneys are normal in appearance bilaterally with no evidence for radiodense stone or hydronephrosis.There is no radiodense stone in the ureter or bladder. The adrenal glands appear normal. \par .\tab There is a fat stranding around the ascending colon with thickened peritoneum. There is no abnormal bowel wall thickening and appendix has normal caliber. \par \'b7\tab There is no evidence for mass or lymphadenopathy in the abdomen or pelvis. \par \'b7\tab There is no evidence for ascites. \par \'b7\tab The visualized lung bases are clear. \par \'b7\tab Osseous structures are normal. \par \pard\ltrpar \par \b IMPRESSION\b0 : \par Localized fat stranding 2* to ? IBD \par \pard\ltrpar No evidence of urolithiasis\f1\fs20 \par }"

text = rtf_to_text(rtf)
print(text)

aynuayex commented 8 months ago

So, @joshy, is the format of the RTF incorrect, or what? Why does the error above happen? Any ideas? Thanks in advance.
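
The SyntaxError quoted earlier can be reproduced in isolation, which suggests it comes from parsing the source file rather than from the RTF or from striprtf. In the posted string, position 140-143 appears to line up with the `\uc1` control word, which a normal string literal reads as the start of a `\uXXXX` escape. The sketch below is illustrative only: it compiles two one-line programs, assembled with `chr(92)` so that this example itself contains no problematic escapes.

```python
BACKSLASH = chr(92)

# Source text equivalent to:  rtf = "{\uc1 text}"   (normal literal)
bad_source = 'rtf = "{' + BACKSLASH + 'uc1 text}"'

# Source text equivalent to:  rtf = r"{\uc1 text}"  (raw literal)
good_source = 'rtf = r"{' + BACKSLASH + 'uc1 text}"'

try:
    compile(bad_source, "<example>", "exec")
    error_message = None
except SyntaxError as exc:
    # Raised while parsing the literal, before any code runs.
    error_message = str(exc)

namespace = {}
exec(compile(good_source, "<example>", "exec"), namespace)

print(error_message)
print(namespace["rtf"])
```

The failure happens at compile time, so it only affects RTF pasted into a `.py` file. A value read from the database at runtime never goes through this parsing step.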