spgarbet / tangram

Table Grammar package for R
66 stars, 3 forks

UTF-8 issue with rtf function, "N=" duplicated #73

Closed: jr1234567 closed this issue 1 year ago

jr1234567 commented 1 year ago

Hello, many thanks for this great package. I noticed that the rtf() function seems to add an extra "N=" string to the table header of the generated RTF file, as shown in the following example:

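library(tangram)   # tangram(), del_col(), rtf()
library(magrittr)  # the %>% pipe

# Build a lancet-style table from the pbc data, then drop the third column
a <- tangram(
  'drug~bili["%4.03f"]+albumin+stage::Categorical[1]+protime+sex["%4.06f"]+age+spiders[1]',
  data          = pbc,
  pformat       = 5,
  style         = "lancet",
  caption       = "Table Lancet Style",
  relsize       = -2,
  capture_units = FALSE,
  footnote      = ""
) %>% del_col(3)

# Write the table to RTF; the header of file.rtf shows "N=" twice
rtf(a, file = "file.rtf")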
spgarbet commented 1 year ago

rtf gets the least love. This should fix it. Let me know if that works. Also, I noticed the lancet style has issues with the centered decimal. Do you see this on your end?

jr1234567 commented 1 year ago

Many thanks. The duplicated "N=" is resolved. There are indeed other issues:
- With the lancet style, the centered decimal gives a weird "Â".
- With the Hmisc style, the "/" is missing, whereas the "*" characters are duplicated.
- With the nejm style, the dash is replaced by "0—".

spgarbet commented 1 year ago

These are all UTF-8 character-handling issues. I need to look into what RTF requires for this; the RTF spec was designed before Unicode. I changed the title. Hopefully some escaping is all that's required.

spgarbet commented 1 year ago

Found this: https://stackoverflow.com/questions/66275158/specify-utf-8-character-encoding-in-rtf-the-text-in-utf-8-format-is-correctly

spgarbet commented 1 year ago

https://www.oreilly.com/library/view/rtf-pocket-guide/9781449302047/ch01.html#unicode_in_rtf

I experimented with this. I think I can write a transformer.
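For reference, the Unicode escape described in the O'Reilly guide linked above writes each non-ASCII character as a `\uN` control word, where N is the code point as a signed 16-bit decimal, followed by a fallback character for readers without Unicode support (one fallback character is the default, `\uc1`). Below is a minimal sketch of such a transformer, assuming the input strings are valid UTF-8; `rtf_escape_unicode` is a hypothetical name, not tangram's actual implementation.

```r
# Hypothetical helper (not tangram's actual code): escape UTF-8 text for RTF.
# Assumes every element of `x` is valid UTF-8.
rtf_escape_unicode <- function(x) {
  escape_one <- function(cp) {
    if (cp %in% c(92L, 123L, 125L)) {
      # "\", "{" and "}" are RTF syntax characters: prefix them with a backslash
      paste0("\\", intToUtf8(cp))
    } else if (cp < 128L) {
      intToUtf8(cp)                      # plain ASCII passes through unchanged
    } else if (cp <= 65535L) {
      # \uN takes a signed 16-bit decimal, so values above 32767 wrap negative;
      # the trailing "?" is the fallback shown by readers without Unicode support
      n <- if (cp > 32767L) cp - 65536L else cp
      paste0("\\u", n, "?")
    } else {
      # Outside the Basic Multilingual Plane: emit a UTF-16 surrogate pair
      v  <- cp - 65536L
      hi <- 55296L + v %/% 1024L         # 0xD800 + high 10 bits
      lo <- 56320L + v %% 1024L          # 0xDC00 + low 10 bits
      paste0("\\u", hi - 65536L, "?\\u", lo - 65536L, "?")
    }
  }
  vapply(x, function(s) {
    cps <- utf8ToInt(enc2utf8(s))
    paste(vapply(cps, escape_one, character(1)), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

rtf_escape_unicode("0\u00b75")   # the string 0\u183?5 (R prints it as "0\\u183?5")
```

With this approach the lancet decimal point `·` (U+00B7) becomes `\u183?` and an em dash (U+2014) becomes `\u8212?`, so the RTF file itself stays pure ASCII and no longer depends on how the reader guesses the file's encoding.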

There's also some Markdown that isn't handled properly after the refactor, in particular bold.

spgarbet commented 1 year ago

I've realigned this with the recent refactor. I should be able to move on to UTF-8 handling next.

spgarbet commented 1 year ago

Try this version.

jr1234567 commented 1 year ago
spgarbet commented 1 year ago

The missing "*" in the Hmisc style is intended. That text should be bold now.

I'll check the lancet style; that was working on my side. The centered decimal (U+00B7) is in the lower code space of UTF-8, which of course requires different handling.
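The stray "Â" reported for the lancet style is consistent with a classic encoding mismatch: in UTF-8 the middle dot (U+00B7) is stored as the two bytes C2 B7, and a reader that decodes them as Latin-1 (one byte per character) shows "Â·". A minimal illustration, not a diagnosis of tangram's code; both helper names are hypothetical.

```r
# Hypothetical helpers for illustration only (not part of tangram).
# utf8_bytes(): the UTF-8 byte sequence of a string, as hex pairs.
utf8_bytes <- function(s) as.character(charToRaw(enc2utf8(s)))

# misread_as_latin1(): what a reader shows if it decodes those UTF-8
# bytes as Latin-1, one byte per character.
misread_as_latin1 <- function(s) {
  bytes <- charToRaw(enc2utf8(s))
  iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")
}

utf8_bytes("\u00b7")            # "c2" "b7": two bytes for one character
misread_as_latin1("0\u00b75")   # "0Â·5": the lead byte C2 shows up as "Â"
```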

spgarbet commented 1 year ago

I've reproduced the · issue on a different machine. Trying to solve it.

spgarbet commented 1 year ago

It works for me now in all three cases. The only odd thing I see is that the left column in 'nejm' is centered instead of left-justified.

jr1234567 commented 1 year ago

New tests:

spgarbet commented 1 year ago

Why do you want the "*" rendered? It's a bold directive. I switched it to be consistent with the other styles and made it bold.
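The thread doesn't show the exact markup, so this sketch assumes the bold directive is Markdown-style `**text**`, as the earlier mention of Markdown bold suggests. It translates each directive into an RTF bold group `{\b ...}`; `md_bold_to_rtf` is hypothetical, not tangram's transformer.

```r
# Hypothetical helper (not tangram's code): turn Markdown-style **bold**
# into an RTF bold group {\b ...}. Assumes bold spans don't nest and that
# any RTF special characters inside them were escaped beforehand.
md_bold_to_rtf <- function(x) {
  # "(.+?)" is non-greedy, so "**a** and **b**" yields two separate groups;
  # "\\\\" in the replacement emits a single literal backslash
  gsub("\\*\\*(.+?)\\*\\*", "{\\\\b \\1}", x, perl = TRUE)
}

md_bold_to_rtf("**a** and **b**")   # the string {\b a} and {\b b}
```

Translating the directive rather than passing the asterisks through is what keeps the RTF output consistent with the other styles.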

jr1234567 commented 1 year ago

Ah, OK! I misunderstood that. I only mentioned them for consistency, since they do appear as such when viewing a table in RStudio. It is indeed much better not to have them.
The "/" is useful, though, for numerator/denominator.

spgarbet commented 1 year ago

I think this leaves rtf in good shape now.

jr1234567 commented 1 year ago

Everything is beautiful. Many thanks for this great job!