rust-cli / roff-rs

ROFF (man page format) generation library
docs.rs/roff
Apache License 2.0

Apostrophe in contractions is turned into \*(Aq, subsequently swallowed by pandoc #38

Status: open. teythoon opened this issue 9 months ago.

teythoon commented 9 months ago

We produce manual pages using roff-rs, then render them as HTML for our website. I have noticed that apostrophes in contractions and possessives are missing from the produced HTML:

$ cat src/main.rs
fn main() {
    let mut r = roff::Roff::new();
    r.text(vec!["I've been a good boy.".into()]);
    println!("{}", r.render());
}
$ cargo run > astropof.1
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/foobr`
$ cat astropof.1
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
I\*(Aqve been a good boy.

$ man ./astropof.1|hd
00000000  49 27 76 65 20 62 65 65  6e 20 61 20 67 6f 6f 64  |I've been a good|
00000010  20 62 6f 79 2e 0a 0a                              | boy...|
00000017
$ pandoc -o astropof.txt astropof.1
$ cat astropof.txt
Ive been a good boy.
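To make the failure mode concrete, here is a minimal sketch that reproduces the transformation visible in the transcript: a preamble that defines the `Aq` string, followed by text in which every `'` becomes `\*(Aq`. The function name `escape_like_roff_rs` is hypothetical; it mimics the observed output and is not roff-rs's actual implementation. A renderer that presumably does not evaluate the conditional `.ie`/`.el` definition has no value for `Aq`, which would explain why the apostrophe disappears in the pandoc output.

```rust
/// Preamble copied from the roff-rs output above: on groff (`\n(.g` is
/// nonzero) `Aq` expands to the `\(aq` glyph, elsewhere to a plain `'`.
const AQ_PREAMBLE: &str = ".ie \\n(.g .ds Aq \\(aq\n.el .ds Aq '\n";

/// Hypothetical helper reproducing the observed output: every ASCII
/// apostrophe in the text is replaced by the `\*(Aq` string reference.
fn escape_like_roff_rs(text: &str) -> String {
    let mut out = String::from(AQ_PREAMBLE);
    out.push_str(&text.replace('\'', "\\*(Aq"));
    out.push('\n');
    out
}

fn main() {
    // Prints the same three lines as `cat astropof.1` in the transcript.
    print!("{}", escape_like_roff_rs("I've been a good boy."));
}
```

If `Aq` is undefined, `\*(Aq` expands to nothing, leaving "Ive".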

Now, I'm not a roff expert, but one of the manual pages I consult for advice on writing manual pages advises against using \(aq for ordinary apostrophes. https://man7.org/linux/man-pages/man7/groff_man_style.7.html says:

You should not use \(aq for an ordinary apostrophe (as in “can't”)

Through experimentation I discovered that pandoc renders both ' and \(aq just fine.
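Following the groff_man_style(7) advice quoted above, a text escaper only needs to protect characters that roff treats specially and can leave mid-line apostrophes alone. The sketch below is a hypothetical `escape_text` function, not roff-rs API. It relies on two roff rules: a line starting with `.` or `'` is a control line (the zero-width `\&` prefix turns it back into text), and a literal backslash starts an escape sequence (`\e` prints one).

```rust
/// Hypothetical escaper: ordinary apostrophes stay as plain `'`;
/// only roff-significant characters are escaped.
fn escape_text(text: &str) -> String {
    text.split('\n')
        .map(|line| {
            // A literal backslash would begin an escape; `\e` prints one.
            let line = line.replace('\\', "\\e");
            // `.` or `'` in column one makes a control line; `\&` prevents that.
            if line.starts_with('.') || line.starts_with('\'') {
                format!("\\&{line}")
            } else {
                line
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    println!("{}", escape_text("I've been a good boy."));
    println!("{}", escape_text("'quoted' at the start of a line"));
}
```

Mid-line apostrophes pass through untouched, so each renderer is free to pick its own glyph for them.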

teythoon commented 9 months ago

To clarify: I think there are two issues here:

I have no idea what to do about either issue, but I wanted to report it.

epage commented 9 months ago

Thanks for the report!

teythoon commented 9 months ago

I just noticed the documentation of Roff::to_roff says:

Without special handling, apostrophes get typeset as right single quotes, including in words like “don’t”. In most situations, such as in manual pages, that’s unwanted.

That comment gets it wrong. In contractions like "don't" you do want to let renderers use typographic glyphs, and in fact that is exactly what rustdoc produces for "don’t" in the quoted documentation:

$ echo -n don’t | hd
00000000  64 6f 6e e2 80 99 74                              |don...t|
00000007

The bytes e2 80 99 are the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK. Where you don't want such glyphs is in code samples, which people are expected to copy, paste, and have work as written.
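The byte comparison can be checked directly in Rust, whose strings are UTF-8. This small sketch (the helper name `hex_bytes` is illustrative) prints the bytes in the same style as the left-hand columns of `hd`, for both the typographic and the ASCII spelling.

```rust
/// Format the UTF-8 bytes of `s` as space-separated hex pairs,
/// similar to the left-hand columns of `hd`.
fn hex_bytes(s: &str) -> String {
    s.bytes()
        .map(|b| format!("{b:02x}"))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // U+2019 RIGHT SINGLE QUOTATION MARK, as in rustdoc's "don’t".
    println!("{}", hex_bytes("don\u{2019}t"));
    // U+0027 APOSTROPHE, the plain ASCII character.
    println!("{}", hex_bytes("don't"));
}
```

The first line matches the `hd` dump above (`64 6f 6e e2 80 99 74`). The second line contains the single byte `27` instead.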

You probably want render or to_writer instead of this method.

In fact, I switched to using this method (to_roff), and it yields perfect results for me: both Debian's man and pandoc render apostrophes as byte 0x27, i.e. APOSTROPHE, in running text as well as in code blocks.
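The distinction drawn in this thread could also be expressed by escaping per context: typographic apostrophes are acceptable in prose, but code samples should keep a literal ASCII apostrophe. This is a hypothetical sketch, not roff-rs API; the `Context` enum and `escape_apostrophes` function are illustrative names. Prose keeps a plain `'`, while code uses `\(aq`, groff's special character for U+0027 APOSTROPHE. Unlike the `.ie \n(.g` guard that roff-rs emits, this sketch assumes groff and ignores the line-initial control-character issue.

```rust
/// Hypothetical rendering contexts; roff-rs itself has no such enum.
#[derive(Clone, Copy)]
enum Context {
    /// Running text: a plain `'` lets each renderer choose its glyph.
    Prose,
    /// Literal text meant to be copied: force the ASCII apostrophe.
    Code,
}

/// Escape apostrophes according to `ctx`.
fn escape_apostrophes(text: &str, ctx: Context) -> String {
    match ctx {
        // Leave ordinary apostrophes alone, per groff_man_style(7).
        Context::Prose => text.to_string(),
        // `\(aq` is groff's special character for U+0027 APOSTROPHE.
        Context::Code => text.replace('\'', "\\(aq"),
    }
}

fn main() {
    println!("{}", escape_apostrophes("I've been a good boy.", Context::Prose));
    println!("{}", escape_apostrophes("echo 'hello'", Context::Code));
}
```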