jgm / pandoc

Universal markup converter
https://pandoc.org

Groff ms writer #1839

Closed mardukbp closed 7 years ago

mardukbp commented 9 years ago

Suppose I want to write a simple document in Markdown and convert it to PDF. However, pandoc requires LaTeX to generate the PDF, whereas groff is bundled with every Linux distro and OS X. I realize that pandoc converts Markdown to man pages, but groff is capable of much more. It just has an ugly, archaic syntax.

jgm commented 9 years ago

I've thought about this; it's an interesting suggestion.

Some issues:


KurtPfeifle commented 9 years ago

On Wed, Dec 24, 2014 at 1:36 AM, John MacFarlane notifications@github.com wrote:

I've thought about this; it's an interesting suggestion. [....]

  • At least a third of pandoc's users are on Windows, I estimate. LaTeX may be a better route to PDF for them.

If someone implemented a groff-based path to PDF output, it would be a very welcome addition.

But surely this shouldn't kill the LaTeX road to PDF.

So Marduk's suggestion should never be understood as an "either/or" question. ;-)


mardukbp commented 9 years ago

@jgm

  • Last I looked, unicode support in groff wasn't good.

I just tried this:

.AUTHOR "Marduk Bolanos
.TITLE "Some math with eqn + groff
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.FAMILY T \" Times Roman
.START
.PP
Laplace's equation
.EQ
∇²ϕ = 0
.EN

Compile it with pdfmom -e -k math.mom > math.pdf

The only issue is using Unicode characters in macro arguments, like the ñ that should be in my surname.

I don't think everything that can be represented in LaTeX math has an eqn equivalent

Absolutely, but there could be a lot of potential users that won't need equations.

We'd need to add a writer for an appropriate groff macro package.

groff already includes the mom macro package.

LaTeX may be a better route to PDF for them.

Really? I thought it would be easier to install groff. But... (sigh) it's Windows.

@KurtPfeifle

Absolutely. I don't mean to replace LaTeX. It just seems overkill to me if all I want is to write e.g. a fairy tale.

jgm commented 9 years ago

I do not have pdfmom on my (OSX) system. I tried processing your sample with:

groff -mmom my.mom | pstopdf -i -o my.pdf

but the unicode didn't come out right. Are there options I should be using?

[Screenshot, 2014-12-26]

jgm commented 9 years ago

PS. Just to throw out a couple of other options I've considered for alternative routes to PDF:

mpickering commented 9 years ago

I won't have time in the foreseeable future to add a groff equation writer.

mardukbp commented 9 years ago

I just tried with groff -mom -e -k math.mom | ps2pdf - doc.pdf in Linux and it works. The -k flag is crucial for getting the Unicode right.

mpickering commented 9 years ago

Which version of groff are you using? With 1.19.2, which comes bundled with my system, there is no -k option.

mardukbp commented 9 years ago

I am using groff 1.22.3 in Arch Linux.

jgm commented 9 years ago

OK, I installed groff 1.22.3 with homebrew, and I now have pdfmom, which works as you described. Why are unicode characters not processed correctly as arguments to macros? Is there any workaround for this?

mardukbp commented 9 years ago

It turns out that they are actually processed correctly. My apologies. I was misled by the following message:

math.mom:1: can't translate character code 241 to special character `~n' in transparent throughput
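Some context on that message: ñ is code point 241 (U+00F1), and ~n is groff's own special-character name for it. With -k, groff first runs preconv, which rewrites non-ASCII input as \[uXXXX] escapes before the other preprocessors see it. A rough Python emulation of just that escaping step follows; the name preconv_like is hypothetical, and real preconv also detects the input encoding and does more than this sketch.

```python
def preconv_like(text: str) -> str:
    """Rough emulation of preconv's escaping step (sketch only).

    ASCII characters pass through unchanged; every other code point is
    rewritten as a groff \\[uXXXX] escape with upper-case hex digits.
    """
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            pieces.append(ch)
        else:
            # At least four hex digits, e.g. U+00F1 -> \[u00F1]
            pieces.append(f"\\[u{code:04X}]")
    return "".join(pieces)


if __name__ == "__main__":
    print(ord("ñ"))                                  # 241
    print(preconv_like('.AUTHOR "Marduk Bolaños'))  # .AUTHOR "Marduk Bola\[u00F1]os
```

Under the same scheme a Thai letter such as U+0E27 becomes \[u0E27], so a warning that groff "can't find special character `u0E27'" most likely means no loaded font has a glyph for it.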

bpj commented 9 years ago

What would groff offer which latex does not? Not having any math in your text is hardly an argument against latex.

bpj commented 9 years ago

I accidentally hit send too soon. I mean latex has many other qualities. I'll grant that xelatex is slow but I use it all the time because of its linguistic capabilities.

jgm commented 9 years ago

I think the main consideration is that latex is a big, complex system with lots of moving parts that needs to be installed separately; groff is a fast, fairly simple program that is already present on most *nix systems. So this might be a lightweight and fast path to PDFs -- something to supplement the path through latex, not replace it.

Hasimir commented 9 years ago

There is another path to PDF (and a few other things) that hasn't been mentioned here, though it might be a bit too much for a "quick alternative": DITA, with additional bits via D4P. I doubt it will be a quick implementation, though (depending on how much from XHTML can be adapted). OTOH, it would make pandoc pretty much the only dead-simple migration path to it, and whoever manages it might get nominated for godhood at some point. It does, however, already have OS platform independence and full UTF-8 support.

jgm commented 9 years ago

I should mention that I've recently played around with a direct PDF renderer for CommonMark that doesn't use LaTeX at all -- it uses the libharu C library.

See the jgm/cmarkpdf repository.

Something similar could in principle be done for pandoc using the HPDF library. Some problems, though: HPDF doesn't seem to be actively maintained, and it doesn't seem to support loading of fonts other than the standard PDF ones. Also, supporting everything pandoc supports (including tables and math) with direct PDF rendering would be very difficult.

Hasimir commented 8 years ago

I'm currently looking into using pandoc with LibreOffice as a means of getting to print/PDF by way of ODT (because pandoc can apply ODT/OTT styles on the command line and headless LibreOffice can't, go figure), as an alternative to all the ever so precious and delicate FOP-based "solutions" in DITA and DocBook land. The only ones that work consistently literally cost thousands of dollars... for PDF?! Ridiculous. So the aim is to convert DITA to ePub 2 (easy, and it also avoids JavaScript), then use that as input to pandoc and pick a file for a stylesheet. (I test with the über-ridiculous LibreOffice Writer Guide from version 4.0.) Once it's an ODT, it can go to PDF, go straight to print, stay as it is, or head towards DOCX land. Obviously the same thing would work with DOCX files, though I doubt Microsoft provides the equivalent of "/path/to/soffice.bin --headless --convert-to pdf foo.odt".

Another one that might be worth a look, though, is wkhtmltopdf (http://wkhtmltopdf.org/), which uses Qt 4.8 and WebKit. It takes [X]HTML (including HTML5) input, applies CSS, and generates PDF output. It also includes wkhtmltoimage, which does the same thing to GIF, JPG, PNG and one other format I forget (I use PNG). The image bits are great for side-stepping Twitter character limits (https://www.adversary.org/wp/2015/10/03/so-you-want-to-tweet-longer/).

jgm commented 8 years ago

Yes, maybe we could make pandoc -t html5 -o doc.pdf produce a pdf using wkhtmltopdf. That might be a nice alternative for some people, even though the result doesn't match latex.

jgm commented 8 years ago

I've added some preliminary wkhtmltopdf support. Seems to work okay, but I want to set some wkhtmltopdf options based on metadata fields (paper size for example).

alerque commented 8 years ago

Just to throw this out there, another route to typeset PDFs is using the SILE Typesetter. It's still young but already a good deal more flexible than LaTeX and a world lighter.

I've been working on a SILE writer for Pandoc (and indeed have it functioning well enough to generate press-ready book PDFs from Markdown sources), but the code is pretty rough and only covers the markup I've used so far, not all the markup others are likely to need. I'll be contributing it at some point; in the meantime, anyone interested can contact me.

jgm commented 8 years ago

I have a prototype groff ms writer in a branch. The PDF output is not very good, though, compared to LaTeX. And to support math, we'd need to implement an eqn converter in texmath.

ickc commented 7 years ago

@alerque, any update on your SILE experiment?

alerque commented 7 years ago

@ickc Yes, actually I've made progress on it, but it still suffers from the problem that I'm using it for my own projects, and it's hard to finish it to the point of being ready to contribute upstream when it already meets my needs. See also this discussion for some more status notes and an example. If you have a use case for it, hit me up and let's talk about what it would take to make it work for you. Since we're wandering off topic for this issue, maybe open a dedicated issue for this, or catch me in SILE's Gitter room.

ickc commented 7 years ago

In pandoc-discuss, there's a post mentioning a tool slightly related to this: "[OT] rinohtype 0.3.0 - an alternative to LaTeX". If you don't mind using pandoc to output to rst first and piping it through this extra dependency, you can get a PDF. It is a very early alpha(?), though.

pepa65 commented 7 years ago

The Laplace example worked here with groff -mom -e -k math.mom | ps2pdf - doc.pdf but not with pdfmom -e -k math.mom > math.pdf. But when I fed it some Thai, it gave warnings like "t.groff:21: warning: can't find special character `u0E27'" and it didn't work.

jgm commented 7 years ago

We might consider merging the groff_ms branch's groff_ms writer even without math support. [EDIT: I've rebased on master and added an ms branch.]

Adding eqn support to texmath would be a fun project for someone!

jgm commented 7 years ago

eqn branch in jgm/texmath is almost finished.

jgm commented 7 years ago

Is it worth using the texmath eqn writer for math in man pages as well? (Arguably no. eqn's ascii output is pretty unreliable. Better to preserve the tex math?)

jgm commented 7 years ago

I have added a groff ms writer to the master branch. Not perfect yet, but coming along. It translates math into eqn format, tables into tbl format.

jgm commented 7 years ago

You can now go straight to pdf via ms:

pandoc input.txt -t ms -o output.pdf

This uses pdfroff, which is packaged with recent versions of groff (but not the default groff on macos, which is old).

mardukbp commented 7 years ago

Thank you very much John! I just tried it and tex_math_dollars works, but tables do not work. I tried simple_tables and pipe_tables and I got the TBL code rendered in the PDF instead of a table.

I am using Ubuntu 16.04, groff 1.22.3.

jgm commented 7 years ago

I forgot to add the -t parameter when calling pdfroff. I'll add that now.

mardukbp commented 7 years ago

I just did a git pull and ran stack install --test. The following test failed:

 ms
      writer
        basic:                                                     FAIL (0.09s)

          ------------------------------------------------------------------------
          --- writer.ms
          +++ /home/marduk/pandoc-git/.stack-work/dist/x86_64-linux/Cabal-1.24.2.0/build/pandoc/pandoc --quiet --data-dir ../data testsuite.native -r native -w ms --columns=78 --variable pandoc-version= -s
          +  33 .nr PI 2m
          -  33 .nr PI 0
          ------------------------------------------------------------------------
jgm commented 7 years ago


Try again now (pull again). It should work.

mardukbp commented 7 years ago

Now stack finishes, but tables still do not work. Using the examples for simple_tables or pipe_tables I get the following:

eqn:<standard input>:157: unquoted escape
eqn:<standard input>:157: unquoted escape
<standard input>:137: warning: numeric expression expected (got `\&')
<standard input>:149: warning: numeric expression expected (got `\&')
<standard input>:166: warning: numeric expression expected (got a special character)
<standard input>:182: warning: numeric expression expected (got `\&')
<standard input>:182: warning: numeric expression expected (got `\&')
<standard input>:194: warning: numeric expression expected (got `\&')
<standard input>:203: warning: numeric expression expected (got `\&')

In the PDF I get fewer columns than there are in the .md file.

jgm commented 7 years ago

@mardukbp are you running groff manually on the ms produced by pandoc? If so, you need to add -t to process the tables and -e if you have any math. Or you can use pandoc to go straight to PDF via ms:

pandoc -t ms input.md -o output.pdf

This should automatically use the right flags to handle your tables and equations.

If you're already using -t, can you give the specific examples you're referring to?

It could also be helpful to generate the .ms file (rather than pdf directly) using pandoc -t ms -s so we can look at the line numbers mentioned in the messages.
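Following the advice above, the preprocessor flags have to match what the ms file contains: -t for tbl tables, -e for eqn math, plus -k so that preconv handles UTF-8 input. A small Python sketch that assembles and runs such a pdfroff command; the helper names are hypothetical, the .TS/.EQ check is a crude heuristic, and the flag set is an assumption rather than what pandoc passes internally.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pdfroff_command(ms_file: str, tables: bool, math: bool) -> list[str]:
    """Assemble a pdfroff command line for a pandoc-generated .ms file.

    Flag choice follows the advice in this thread (assumption: pandoc's
    own internal invocation may use a different set).
    """
    cmd = ["pdfroff", "-ms", "-k"]  # -ms: ms macros; -k: run preconv for UTF-8
    if tables:
        cmd.append("-t")  # run tbl over .TS/.TE blocks
    if math:
        cmd.append("-e")  # run eqn over .EQ/.EN blocks and inline math
    cmd.append(ms_file)
    return cmd


def render(ms_file: str, pdf_file: str) -> None:
    """Crudely detect tables/math in the source, then run pdfroff.

    Assumes pdfroff writes the PDF to stdout, which is captured here.
    """
    source = Path(ms_file).read_text(encoding="utf-8")
    cmd = pdfroff_command(ms_file, tables=".TS" in source, math=".EQ" in source)
    with open(pdf_file, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)  # raises on non-zero exit
```

For instance, after pandoc -s -t ms input.md -o input.ms, calling render("input.ms", "output.pdf") would run pdfroff -ms -k -t -e input.ms for a file containing both tables and math. Always passing -t and -e is a simpler alternative, since tbl and eqn should leave input without .TS or .EQ blocks essentially untouched.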

mardukbp commented 7 years ago

This is what I am doing:

pandoc test.md -f markdown+tex_math_dollars+pipe_tables -t ms -o out.pdf

test.md

# Groff test

$a^2+b^2=c^2$

| Column A | Column B |
|----------|----------|
|   12     |    22    |

And here is the output:

[Screenshot: pandoc groff output]

jgm commented 7 years ago

Great, can you start a new issue for this? We're likely to lose track if it's a comment on a closed issue.

jgm commented 7 years ago

Seems to be a strange interaction between eqn and tbl. Either -t or -e works fine by itself, but when you use both it doesn't work.

jgm commented 7 years ago

I think I've isolated the problem.

Consider this groff file:

.EQ
delim ||
.EN
.LP
.EQ
x
.EN
.PP
.TS
tab(@);
l.
T{
A
T}
_
T{
1
T}
.TE

Process this with pdfroff -t -e and you'll get the errors. Now remove the first three lines; the errors go away.

Pandoc includes those first three lines in any document with math (since we use | delimiters for inline math). Perhaps the | character has a special meaning for tbl?

jgm commented 7 years ago

Indeed:

from man tbl:

   |      The corresponding column becomes a  vertical  rule  (if  two  of
          these are adjacent, a double vertical rule).

So we need to choose a different character to use for inline math delimiter.
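The fix amounts to choosing an eqn delimiter that collides neither with ordinary text nor with tbl. A minimal Python sketch of that selection, assuming a character is safe when it appears neither in the document nor in a list of known-clashing characters such as |; the candidate list and helper names are hypothetical, and this thread does not say which character pandoc settled on.

```python
# Hypothetical candidates for eqn's inline-math delimiter, tried in order.
# '|' comes first only to show it being rejected; '~' and '^' are left out
# because eqn already uses them as spacing characters inside math.
CANDIDATES = "|@$%#"


def choose_delim(body: str, avoid: str = "|") -> str:
    """Return the first candidate found neither in the groff body nor in
    `avoid` (characters known to clash with other preprocessors)."""
    for ch in CANDIDATES:
        if ch not in body and ch not in avoid:
            return ch
    raise ValueError("no unused delimiter character available")


def eqn_preamble(delim: str) -> str:
    """Build the .EQ/.EN block that tells eqn which character brackets inline math."""
    return f".EQ\ndelim {delim}{delim}\n.EN\n"


def inline_math(expr: str, delim: str) -> str:
    """Wrap an eqn expression so it is typeset inline."""
    return f"{delim}{expr}{delim}"


if __name__ == "__main__":
    body = ".TS\ntab(@);\nl.\nT{\nA\nT}\n.TE\n"
    d = choose_delim(body)  # '|' is avoided and '@' is used by tab(@), so '$' wins
    print(eqn_preamble(d) + ".LP\n" + inline_math("x sup 2", d))
```

Running it prints a preamble with delim $$ followed by .LP and $x sup 2$. Scanning the body is only a heuristic: a character that shows up only later, in a preprocessor's own output, still has to be listed in avoid.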

jgm commented 7 years ago

OK, never mind the new issue; this is fixed!

mardukbp commented 7 years ago

Thanks a lot John! I did some tests. Column alignment works, as well as horizontal rules, bold and italics.