mardukbp closed this issue.
I've thought about this; it's an interesting suggestion.
Some issues:
+++ Marduk Bolaños [Dec 23 14 14:51 ]:
Suppose I want to write a simple document in Markdown, that I want to convert to PDF. However, Pandoc requires LaTeX in order to generate the PDF, whereas groff is bundled with every Linux distro and OS X. I realize that pandoc converts markdown to man page, but groff is capable of much more. It just has an ugly, archaic syntax.
Reply to this email directly or view it on GitHub: https://github.com/jgm/pandoc/issues/1839
On Wed, Dec 24, 2014 at 1:36 AM, John MacFarlane notifications@github.com wrote:
I've thought about this; it's an interesting suggestion. [....]
- At least a third of pandoc's users are on Windows, I estimate. LaTeX may be a better route to PDF for them.
If someone implemented a groff-based path to PDF output, it would be a very welcome addition.
But surely this shouldn't kill the LaTeX road to PDF.
So Marduk's suggestion should never be understood as an "either/or" question. ;-)
@jgm
- Last I looked, Unicode support in groff wasn't good.
I just tried this:
.AUTHOR "Marduk Bolanos
.TITLE "Some math with eqn + groff
.PRINTSTYLE TYPESET
.PAGE 6i 9i .75i .75i .75i .75i
.FAMILY T \" Times Roman
.START
.PP
Laplace's equation
.EQ
∇²ϕ = 0
.EN
Compile it with pdfmom -e -k math.mom > math.pdf
The only issue is using Unicode characters for macro arguments, like the ñ that should be in my surname.
I don't think everything that can be represented in LaTeX math has an eqn equivalent
Absolutely, but there could be a lot of potential users that won't need equations.
We'd need to add a writer for an appropriate groff macro package.
groff includes MOM
LaTeX may be a better route to PDF for them.
Really? I thought it would be easier to install groff. But... (sigh) it's Windows.
@KurtPfeifle
Absolutely. I don't mean to replace LaTeX. It just seems overkill to me if all I want is to write e.g. a fairy tale.
I do not have pdfmom on my (OS X) system. I tried processing your sample with:
groff -mmom my.mom | pstopdf -i -o my.pdf
but the Unicode didn't come out right. Are there options I should be using?
PS. Just to throw out a couple of other options I've considered for alternative routes to PDF:
wkhtmltopdf
- no problems with Unicode, but math might be tricky
I don't have time for the foreseeable future to add a groff equation writer.
I just tried with
groff -mom -e -k math.mom | ps2pdf - doc.pdf
on Linux and it works. The -k flag is crucial for getting the Unicode right.
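Putting the two invocations that work in this thread side by side, a small shell wrapper could prefer pdfmom where it exists (newer groff, such as 1.22.3, ships it) and otherwise fall back to the groff | ps2pdf pipeline. This is only a sketch: the function name mom_to_pdf is made up, and the fallback assumes ps2pdf (from Ghostscript) is installed.

```sh
#!/bin/sh
# Sketch: render a mom document (eqn math plus UTF-8 text) to PDF.
# mom_to_pdf is a hypothetical name, not a groff tool.
mom_to_pdf() {
  src=$1   # input .mom file
  dest=$2  # output .pdf file
  if command -v pdfmom >/dev/null 2>&1; then
    # pdfmom writes the PDF to standard output.
    pdfmom -e -k "$src" > "$dest"
  else
    # Fallback: -e runs eqn, -k runs preconv so UTF-8 input is converted;
    # ps2pdf reads PostScript from stdin when the input file is "-".
    groff -mom -e -k "$src" | ps2pdf - "$dest"
  fi
}
```

Usage: `mom_to_pdf math.mom math.pdf`. Either branch keeps -k, since that is what made the Unicode come out right here.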
Which version of groff are you using? With 1.19.2, which comes bundled on my system, there is no -k option.
I am using groff 1.22.3 in Arch Linux.
OK, I installed groff 1.22.3 with Homebrew, and I now have pdfmom, which works as you described.
Why are Unicode characters not processed correctly as arguments to macros? Is there any workaround for this?
It turns out that they are actually processed correctly. My apologies. I was disoriented by the following message:
math.mom:1: can't translate character code 241 to special character `~n' in transparent throughput
What would groff offer which latex does not? Not having any math in your text is hardly an argument against latex.
I accidentally hit send too soon. I mean, LaTeX has many other qualities. I'll grant that XeLaTeX is slow, but I use it all the time because of its linguistic capabilities.
I think the main consideration is that latex is a big, complex system with lots of moving parts that needs to be installed separately; groff is a fast, fairly simple program that is already present on most *nix systems. So this might be a lightweight and fast path to PDFs -- something to supplement the path through latex, not replace it.
There is another path to PDF (and a few other things) that hasn't been mentioned here, though it might be a bit too much for a "quick alternative": DITA, with additional bits via D4P. I doubt it would be a quick implementation, though (depending on how much from XHTML can be adapted). OTOH it would make pandoc pretty much the only dead-simple migration path to it, and whoever manages it might get nominated for godhood at some point. It does, however, already have OS platform independence and full UTF-8 support.
I should mention that I've recently played around with a direct PDF renderer for CommonMark that doesn't use LaTeX at all -- it uses the libharu C library.
See the jgm/cmarkpdf repository.
Something similar could in principle be done for pandoc using the HPDF library. Some problems, though: HPDF doesn't seem to be actively maintained, and it doesn't seem to support loading of fonts other than the standard PDF ones. Also, supporting everything pandoc supports (including tables and math) with direct PDF rendering would be very difficult.
I'm currently looking into using pandoc with LibreOffice as a means of getting to print/PDF by way of ODT (because pandoc can apply ODT/OTT styles on the command line and headless LibreOffice can't, go figure), as an alternative to all the ever so precious and delicate FOP-based "solutions" in DITA and DocBook land. The only ones that work consistently literally cost thousands of dollars... for PDF?! Ridiculous. So the aim is to convert DITA to ePub 2 (easy, and it also avoids JavaScript), then use that as input to pandoc and pick a file for a stylesheet (I test with the über-ridiculous LibreOffice Writer Guide from version 4.0). Once it's an ODT it can go to PDF, go straight to print, stay as it is, or head towards DOCX land. (Obviously the same thing would work with DOCX files, though I doubt Microsoft provides the equivalent of "/path/to/soffice.bin --headless --convert-to pdf foo.odt".)
Another one that might be worth a look, though, is wkhtmltopdf, which uses Qt 4.8 and WebKit (it's from Google, of course). It takes [X]HTML (including HTML5) input, applies CSS, and generates PDF output. It also includes wkhtmltoimage, which does the same thing to GIF, JPG, PNG, and one other format I forget (I use PNG). The image bits are great for side-stepping Twitter character limits.
+++ Ben McGinnes [Dec 20 15 10:29 ]:
Yes, maybe we could make
pandoc -t html5 -o doc.pdf
produce a PDF using wkhtmltopdf. That might be a nice alternative for some people, even though the result doesn't match LaTeX.
I've added some preliminary wkhtmltopdf support. Seems to work okay, but I want to set some wkhtmltopdf options based on metadata fields (paper size for example).
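As a rough illustration of deriving wkhtmltopdf options from metadata, here is a shell sketch that turns a papersize value into wkhtmltopdf's --page-size option. The helper name and the particular mapping are assumptions for illustration, not pandoc's actual code.

```sh
#!/bin/sh
# Hypothetical helper: map a "papersize" metadata value to wkhtmltopdf's
# --page-size option. Only two spellings are normalized; other values are
# passed through as-is for wkhtmltopdf to accept or reject.
wk_page_size_args() {
  case "$1" in
    "")
      ;;  # no papersize given: print nothing, keep wkhtmltopdf's default
    a4 | A4)
      printf '%s\n' "--page-size A4" ;;
    letter | Letter)
      printf '%s\n' "--page-size Letter" ;;
    *)
      printf '%s\n' "--page-size $1" ;;
  esac
}
```

Usage, as a two-step route: `pandoc -s -t html5 doc.md -o doc.html`, then `wkhtmltopdf $(wk_page_size_args a4) doc.html doc.pdf`. The command substitution is left unquoted on purpose so the option and its value become two arguments.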
Just to throw this out there, another route to typeset PDFs is using the SILE Typesetter. It's still young but already a good deal more flexible than LaTeX and a world lighter.
I've been working on a SILE writer for Pandoc (and indeed have it functioning well enough to generate press-ready book PDFs from Markdown sources), but the code is pretty rough and only has the markup I've used so far, not all the markup others are likely to need. I'll be contributing it at some point; in the meantime, anyone interested can contact me.
I have a prototype groff ms writer in a branch. The PDF output is not very good, though, compared to LaTeX. And to support math, we'd need to implement an eqn converter in texmath.
@alerque, any update on your SILE experiment?
@ickc Yes, actually, I've made progress on it, but it still suffers from the problem that I'm using it for my own projects, and it's hard to finish it up to the point of being ready to contribute upstream when it already meets my needs. See also this discussion for some more status notes and an example. If you have a use case for it, hit me up and let's talk about what it would take to make it work for you. Since we're wandering off topic for this issue, maybe open a dedicated issue for this, or catch me in SILE's Gitter room.
In pandoc-discuss, there's a post mentioning a tool slightly related to this: [OT] rinohtype 0.3.0 - an alternative to LaTeX - Google Groups. If you don't mind using pandoc to output to rst first and piping it through this extra dependency, you can get a PDF. It is a very early alpha(?), though.
The Laplace example worked here with
groff -mom -e -k math.mom | ps2pdf - doc.pdf
but not with
pdfmom -e -k math.mom > math.pdf
But when I fed it some Thai, it gave warnings like
t.groff:21: warning: can't find special character `u0E27'
and it didn't work.
We might consider merging the groff_ms branch's groff_ms writer even without math support.
[EDIT: I've rebased on master and added ms branch]
Adding eqn support to texmath would be a fun project for someone!
eqn branch in jgm/texmath is almost finished.
Is it worth using the texmath eqn writer for math in man pages as well? (Arguably no. eqn's ascii output is pretty unreliable. Better to preserve the tex math?)
I have added a groff ms writer to the master branch.
Not perfect yet, but coming along. It translates math into eqn format and tables into tbl format.
You can now go straight to pdf via ms:
pandoc input.txt -t ms -o output.pdf
This uses pdfroff, which is packaged with recent versions of groff (but not the default groff on macOS, which is old).
Thank you very much John! I just tried it and tex_math_dollars works, but tables do not work. I tried simple_tables and pipe_tables and I got the TBL code rendered in the PDF instead of a table.
I am using Ubuntu 16.04, groff 1.22.3.
I forgot to add the -t parameter when calling pdfroff. I'll add that now.
+++ Marduk Bolaños [Mar 23 17 14:57 ]:
I just did a git pull and ran stack install --test. The following test failed:
  ms
    writer
      basic: FAIL (0.09s)
------------------------------------------------------------------------
--- writer.ms
+++ /home/marduk/pandoc-git/.stack-work/dist/x86_64-linux/Cabal-1.24.2.0/build/pandoc/pandoc --quiet --data-dir ../data testsuite.native -r native -w ms --columns=78 --variable pandoc-version= -s
+ 33 .nr PI 2m
- 33 .nr PI 0
------------------------------------------------------------------------
+++ Marduk Bolaños [Mar 25 17 00:25 ]:
Try again now (pull again). It should work.
Now stack finishes, but tables still do not work. Using the examples for simple_tables or pipe_tables, I get the following:
eqn:<standard input>:157: unquoted escape
eqn:<standard input>:157: unquoted escape
<standard input>:137: warning: numeric expression expected (got `\&')
<standard input>:149: warning: numeric expression expected (got `\&')
<standard input>:166: warning: numeric expression expected (got a special character)
<standard input>:182: warning: numeric expression expected (got `\&')
<standard input>:182: warning: numeric expression expected (got `\&')
<standard input>:194: warning: numeric expression expected (got `\&')
<standard input>:203: warning: numeric expression expected (got `\&')
In the PDF I get fewer columns than there are in the .md file.
@mardukbp are you running groff manually on the ms produced by pandoc?
If so, you need to add -t to process the tables and -e if you have any math.
Or you can use pandoc to go straight to PDF via ms:
pandoc -t ms input.md -o output.pdf
This should automatically use the right flags to handle your tables and equations.
If you're already using -t, can you give the specific examples you're referring to?
It could also be helpful to generate the .ms file (rather than PDF directly) using pandoc -t ms -s so we can look at the line numbers mentioned in the messages.
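When running pdfroff by hand on a generated .ms file, a hypothetical helper like the one below (not part of pandoc) can work out which preprocessors are needed by scanning for .TS and .EQ blocks: tables need tbl (-t), equations need eqn (-e). groff also ships grog, which guesses a full groff command line for a file in the same spirit.

```sh
#!/bin/sh
# Hypothetical helper: print the groff preprocessor flags an .ms file needs.
# Line-based scan only; it does not parse groff.
ms_preproc_flags() {
  flags=""
  # .TS ... .TE blocks are tables and must go through tbl
  if grep -q '^\.TS' "$1"; then
    flags="$flags -t"
  fi
  # .EQ ... .EN blocks are equations and must go through eqn
  if grep -q '^\.EQ' "$1"; then
    flags="$flags -e"
  fi
  # strip the leading space, if any
  printf '%s\n' "${flags# }"
}
```

For example: `pandoc test.md -t ms -s -o test.ms`, then `pdfroff -ms $(ms_preproc_flags test.ms) test.ms > test.pdf`.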
This is what I am doing:
pandoc test.md -f markdown+tex_math_dollars+pipe_tables -t ms -o out.pdf
test.md
# Groff test
$a^2+b^2=c^2$
| Column A | Column B |
|----------|----------|
| 12 | 22 |
And here is the output
Great, can you start a new issue for this? We're likely to lose track if it's a comment on a closed issue.
Seems to be a strange interaction between eqn and tbl.
Either -t or -e works fine by itself, but when you use both it doesn't work.
I think I've isolated the problem.
Consider this groff file:
.EQ
delim ||
.EN
.LP
.EQ
x
.EN
.PP
.TS
tab(@);
l.
T{
A
T}
_
T{
1
T}
.TE
Process this with pdfroff -t -e and you'll get the errors.
Now remove the first three lines; the errors go away.
Pandoc includes those first three lines in any document with math (since we use | delimiters for inline math). Perhaps the | character has a special meaning for tbl?
Indeed:
from man tbl:
| The corresponding column becomes a vertical rule (if two of
these are adjacent, a double vertical rule).
So we need to choose a different character to use for inline math delimiter.
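The reduced example shows the failure needs two ingredients in the same run: an eqn delim declaration that uses | and a tbl table. A hypothetical lint for that combination could be sketched in shell as follows; the function name and grep patterns are assumptions, and it only encodes the empirical finding, not the underlying cause.

```sh
#!/bin/sh
# Hypothetical lint (not part of pandoc or groff): succeed if a groff file
# both declares an eqn delimiter pair containing "|" and contains a tbl
# table. Line-based grep only; "|" is a literal in basic regexes.
has_delim_conflict() {
  grep -q '^delim .*|' "$1" && grep -q '^\.TS' "$1"
}
```

Run on the reduced example above, it succeeds (conflict found). Switching the declaration to another pair, for instance the classic `delim $$` from the eqn documentation, makes it fail, as does a file with no table.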
OK, never mind the issue, this is fixed!
Thanks a lot John! I did some tests. Column alignment works, as well as horizontal rules, bold and italics.