jgm / peg-markdown

An implementation of markdown in C, using a PEG grammar

Note: this package is unmaintained.

What is this?

This is an implementation of John Gruber's markdown in C. It uses a parsing expression grammar (PEG) to define the syntax, which should make the syntax easy to modify and extend. It currently supports output in HTML, LaTeX, ODF, or groff_mm formats, and adding new formats is relatively easy.

It is pretty fast. A 179K text file that takes 5.7 seconds for Markdown.pl (v. 1.0.1) to parse takes less than 0.2 seconds for this markdown. It does, however, use a lot of memory (up to 4M of heap space while parsing the 179K file, and up to 80K for a 4K file). (Note that the memory leaks in earlier versions of this program have now been plugged.)

Both a library and a standalone program are provided.

peg-markdown is written and maintained by John MacFarlane (jgm on github), with significant contributions by Ryan Tomayko (rtomayko). It is released under both the GPL and the MIT license; see LICENSE for details.

Installing

On a linux or unix-based system

This program is written in portable ANSI C. It requires glib2. Most *nix systems will have this installed already. The build system requires GNU make.

The other required dependency, Ian Piumarta's peg/leg PEG parser generator, is included in the source directory. It will be built automatically. (However, it is not as portable as peg-markdown itself, and seems to require gcc.)

To make the 'markdown' executable:

    make

(Or, on some systems, gmake.) Then, for usage instructions:

    ./markdown --help

To run John Gruber's Markdown 1.0.3 test suite:

    make test

The test suite will fail on one of the list tests. Here's why. Markdown.pl encloses "item one" in the following list in <p> tags:

1.  item one
    * subitem
    * subitem

2.  item two

3.  item three

peg-markdown does not enclose "item one" in <p> tags unless it has a following blank line. This is consistent with the official markdown syntax description, and lets the author of the document choose whether <p> tags are desired.

Cross-compiling for Windows with MinGW on a linux box

Prerequisites:

Steps:

  1. Create the markdown parser using Linux-compiled leg from peg-0.1.4:

    ./peg-0.1.4/leg markdown_parser.leg >markdown_parser.c

    (Note: The same thing could be accomplished by cross-compiling leg, executing it on Windows, and copying the resulting C file to the Linux cross-compiler host.)

  2. Run the cross compiler with include flags for the Windows glib-2.0 headers: for example,

    /usr/bin/i586-mingw32msvc-cc -c \
    -I/usr/i586-mingw32msvc/include/glib-2.0 \
    -I/usr/i586-mingw32msvc/lib/glib-2.0/include -Wall -O3 -ansi markdown*.c

  3. Link against the Windows glib-2.0 library: for example,

    /usr/bin/i586-mingw32msvc-cc markdown*.o \
    -Wl,-L/usr/i586-mingw32msvc/lib/glib,--dy,--warn-unresolved-symbols,-lglib-2.0 \
    -o markdown.exe

The resulting executable depends on the glib DLL, so make sure that DLL is installed on the Windows host where the executable can find it.

Compiling with MinGW on Windows

These directions assume that MinGW is installed in c:\MinGW and glib-2.0 is installed in the MinGW directory hierarchy (with the mingw bin directory in the system path).

Unzip peg-markdown in a temp directory. From the directory with the peg-markdown source, execute:

    cd peg-0.1.4
    make PKG_CONFIG=c:/path/to/glib/bin/pkg-config.exe

Extensions

peg-markdown supports extensions to standard markdown syntax. These can be turned on using the command line flag -x or --extensions. -x by itself turns on all extensions. Extensions can also be turned on selectively, using individual command-line options. To see the available extensions:

    ./markdown --help-extensions

The --smart extension provides "smart quotes", dashes, and ellipses.

The --notes extension provides a footnote syntax like that of Pandoc or PHP Markdown Extra.

The --strike extension provides a strike-through syntax like that of Redcarpet. For strike-through support in LaTeX documents the sout macro from the ulem package is used. Add \usepackage[normalem]{ulem} to your document's preamble to load it.

Using the library

The library exports two functions:

    GString * markdown_to_g_string(char *text, int extensions, int output_format);
    char * markdown_to_string(char *text, int extensions, int output_format);

The only difference between these is that markdown_to_g_string returns a GString (glib's automatically resizable string), while markdown_to_string returns a regular character pointer. The memory allocated for the result must be freed by the calling program, using g_string_free() for a GString or free() for a character pointer.

text is the markdown-formatted text to be converted. Note that tabs will be converted to spaces, using a four-space tab stop. Character encodings are ignored.
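
The tab handling mentioned above can be pictured with a small sketch. The helper below, expand_tabs, is hypothetical and not part of peg-markdown's API; it only illustrates what a four-space tab stop means, namely that each tab advances to the next column that is a multiple of four. It counts columns in bytes, which is an assumption of this sketch and may not match the library's own code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of peg-markdown: returns a newly
 * allocated copy of `in` in which every tab is replaced by spaces up
 * to the next multiple of `tabstop` columns. Free it with free(). */
char *expand_tabs(const char *in, int tabstop)
{
    assert(in != NULL && tabstop > 0);

    /* Worst case: every byte is a tab that expands to `tabstop` spaces. */
    size_t cap = strlen(in) * (size_t)tabstop + 1;
    char *out = malloc(cap);
    if (out == NULL)
        return NULL;

    size_t n = 0;   /* bytes written so far */
    int col = 0;    /* column (in bytes) on the current line */
    for (const char *p = in; *p != '\0'; p++) {
        if (*p == '\t') {
            int pad = tabstop - (col % tabstop);   /* 1..tabstop spaces */
            for (int i = 0; i < pad; i++)
                out[n++] = ' ';
            col += pad;
        } else {
            out[n++] = *p;
            col = (*p == '\n') ? 0 : col + 1;      /* newline resets the column */
        }
    }
    out[n] = '\0';
    return out;
}
```

For instance, expand_tabs("a\tb", 4) yields "a" followed by three spaces and "b", because the tab starts at column 1 and the next tab stop is column 4.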

extensions is a bit-field specifying which syntax extensions should be used. If extensions is 0, no extensions will be used. If it is 0xFFFFFF, all extensions will be used. To set extensions selectively, combine the extension constants defined in markdown_lib.h with the bitwise | operator.

output_format is either HTML_FORMAT, LATEX_FORMAT, ODF_FORMAT, or GROFF_MM_FORMAT.
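
Since the extensions argument is an ordinary bit mask, the usual flag idioms apply. The sketch below uses made-up names and values (the DEMO_ prefix marks them as assumptions); the real constants are the ones in markdown_lib.h. It shows that flags are combined with bitwise OR and tested with bitwise AND.

```c
#include <assert.h>

/* Illustrative flags for this sketch only. Names and values are
 * assumptions; use the constants from markdown_lib.h in real code. */
enum demo_extensions {
    DEMO_EXT_SMART  = 1 << 0,
    DEMO_EXT_NOTES  = 1 << 1,
    DEMO_EXT_STRIKE = 1 << 2
};

/* Turn one flag on: OR it into the mask. */
int demo_enable(int mask, int flag)
{
    assert(flag != 0 && (flag & (flag - 1)) == 0);  /* exactly one bit set */
    return mask | flag;
}

/* Turn one flag off: AND with the complement of the flag. */
int demo_disable(int mask, int flag)
{
    return mask & ~flag;
}

/* Check a flag: AND isolates its bit. */
int demo_is_enabled(int mask, int flag)
{
    return (mask & flag) != 0;
}
```

Note that ANDing two distinct flags, as in DEMO_EXT_SMART & DEMO_EXT_NOTES, gives 0, which would request no extensions at all. Building the mask with OR avoids that.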

To use the library, include markdown_lib.h. See markdown.c for an example.
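
As a rough illustration of the ownership rule for markdown_to_string, here is a small wrapper that converts, writes, and frees. It is only a sketch: write_converted and echo_convert are hypothetical names, and echo_convert is a stand-in converter so the code can be tried without the library. The function-pointer type mirrors the markdown_to_string prototype given above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as markdown_to_string() as declared in this README. */
typedef char *(*convert_fn)(char *text, int extensions, int output_format);

/* Stand-in converter (an assumption for this sketch): returns a
 * malloc'd copy of the input instead of doing any real conversion. */
char *echo_convert(char *text, int extensions, int output_format)
{
    (void)extensions;
    (void)output_format;
    char *copy = malloc(strlen(text) + 1);
    if (copy != NULL)
        strcpy(copy, text);
    return copy;
}

/* Hypothetical helper: converts `text`, writes the result to `out`,
 * then releases the buffer with free(), since the caller owns it.
 * Returns 0 on success and -1 on failure. */
int write_converted(convert_fn convert, char *text,
                    int extensions, int output_format, FILE *out)
{
    assert(convert != NULL && text != NULL && out != NULL);

    char *result = convert(text, extensions, output_format);
    if (result == NULL)
        return -1;

    int status = (fputs(result, out) == EOF) ? -1 : 0;
    free(result);
    return status;
}
```

With markdown_lib.h included and the library linked, the call would presumably look like write_converted(markdown_to_string, text, 0, HTML_FORMAT, stdout). The GString variant would need g_string_free() in place of free().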

Hacking

It should be pretty easy to modify the program to produce other formats, and to parse syntax extensions. A quick guide:

Acknowledgements

Support for ODF output was added by Fletcher T. Penney.