timtylin / scholdoc

Fork of Pandoc for the implementation of a ScholarlyMarkdown parser
scholdoc.scholarlymarkdown.com
GNU General Public License v2.0

"Undefined control sequence" for \chapter when using `#`-level headers #6

Open eddie-dunn opened 9 years ago

eddie-dunn commented 9 years ago

When trying to generate a document with scholdoc I get this error:

$ scholdoc -t latex -o test.pdf md_docs/draft3.md
! Undefined control sequence.
l.119 \chapter

scholdoc: Error producing PDF from TeX source

The document works fine with pandoc.

If I add the -f markdown flag pdf generation works fine, but figure referencing (the reason I wanted to change to scholdoc in the first place) gets screwed up; the special 'scholdoc' image syntax gets generated like normal markdown.

The problem seems to be headers of the following types

# Title
some text

and

Title
=====
some text

If I strip the document of all instances of H1 type headers, document generation works again. They are a supported format for markdown documents, though, so that is a workaround, not a solution.

timtylin commented 9 years ago

This behaviour is a design choice in the default markdown_scholarly writer. Titles need to be explicitly stated in the metadata like so:

---
title: This is my title
author: My Name
---

It may seem inconvenient, but I think that is the only consistent way to decide the title of a specific document without ambiguity (since you can have more than one H1 element).

I designed Scholdoc first and foremost for academic journals and books, so right now I'm reserving the level 1 headers exclusively for chapters, while leaving the job of describing the title up to the metadata. In fact, all header levels have strict schematic meaning in Scholdoc when output to LaTeX/PDF. See the header levels section in the documentation for more details.

I really welcome feedback on whether this is sensible behaviour though. An obvious danger of this approach is confusion for people already used to Markdown, so I'm wondering if a more pragmatic design can be reached.

timtylin commented 9 years ago

I do think it's wrong that the PDF output just fails when there are level-1 headers, though. This is because LaTeX defaults to the article document class, which doesn't understand chapters.

eddie-dunn commented 9 years ago

Yes, my main issue with how scholdoc interprets H1 headers is that the whole documentation generation crashed and burned, only giving a cryptic LaTeX error message that I couldn't troubleshoot from the markdown source. How would anyone not proficient with LaTeX know that \chapter is not valid for the document format scholdoc happens to use when generating pdfs?

In order to get to the root of the problem I had to output the document as a .tex file, once generated by scholdoc and once generated by pandoc, and compare the difference. Then I tried creating a dummy markdown file in order to determine exactly what triggered the problem. Unfortunately, it wasn't until I inspected the markdown source for the scholdoc documentation and saw that it had no H1 headings that I could deduce they were what caused the issue. This is an exercise I doubt most users are willing or knowledgeable enough to perform, nor should they be required to.

I don't agree with you that it is a good design decision to disallow H1-headers in markdown documents, since, as I mentioned earlier, they are part of the markdown standard and work fine for pandoc and any other document generator I can recall ever using. If you want to decide the title of a specific document, it should either be determined from the metadata as you suggest, or in the case such metadata is missing, be the first H1-heading encountered in the document.

As I might not have a deep enough understanding of the problem scope, you are free to disagree with me on this of course, but then I think it should be very explicit, both in the documentation and when running scholdoc, that H1 headings will lead to problems when trying to generate PDF files.

timtylin commented 9 years ago

In your opinion, what should H1 and H2 be mapped to in LaTeX?

eddie-dunn commented 9 years ago

If scholdoc generates an article (which lacks \chapter), a H1 should map to \section. At least, that's what pandoc does, as far as I can tell.

timtylin commented 9 years ago

(Just for curiosity's sake: in the documents that you're trying to convert, is H1 used for the whole document's title or for individual section titles?)

Pandoc's model is determined by a check for --chapters in the CLI arguments, as well as a check to see if the documentclass variable is set to book, memoir, or report. If any of these is met, then the entire header level hierarchy is "shifted" up one level. This is great for default behavior, but can get very confusing with the thousands of different custom LaTeX classes that are often used in scientific publishing, and would be very unsustainable for Scholdoc to keep track of which ones support chapters. I consider this the one end of the spectrum of how to handle semantic assignment of the header levels.
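The "shifting" model described above can be sketched as follows. This is only an illustration built from the description in this comment, not Pandoc's actual source; the function name `pandoc_style_command` and the clamping of very deep header levels are assumptions.

```python
# Sketch of the "shifting" header model as described in this thread.
# Not Pandoc's real implementation; names here are hypothetical.
CHAPTER_CLASSES = {"book", "memoir", "report"}
COMMANDS = ["chapter", "section", "subsection",
            "subsubsection", "paragraph", "subparagraph"]


def pandoc_style_command(level, documentclass="article", chapters_flag=False):
    """Return the LaTeX sectioning command for a 1-based Markdown header level."""
    use_chapters = chapters_flag or documentclass in CHAPTER_CLASSES
    # With chapter support the hierarchy starts at \chapter,
    # otherwise it starts one step lower, at \section.
    index = level - 1 if use_chapters else level
    # Assumption: levels deeper than the table are clamped to the last entry.
    index = min(index, len(COMMANDS) - 1)
    return "\\" + COMMANDS[index]
```

With the defaults, `pandoc_style_command(1)` gives `\section`, while passing `documentclass="book"` or `chapters_flag=True` gives `\chapter` for the same header.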

A contrasting model is something like Markua, which gives a very specific and fixed meaning to each header level. This leaves less ambiguity in terms of behavior but leaves the burden of keeping track of what supports what to the author. This is how Scholdoc behaves now, but as I said I'm not entirely happy with this decision.

What do you think about the following model?

  1. By default, Scholdoc will map H1 to \part, which unlike \chapter is recognized by the default article class. As far as I can tell the default font sizes used for \part are similar to \chapter, but \part doesn't cause a page break.
  2. If the Pandoc conditions for chapter support are met (see above), then transform H1 into \chapter.
  3. If the document starts with a single H1 element, and that is the only H1 in the entire document, then that H1 is treated as the title.
  4. At all times, H2 would still map to \section, H3 to \subsection, etc...

I feel that's a much better system than the current Scholdoc behavior without compromising too much on the semantic consistency of the different header levels when mapping them to publications.
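The four rules above could be sketched roughly like this. It is a minimal illustration, not Scholdoc's code: the function `map_headers` is hypothetical, "the document starts with an H1" is approximated as "the first header is an H1", and giving rule 3 precedence over rules 1 and 2 is an assumption.

```python
# Hypothetical sketch of the proposed model; not Scholdoc's implementation.
CHAPTER_CLASSES = {"book", "memoir", "report"}
LOWER = ["section", "subsection", "subsubsection", "paragraph", "subparagraph"]


def map_headers(levels, documentclass="article", chapters_flag=False):
    """Map header levels (in document order) to LaTeX commands or 'title'."""
    # Rule 3: a single H1 that is also the first header acts as the title.
    sole_leading_h1 = bool(levels) and levels[0] == 1 and levels.count(1) == 1
    # Rules 1 and 2: \part by default, \chapter when chapters are supported.
    has_chapters = chapters_flag or documentclass in CHAPTER_CLASSES
    top = "chapter" if has_chapters else "part"
    result = []
    for level in levels:
        if level == 1:
            result.append("title" if sole_leading_h1 else "\\" + top)
        else:
            # Rule 4: H2 is always \section, H3 \subsection, and so on.
            result.append("\\" + LOWER[min(level - 2, len(LOWER) - 1)])
    return result
```

For example, `map_headers([1, 2, 3])` treats the lone H1 as the title, whereas `map_headers([1, 2, 1, 2])` maps both H1 headers to `\part`.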

Thanks for filing this feedback, without it I wouldn't be motivated to think about this issue quite as hard!

eddie-dunn commented 9 years ago

(I used H1 headers in several places in the document, but after discovering the problem with scholdoc, I shifted all headings one step up (H1->H2, H2->H3 etc).)

Thanks, I now better understand why you chose to handle H1 the way you did. I like your proposed model, though I did some testing and it seems the default latex behavior is to add "Part I", "Part II" for every \part in the LaTeX source. Is this what we want?

In my case, I think my confusion stemmed from my expectation that scholdoc would behave like pandoc (it is a fork of the latter, is it not?), except when using scholdoc-specific syntax. I.e., why would a document that worked fine in pandoc create errors in scholdoc? Further confounding the matter, the only error feedback I got came from LaTeX, referring to the intermediary .latex source, and not the actual source, my markdown document.

What I'm trying to say here is that I am actually fine with giving fixed meanings to each header level, but then scholdoc should be very clear about what the problem is with the markdown source, and not require the LaTeX-specific knowledge that H1 becomes \chapter, and that \chapter doesn't work for the default scholdoc pdf output format article.
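One way such a source-level diagnostic might look is sketched below. Nothing like this is confirmed to exist in scholdoc; `find_h1_lines` and `check_h1` are hypothetical helpers, and the header detection is deliberately simplified (ATX headers need a space after `#`, and only triple-backtick fences are skipped).

```python
import re

# Hypothetical pre-flight check; not part of scholdoc.
CHAPTER_CLASSES = {"book", "memoir", "report"}
ATX_H1 = re.compile(r"^#\s+\S")             # "# Title" but not "## Title"
SETEXT_UNDERLINE = re.compile(r"^=+\s*$")   # "=====" under a title line


def find_h1_lines(markdown_text):
    """Return 1-based line numbers of level-1 headers in the Markdown source."""
    lines = markdown_text.splitlines()
    hits = []
    in_fence = False
    for i, line in enumerate(lines):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if ATX_H1.match(line):
            hits.append(i + 1)
        elif SETEXT_UNDERLINE.match(line) and i > 0 and lines[i - 1].strip():
            hits.append(i)  # the header text is the line above the underline
    return hits


def check_h1(markdown_text, documentclass="article"):
    """Return warnings that point at Markdown lines instead of LaTeX lines."""
    if documentclass in CHAPTER_CLASSES:
        return []
    return [
        f"line {n}: level-1 header becomes \\chapter, "
        f"which the '{documentclass}' class does not define"
        for n in find_h1_lines(markdown_text)
    ]
```

Run on the two header styles from the original report, `find_h1_lines` would flag the `# Title` line and the `Title` line above `=====`, so the user sees Markdown line numbers rather than `l.119 \chapter`.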

eddie-dunn commented 9 years ago

A thought, wouldn't it be possible to reverse the behavior of pandoc? I.e., if an article-class document is generated, do what pandoc does, but if any other class of document is generated, keep the current scholdoc behaviour. In other words, we have only one LaTeX class to keep track of, instead of the thousands of custom classes you refer to.
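A minimal sketch of this reversed default, assuming the only class that needs special handling is `article`; the function name and the clamping of deep levels are hypothetical, and this is not how scholdoc is implemented.

```python
# Hypothetical sketch of the "reversed" default; not scholdoc's implementation.
COMMANDS = ["chapter", "section", "subsection",
            "subsubsection", "paragraph", "subparagraph"]


def reversed_default_command(level, documentclass="article"):
    """Shift headers down one step for 'article' only; otherwise H1 is \\chapter."""
    shift = 1 if documentclass == "article" else 0
    # Assumption: levels deeper than the table are clamped to the last entry.
    index = min(level - 1 + shift, len(COMMANDS) - 1)
    return "\\" + COMMANDS[index]
```

Under this scheme any class other than `article` keeps the fixed mapping, so a custom class without `\chapter` would still fail, which is the trade-off described in this comment.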

This means if you use the defaults, doc generation won't crash and burn, which makes it n00b-friendly. On the other hand, if you use custom LaTeX stuff, you hopefully also know enough to keep track of whether H1/\chapter is compatible with your class.

timtylin commented 9 years ago

Sorry for leaving this thread hanging. I'm currently busy with trying to graduate from my PhD program, so I can only allocate time to Scholdoc sporadically.

> (I used H1 headers in several places in the document, but after discovering the problem with scholdoc, I shifted all headings one step up (H1->H2, H2->H3 etc).)

I'm just curious, because as you know Markdown is first and foremost shorthand for HTML, and it used to be considered very important (in the HTML4 days) to have only one h1 element per page, so that crawlers could generate a proper document tree. This is not so important anymore in the HTML5 age, but it's still considered good practice to have h1 denote a new "article level" topic that can stand entirely on its own. LaTeX issues aside, I personally don't think "section" is a better colloquial name for this concept than "part" or "chapter".

> I like your proposed model, though I did some testing and it seems the default latex behavior is to add "Part I", "Part II" for every \part in the LaTeX source. Is this what we want?

Every header-level command in LaTeX, including \part, has a starred variant (e.g., \part*) that disables numbering for that header. This would all be handled by Scholdoc's numbering system (and possible template overrides) anyway, so the important point is just that \part works as "one level up from \section" out of the box in LaTeX, regardless of document class.

> What I'm trying to say here is that I am actually fine with giving fixed meanings to each header level, but then scholdoc should be very clear about what the problem is with the markdown source, and not require the LaTeX-specific knowledge that H1 becomes \chapter, and that \chapter doesn't work for the default scholdoc pdf output format article.

Scholdoc is just one piece of the puzzle in what I eventually intend to build, which is a comprehensive solution for writing academic documents in Markdown. I absolutely agree with you, and I'll just say that Scholdoc is far from the final form I intend it to have. Eventually Scholdoc should fit into an ecosystem of convention guides and utilities that should make this a non-issue.

For now you can see Scholdoc as just a proof of concept to see how some of the assumptions I make fit with existing workflows. It's more for helping me choose design directions than anything else right now. The choice to have fixed section-level semantics is partly due to other systems that I intend to build down the line, which is why I chose to do it even though it may not make sense right now; eventually it's just going to be the engine in some other (hopefully better designed) front end.

> A thought, wouldn't it be possible to reverse the behavior of pandoc? I.e., if an article-class document is generated, do what pandoc does, but if any other class of document is generated, keep the current scholdoc behaviour.

Isn't this just the "moving header level" paradigm with a different default configuration? It's not gonna be any easier to code. Is there an example of a case where mapping h1 to \section for the general article document class is considered critical? Would it not be enough to just map it to \part* and not throw an error?

> expectation that scholdoc would behave like pandoc (it is a fork of the latter, is it not?)

It's a fork in the sense that the program interface and parsing engine are similar to Pandoc's, but obviously I wouldn't have forked it if I didn't want to experiment with some of the fundamental assumptions of Pandoc behavior. Another example of this kind of difference is that Scholdoc defaults to --standalone mode (which can be turned off with --no-standalone), since I expect that if you're calling it from the CLI then you probably want to produce something you can use immediately; otherwise it would be in a script, where you can more easily keep track of arguments.

(if you want 100% compatibility with Pandoc, just rename the executable to pandoc, and most of the Scholdoc additions would be disabled!)

timtylin commented 9 years ago

I'm going to add the \part behavior fix for now just because it's low-hanging fruit.

timtylin commented 9 years ago

I've added the H1 -> Parts mapping for non-book document classes in commit d1a6f37

I think it's also a good idea to pick up a "starting with H1, if it is the only H1 in the document, implies the title" rule. I'm just going to think about the cleanest way to implement it.

eddie-dunn commented 9 years ago

Hey, man, I hear you about being busy. I'm currently trying to finish my own thesis, which is why the issue reported above irked me somewhat :)

I'm happy with the changes you've proposed, good job!

lionel- commented 9 years ago

Does anyone actually use \part when writing articles?

I really think H1 headers should translate to \section... It looks weird to have the header hierarchy start at ##, and it is not intuitive. To me, they feel like subsections rather than sections, so I have to think twice each time I see or write a header.

timtylin commented 9 years ago

> I really think H1 headers should translate to \section... It looks weird to have the header hierarchy start at ##, and it is not intuitive. To me, they feel like subsections rather than sections, so I have to think twice each time I see or write a header.

OTOH, I also know that several people would put a single # at the beginning of an article and treat that as the title. If you do that, then ## tends to look a lot more natural for sections. A more informed decision can definitely be made with more concrete usage data, but for now I don't yet see a clear consensus. I'm working on that, though.

For what it's worth, I try to optimize the syntax so that it looks passable by default in Github's style. To my eyes, the way Github renders H1 looks like titles, and H2 looks like sections. That sort of played into the eventual decision.

> Does anyone actually use \part when writing articles?

Using \part is more of a hack. It's not semantically correct in most cases, but it doesn't make LaTeX error out and it tends to look somewhat like a title. Sometimes with LaTeX, not crashing is the best you can hope for.