jgm / pandoc

Universal markup converter
https://pandoc.org

Can't add working TOC to gfm or commonmark #5665

Closed David-Else closed 5 years ago

David-Else commented 5 years ago

pandoc 2.7.3, downloaded from GitHub, on Fedora Linux

I am trying to generate a TOC from my .odt file with this command:

pandoc --toc -s --extract-media . test.odt -t gfm -o test1.md

None of the links are clickable. After reading a bug report I tried commonmark, but the results were the same.

(TOC generated)
  - [<span id="anchor"></span>TypeScript is Fun\! Part 1 – Turn Your
    Game Ideas Into A Well Thought Out
    Blueprint](#typescript-is-fun-part-1-turn-your-game-ideas-into-a-well-thought-out-blueprint)

(Link generated)
## <span id="anchor"></span>TypeScript is Fun\! Part 1 – Turn Your Game Ideas Into A Well Thought Out Blueprint

Here is the whole TOC and some of the links:

- [<span id="anchor"></span>TypeScript is Fun\! Part 1 – Turn Your Game Ideas Into A Well Thought Out Blueprint](#typescript-is-fun-part-1-turn-your-game-ideas-into-a-well-thought-out-blueprint)
- [<span id="anchor-1"></span>Decide in advance as many details as possible about your game](#decide-in-advance-as-many-details-as-possible-about-your-game)
  - [<span id="anchor-2"></span>Game hardware requirements and this course (? will this annoy people)](#game-hardware-requirements-and-this-course-will-this-annoy-people)
- [<span id="anchor-3"></span>Stage one: make some sketches](#stage-one-make-some-sketches)
- [](#section)
- [<span id="anchor-12"></span>Stage two: List all the different things we can see in our sketches](#stage-two-list-all-the-different-things-we-can-see-in-our-sketches)
- [![](./Pictures/10000201000007570000036C40120153FC9309E4.png)](#section-1)
- [<span id="anchor-14"></span>List of things we see on the different screens](#list-of-things-we-see-on-the-different-screens)
  - [<span id="anchor-15"></span>Text](#text)
  - [<span id="anchor-16"></span>Hero](#hero)
  - [<span id="anchor-17"></span>Bullets](#bullets)
  - [<span id="anchor-18"></span>Zombies](#zombies)
  - [<span id="anchor-19"></span>Static obstacles](#static-obstacles)
  - [<span id="anchor-20"></span>What about the things we will need that are not visible?](#what-about-the-things-we-will-need-that-are-not-visible)
  - [<span id="anchor-21"></span>Stage 3: List all the different properties these things will need to have](#stage-3-list-all-the-different-properties-these-things-will-need-to-have)
  - [<span id="anchor-23"></span>Position](#position)
  - [<span id="anchor-24"></span>Rotation](#rotation)
  - [<span id="anchor-25"></span>Scale](#scale)
  - [<span id="anchor-26"></span>Width and height](#width-and-height)
  - [<span id="anchor-28"></span>Velocity](#velocity)
  - [<span id="anchor-29"></span>Lives](#lives)
- [<span id="anchor-31"></span>Stage 4: List all the actions these ‘things’ can take](#stage-4-list-all-the-actions-these-things-can-take)
  - [<span id="anchor-33"></span>Die: delete itself from existence (\!\! make hero delete himself too\!)](#die-delete-itself-from-existence-make-hero-delete-himself-too)
  - [<span id="anchor-34"></span>Update: Change something about themselves internally when told to](#update-change-something-about-themselves-internally-when-told-to)
  - [<span id="anchor-35"></span>Draw: draw itself to the screen](#draw-draw-itself-to-the-screen)

## <span id="anchor"></span>TypeScript is Fun\! Part 1 – Turn Your Game Ideas Into A Well Thought Out Blueprint

## <span id="anchor-1"></span>Decide in advance as many details as possible about your game
### <span id="anchor-2"></span>Game hardware requirements and this course (? will this annoy people)
## <span id="anchor-3"></span>Stage one: make some sketches
<span id="anchor-4"></span>

<span id="anchor-5"></span>PAUSE
<span id="anchor-6"></span>

<span id="anchor-7"></span>

<span id="anchor-8"></span>GAME OVER

I only want links generated for headings. I know this is not tech support, but is this broken? The links seem random. Can I tell it to create links ONLY for headings? Cheers!

jgm commented 5 years ago

It's impossible to tell what's going on here without being able to look at the ODT file.

David-Else commented 5 years ago

OK. Here it is. Please don't read it, it is a very rough draft with super cringe artwork! :) It was made in LibreOffice version 6.1.6. Attachment: test.zip

jgm commented 5 years ago

Just took a brief look. It seems to me that pandoc is correctly identifying the bits of your document with the "Heading 2" style, but you've got some blank Heading 2s, and in at least one case a picture has a Heading 2 style -- so all these things are interpreted as section headings.

David-Else commented 5 years ago

@jgm I am looking at the original .odt and I don't understand what a "blank Heading 2" is? There is:

 <span id="anchor-5"></span>PAUSE

![](./Pictures/100002010000078000000438DAFF2A69E18AC389.png)

<span id="anchor-6"></span>

<span id="anchor-7"></span>

<span id="anchor-8"></span>GAME OVER

Where are <span id="anchor-6"></span> and <span id="anchor-7"></span> coming from? They are not in the .odt as far as I can see. I don't want any spans other than for headings in the TOC. The GAME OVER is of type 'text body', and I have no idea why a span was added to it.

Also, are the links clickable and working for you in the TOC? They don't do anything in VS Code.

Cheers!

mb21 commented 5 years ago

Here is what we do to debug the odt file:

unzip -d test test.odt
xmllint --format test/content.xml | grep 'text:h '

This lists all the headings in your doc and some of them don't contain any text:

      <text:h text:style-name="P34" text:outline-level="2"><text:bookmark-start text:name="__RefHeading___Toc624_1902081235"/>TypeScript is Fun! Part 1 – Turn Your Game Ideas Into A Well Thought Out Blueprint<text:bookmark-end text:name="__RefHeading___Toc624_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_2" text:outline-level="2"><text:bookmark-start text:name="__RefHeading___Toc626_1902081235"/>Decide in advance as many details as possible about your game<text:bookmark-end text:name="__RefHeading___Toc626_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc628_1902081235"/><text:span text:style-name="T22">Game hardware requirements</text:span> and this course <text:span text:style-name="T23">(? will this annoy people)</text:span><text:bookmark-end text:name="__RefHeading___Toc628_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_2" text:outline-level="2"><text:bookmark-start text:name="__RefHeading___Toc630_1902081235"/>Stage one: make <text:span text:style-name="T6">some</text:span> sketch<text:span text:style-name="T6">es</text:span><text:bookmark-end text:name="__RefHeading___Toc630_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_2" text:outline-level="2"/>
      <text:h text:style-name="Heading_20_2" text:outline-level="2"><text:bookmark-start text:name="__RefHeading___Toc632_1902081235"/>Stage two: List all the di<text:span text:style-name="T8">fferent things we can see in our sketches</text:span><text:bookmark-end text:name="__RefHeading___Toc632_1902081235"/></text:h>
      <text:h text:style-name="P35" text:outline-level="2">
      <text:h text:style-name="P36" text:outline-level="2"><text:bookmark-start text:name="__RefHeading___Toc634_1902081235"/>List of things we see on the different screens<text:bookmark-end text:name="__RefHeading___Toc634_1902081235"/></text:h>
      <text:h text:style-name="P37" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc636_1902081235"/>Text<text:bookmark-end text:name="__RefHeading___Toc636_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc638_1902081235"/>Hero<text:bookmark-end text:name="__RefHeading___Toc638_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc640_1902081235"/>Bullets<text:bookmark-end text:name="__RefHeading___Toc640_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc642_1902081235"/>Zombies<text:bookmark-end text:name="__RefHeading___Toc642_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc644_1902081235"/>Static <text:span text:style-name="T7">obstacles</text:span><text:bookmark-end text:name="__RefHeading___Toc644_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc646_1902081235"/>What about the things we will need that are not visible?<text:bookmark-end text:name="__RefHeading___Toc646_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc648_1902081235"/>Stage 3: <text:span text:style-name="T9">List</text:span> all the different properties these things will need to have<text:bookmark-end text:name="__RefHeading___Toc648_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc650_1902081235"/><text:soft-page-break/>Position<text:bookmark-end text:name="__RefHeading___Toc650_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc652_1902081235"/>Rotation<text:bookmark-end text:name="__RefHeading___Toc652_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc654_1902081235"/>Scale<text:bookmark-end text:name="__RefHeading___Toc654_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc656_1902081235"/>Width and height<text:bookmark-end text:name="__RefHeading___Toc656_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc658_1902081235"/><text:soft-page-break/>Velocity<text:bookmark-end text:name="__RefHeading___Toc658_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc660_1902081235"/>Lives<text:bookmark-end text:name="__RefHeading___Toc660_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_2" text:outline-level="2"><text:bookmark-start text:name="__RefHeading___Toc662_1902081235"/>Stage 4: List all the actions these ‘things’ can take<text:bookmark-end text:name="__RefHeading___Toc662_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc664_1902081235"/>Die: <text:span text:style-name="T14">delete itself from existence (!! make hero delete himself too!)</text:span><text:bookmark-end text:name="__RefHeading___Toc664_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc666_1902081235"/><text:soft-page-break/><text:span text:style-name="T14">Update: </text:span>Change something about themselves <text:span text:style-name="T13">internally when told to</text:span><text:bookmark-end text:name="__RefHeading___Toc666_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_3" text:outline-level="3"><text:bookmark-start text:name="__RefHeading___Toc668_1902081235"/>Draw: <text:span text:style-name="T14">draw</text:span> itself to the screen<text:bookmark-end text:name="__RefHeading___Toc668_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_4" text:outline-level="4"><text:bookmark-start text:name="__RefHeading___Toc670_1902081235"/>StartScreen<text:bookmark-end text:name="__RefHeading___Toc670_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_4" text:outline-level="4"><text:bookmark-start text:name="__RefHeading___Toc672_1902081235"/>LevelOne<text:bookmark-end text:name="__RefHeading___Toc672_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_4" text:outline-level="4"><text:bookmark-start text:name="__RefHeading___Toc674_1902081235"/>LevelTwo<text:bookmark-end text:name="__RefHeading___Toc674_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_4" text:outline-level="4"><text:bookmark-start text:name="__RefHeading___Toc676_1902081235"/>Pause<text:bookmark-end text:name="__RefHeading___Toc676_1902081235"/></text:h>
      <text:h text:style-name="Heading_20_4" text:outline-level="4"><text:bookmark-start text:name="__RefHeading___Toc1230_1902081235"/>GameOver<text:bookmark-end text:name="__RefHeading___Toc1230_1902081235"/></text:h>
      <text:h text:style-name="P38" text:outline-level="4"><text:bookmark-start text:name="__RefHeading___Toc680_1902081235"/>HighScores<text:bookmark-end text:name="__RefHeading___Toc680_1902081235"/></text:h>
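
The same check can be sketched in Python using only the standard library. This is a rough illustration rather than anything pandoc itself does: the function names list_headings and report are made up for this example, and it assumes the .odt is a regular zip archive with a content.xml member, as the unzip command above suggests. It flags headings that contain no text and notes whether they wrap an image.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces for text and drawing elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"


def list_headings(odt):
    """Return (outline_level, text, has_image) for each text:h element.

    `odt` may be a file path or a binary file object (hypothetical helper).
    """
    with zipfile.ZipFile(odt) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    headings = []
    for h in root.iter(f"{{{TEXT_NS}}}h"):
        level = h.get(f"{{{TEXT_NS}}}outline-level", "?")
        text = "".join(h.itertext()).strip()
        has_image = h.find(f".//{{{DRAW_NS}}}image") is not None
        headings.append((level, text, has_image))
    return headings


def report(odt):
    """Print only the headings that carry no text."""
    for level, text, has_image in list_headings(odt):
        if not text:
            kind = "image-only" if has_image else "empty"
            print(f"outline level {level}: {kind} heading")


# Only runs when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

Saved under a name of your choice and run with test.odt as its argument, it should point at the same empty and image-only headings that show up as self-closing or unclosed text:h lines in the grep output.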

David-Else commented 5 years ago

@mb21 Cheers! I recreated your xmllint test and got the same results. I then tried a second, simpler test document and things worked great. I have attached it as simple-test.zip. The command was:

pandoc --toc -s --extract-media . simple-test.odt -t gfm -o simple-test1.md

Which results in:


  - [This is a h1](#this-is-a-h1)
      - [This is a h2](#this-is-a-h2)
          - [This is a h3](#this-is-a-h3)

# This is a h1

This is some text below h1

## This is a h2

This is some text below h2

### This is a h3

This is some text below h3

And the results are now perfect. Note that there are no <span id="anchor"></span> elements inserted in the headers, no empty headers or any other unwanted nonsense, and all the TOC entries are clickable and take you to the header.

I am wondering if there is an error in the VS Code markdown rendering that stops the TOC links from jumping to the headings in the first test.odt file? Even though the markdown looks awful due to the <span id="anchor-xx"></span> everywhere, it looks like it should work.

To sum up, how can I filter out the empty headings and the <span id="anchor-xx"></span> elements that appear everywhere and seem to serve no purpose? Is this something I need to do in LibreOffice, or with a command in pandoc? Maybe I need to convert to an interim format? Cheers!

mb21 commented 5 years ago

The simplest solution is usually to clean up the source document (e.g. in LibreOffice). You can also write a pandoc filter to filter out certain elements.
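
As a rough idea of what such a filter could look like, here is a minimal sketch of a JSON filter written in plain Python. It is not an official pandoc filter. The file name clean_headings.py and all function names are made up for this example, and it assumes pandoc's JSON AST layout, where a Header is [level, attr, inlines] and a Span is [attr, inlines]. It removes empty spans from headings, drops headings that have no content at all, and turns image-only headings into ordinary paragraphs so they no longer appear in the TOC. It only looks at top-level blocks and does not touch empty spans in ordinary paragraphs.

```python
import json
import sys

# Inline types that do not count as heading text.
INVISIBLE = {"Space", "SoftBreak", "LineBreak", "Image"}


def is_blank(inlines):
    """True if the inlines hold nothing but whitespace, images and blank spans."""
    for node in inlines:
        if node["t"] in INVISIBLE:
            continue
        if node["t"] == "Span" and is_blank(node["c"][1]):
            continue
        return False
    return True


def contains_image(inlines):
    """True if an Image appears directly or inside a Span."""
    for node in inlines:
        if node["t"] == "Image":
            return True
        if node["t"] == "Span" and contains_image(node["c"][1]):
            return True
    return False


def strip_empty_spans(inlines):
    """Drop top-level Span inlines that have no content (the anchor spans)."""
    return [n for n in inlines if not (n["t"] == "Span" and not n["c"][1])]


def clean_blocks(blocks):
    """Process top-level blocks only; nested headings are left untouched."""
    cleaned = []
    for block in blocks:
        if block["t"] != "Header":
            cleaned.append(block)
            continue
        level, attr, inlines = block["c"]
        inlines = strip_empty_spans(inlines)
        if not is_blank(inlines):
            cleaned.append({"t": "Header", "c": [level, attr, inlines]})
        elif contains_image(inlines):
            # Keep the picture, but as a paragraph instead of a heading.
            cleaned.append({"t": "Para", "c": inlines})
        # A heading with no text and no image is dropped entirely.
    return cleaned


def clean_document(doc):
    """Return a copy of the pandoc JSON document with cleaned blocks."""
    return {**doc, "blocks": clean_blocks(doc["blocks"])}


# pandoc passes the output format as the first argument to JSON filters,
# so the script only reads stdin when it is called that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(clean_document(json.load(sys.stdin)), sys.stdout)
```

With the script saved as clean_headings.py, adding --filter ./clean_headings.py to the pandoc command from the original report should apply it before the TOC is built. Note that removing the anchor spans also removes their ids, so anything in the document that links to them would stop working.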

(There is also the empty_paragraphs extension but it only works on paragraphs and the odt reader currently doesn't support it.)

Closing this issue now. Feel free to ask further questions on the pandoc-discuss mailing list.