max-heller / mdbook-pandoc

An mdBook backend powered by Pandoc.
Apache License 2.0
97 stars, 7 forks

Option to disable GfmAutoIdentifiers #97

Closed: deerchao closed this issue 2 months ago

deerchao commented 3 months ago

The GfmAutoIdentifiers extension is enabled by default; it would be nice to be able to disable it through config.

I'm asking because, when converting to docx with GfmAutoIdentifiers enabled, many bookmarks are generated, and these cause problems for the tool I use to convert docx to pdf.

I found a workaround in 3476 by adding from:

from = "markdown-auto_identifiers"

but it seems to break something else.

max-heller commented 2 months ago

> there would be many bookmarks generated

mdbook-pandoc attempts to mirror the structure of mdBooks in this regard, meaning there will end up being bookmarks for each chapter.

> this would cause some problems for my tool to convert from docx to pdf

is there a reason you're converting from md -> docx -> pdf instead of md -> pdf directly?

> I found a workaround in 3476 by adding from:
>
> from = "markdown-auto_identifiers"
>
> but it seems to break something else.

could you elaborate on what seems to be broken?

deerchao commented 2 months ago

Thanks for your wonderful work and quick response.

> mdbook-pandoc attempts to mirror the structure of mdBooks in this regard, meaning there will end up being bookmarks for each chapter.

Maybe I missed something, but in my attempts at disabling the extension, there were both chapters and bookmarks in the docx (in Microsoft Word, they appear in different places), and these got merged into bookmarks in the pdf. That caused my problem, as I only need the chapters in the pdf.

> is there a reason you're converting from md -> docx -> pdf instead of md -> pdf directly?

It's complicated to convert directly to pdf. I have to install a bunch of stuff, like MiKTeX, and MiKTeX needs to download a lot of things to work (maybe it's harder because my book contains CJK characters). Frankly, I never succeeded after two days of attempts. Even if I did, it would be hard to share the setup with my team: I can simply put mdbook.exe and mdbook-pandoc.exe in my git repository, but not MiKTeX, especially since many team members are not allowed to access the internet in their daily work.

On the other hand, markdown to docx worked immediately, and there are lots of tools to convert from docx to pdf. There are still imperfect spots, such as raw HTML in the markdown being removed in the docx, but I'm guessing the same would happen if pandoc converted to pdf directly.

> could you elaborate on what seems to be broken?

If I don't set from, the actual --from generated by mdbook-pandoc looks like commonmark+pipes+attributes+...+gfm_auto_identifiers, from what I observed. If I set from to markdown-auto_identifiers, it's passed directly to pandoc, so I lose pipes, attributes, and other extensions, and the converted result is different. I could set from to the complete form, but then I'd be responsible for updating it if mdbook or mdbook-pandoc ever change their minds in future versions.

In short, my intent is to remove the auto_identifiers extension, not to set the from parameter. Ideally, there should be a way to express this intent.

max-heller commented 2 months ago

> Thanks for your wonderful work and quick response.

Thanks for the kind words!

> Maybe I missed something, but in my attempts at disabling the extension, there were both chapters and bookmarks in the docx (in Microsoft Word, they appear in different places), and these got merged into bookmarks in the pdf. That caused my problem, as I only need the chapters in the pdf.

I see the same behavior. I'm not sure why Pandoc does this. It might be worth opening an issue there, but it seems unlikely they'll change it, since that could break other users.

> It's complicated to convert directly to pdf. I have to install a bunch of stuff, like MiKTeX, and MiKTeX needs to download a lot of things to work (maybe it's harder because my book contains CJK characters). Frankly, I never succeeded after two days of attempts. Even if I did, it would be hard to share the setup with my team: I can simply put mdbook.exe and mdbook-pandoc.exe in my git repository, but not MiKTeX, especially since many team members are not allowed to access the internet in their daily work.

> On the other hand, markdown to docx worked immediately, and there are lots of tools to convert from docx to pdf. There are still imperfect spots, such as raw HTML in the markdown being removed in the docx, but I'm guessing the same would happen if pandoc converted to pdf directly.

Understandable; LaTeX can be a lot, especially with CJK. Pandoc supports several other PDF renderers that may be worth checking out, though I haven't tried them personally and have no idea how well they handle CJK.
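For reference, a direct Markdown-to-PDF run with a non-LaTeX renderer goes through Pandoc's --pdf-engine option. The Rust sketch below only builds the pandoc command (it does not run it). It assumes pandoc and weasyprint (one of the documented --pdf-engine values) are installed and on PATH; the file names book.md and book.pdf are placeholders, and how well any engine handles CJK is untested here.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Build (but do not run) a Pandoc invocation that renders Markdown straight
/// to PDF with the given engine. Assumption: the installed Pandoc version
/// supports `engine`, and the engine binary itself is on PATH.
fn pandoc_pdf_command(input: &str, output: &str, engine: &str) -> Command {
    let mut cmd = Command::new("pandoc");
    cmd.arg(input)
        .arg("-o")
        .arg(output)
        .arg(format!("--pdf-engine={engine}"));
    cmd
}

fn main() {
    // Placeholder file names; "weasyprint" is an assumed engine choice.
    let cmd = pandoc_pdf_command("book.md", "book.pdf", "weasyprint");
    let args: Vec<&OsStr> = cmd.get_args().collect();
    println!("{:?} {:?}", cmd.get_program(), args);
    // To actually render, call `cmd.status()` once the tools are installed.
}
```

Swapping the engine string is the only change needed to try a different renderer, which keeps the comparison between engines cheap.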

> If I don't set from, the actual --from generated by mdbook-pandoc looks like commonmark+pipes+attributes+...+gfm_auto_identifiers, from what I observed. If I set from to markdown-auto_identifiers, it's passed directly to pandoc, so I lose pipes, attributes, and other extensions, and the converted result is different. I could set from to the complete form, but then I'd be responsible for updating it if mdbook or mdbook-pandoc ever change their minds in future versions.

> In short, my intent is to remove the auto_identifiers extension, not to set the from parameter. Ideally, there should be a way to express this intent.

Can you test #98 to see if it works for your use case? I've made mdbook-pandoc respect the from option if it's set, adding any additional extensions it needs that don't conflict with the explicitly configured ones. You should be able to set from = commonmark-gfm_auto_identifiers and have things work.
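The merging behavior described in this comment can be sketched roughly as follows. This is a hypothetical Rust illustration, not the code from #98: it assumes an extension "conflicts" whenever the user's from spec already mentions it with + or -, and the default extension list is an assumed stand-in for whatever mdbook-pandoc actually enables.

```rust
/// Return the names of all extensions explicitly toggled in a Pandoc format
/// spec, e.g. "commonmark-gfm_auto_identifiers+attributes" yields
/// ["gfm_auto_identifiers", "attributes"].
fn explicit_extensions(spec: &str) -> Vec<&str> {
    // Everything before the first '+' or '-' is the base format name.
    match spec.find(|c: char| c == '+' || c == '-') {
        Some(start) => spec[start..]
            .split(|c: char| c == '+' || c == '-')
            .filter(|name| !name.is_empty())
            .collect(),
        None => Vec::new(),
    }
}

/// Hypothetical merge: keep the user's spec verbatim and append every default
/// extension the user did not already enable or disable explicitly.
fn merge_from(user_spec: &str, defaults: &[&str]) -> String {
    let explicit = explicit_extensions(user_spec);
    let mut merged = user_spec.to_string();
    for ext in defaults {
        if !explicit.contains(ext) {
            merged.push('+');
            merged.push_str(ext);
        }
    }
    merged
}

fn main() {
    // Assumed defaults standing in for mdbook-pandoc's real list.
    let defaults = ["pipe_tables", "attributes", "gfm_auto_identifiers"];
    let merged = merge_from("commonmark-gfm_auto_identifiers", &defaults);
    // Prints: commonmark-gfm_auto_identifiers+pipe_tables+attributes
    println!("{merged}");
}
```

Because the user's -gfm_auto_identifiers is left in place and that default is skipped, the extension stays disabled while the other defaults are still added, which is the intent described earlier in the thread.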

deerchao commented 2 months ago

It works just as expected. Thanks!