jez / pandoc-sidenote

Convert Pandoc Markdown-style footnotes into sidenotes
MIT License

pandoc-sidenote deletes much content inside footnotes #4

Closed gwern closed 7 years ago

gwern commented 7 years ago

While fine-tuning the CSS for Tufte CSS sidenotes (which doesn't play well with blockquote formatting and appears not to cover some basic aspects of lists), I noticed that much of the content in my footnotes had gone missing and took a second look at the source:

    -- lists, blockquotes, headers, hrs, and tables are all omitted
    -- Think they shouldn't be? I'm open to sensible PR's.
    deBlock _ = []

This is, uh, not ideal. pandoc-sidenote should definitely not be deleting everything except simple paragraph text, which breaks a large fraction of all my footnotes, many of which are about including some relevant excerpt or source code fragment or tangential point like that.

I'm not sure what the issue here would be. Is there some reason that things like blockquotes cannot simply be put inside the sidenote <span>?

jez commented 7 years ago

I think that when I tried it, tufte-css didn't render the page correctly. Admittedly, at the first sign of it not working I kind of just gave up, because I didn't anticipate using it myself.

I thought the current behavior was to just unwrap the block and just put the inner text inside the span, but maybe I'm misremembering. I also was trying as hard as possible to work with the existing tufte-css, because interoperability with the upstream tufte-css library was important to me.

Like I said in my message though, if you can figure out an elegant way to handle it, be my guest.

gwern commented 7 years ago

That might be related to the incompleteness of the Tufte CSS implementation, since it only reduces the width of a few elements like <p> tags, which doesn't work too well... In any case, if you don't know how to do it, I don't either, so I guess I'll have to go back to my previous floating footnotes.

jez commented 7 years ago

Out of curiosity, could you post a small-ish example of a Markdown file that you'd like to work?

gwern commented 7 years ago

Well, here's an example exercising some of the missing conversions:

Lorem ipsum[^footnote].

[^footnote]: Discussion of abstruse point with

    > quote from eminent source making the following claims
    >
    > 1. claim
    > 2. claim
    > 3. claim
    >
    > and minor caveats
    >
    > - caveat
    > - caveat
    >
    > > deeper discussion

    which implies

        y = ax + b

    and implemented as

    ~~~{.R}
    y <- function(a,b) { a*x + b }
    ~~~

    and we should note that

    - note
    - note

        extended note

Which gets rendered as:

$ pandoc --filter pandoc-sidenote foo.page

Lorem ipsumDiscussion of abstruse point with

which implies

and implemented as

and we should note that

.
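The output above is what one would expect if only paragraph text survives the conversion. Here is a minimal sketch of that behaviour, using toy stand-ins for the Pandoc AST rather than the real pandoc-types definitions; `flattenNote` and `exampleNote` are hypothetical names, and this is a simplified model, not the actual pandoc-sidenote source:

```haskell
-- Toy stand-ins for the Pandoc AST; not the real pandoc-types definitions.
data Inline = Str String | Sidenote [Inline]
  deriving (Eq, Show)

data Block
  = Para [Inline]
  | BlockQuote [Block]
  | OrderedList [[Block]]
  | BulletList [[Block]]
  | CodeBlock String
  deriving (Eq, Show)

-- Keep paragraph text; every other block becomes nothing,
-- mirroring the `deBlock _ = []` fallback quoted earlier.
deBlock :: Block -> [Inline]
deBlock (Para inlines) = inlines
deBlock _              = []

-- Flatten a footnote body into the inline content of a sidenote.
flattenNote :: [Block] -> Inline
flattenNote = Sidenote . concatMap deBlock

-- The footnote from the example, abbreviated.
exampleNote :: [Block]
exampleNote =
  [ Para [Str "Discussion of abstruse point with"]
  , BlockQuote
      [ Para [Str "quote from eminent source"]
      , OrderedList [[Para [Str "claim"]]]
      ]
  , Para [Str "which implies"]
  , CodeBlock "y = ax + b"
  , Para [Str "and implemented as"]
  , CodeBlock "y <- function(a,b) { a*x + b }"
  , Para [Str "and we should note that"]
  , BulletList [[Para [Str "note"]], [Para [Str "note"], Para [Str "extended note"]]]
  ]
```

In this model, `flattenNote exampleNote` keeps the four paragraphs and silently drops the quote, both code blocks, and the list, which matches the four surviving lines in the rendering above.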

jez commented 7 years ago

Ok so I took a look into this issue again:

Fundamentally, this arises because I'm solving a problem that should be solved with a custom Pandoc writer by using a Pandoc filter instead. By optimizing for interoperability, I've sacrificed some fidelity. Pandoc footnotes are only allowed to have blocks because they've been special-cased as such in the AST representation. With a filter, we're effectively stripping this special case out to prevent the footnotes from being rendered, which poses problems.

In retrospect, handling block-level formatting in sidenotes would have been easier to implement faithfully by creating a custom writer. It would take in the Note elements of the AST and emit the correct markup. Alternatively, a completely separate writer could be avoided by adding a flag to the html5 writer (though this would require putting tufte-css-specific code in Pandoc itself).

The choice to implement this as a filter rather than a writer makes it easier for others to drop this into existing projects. They may rely on features of a custom writer, and the Pandoc HTML5 writer may be updated as time goes on.

Ultimately, using a filter instead of a writer requires us either to prematurely convert the block-level elements to HTML strings, or to strip the block-level elements entirely.

I chose the latter, because it's more obvious that something has failed. The behavior is incorrect, and people don't rely on incorrect behavior, so in the future it can potentially be fixed.

On the other hand, choosing to prematurely convert the block-level elements to HTML at filter-time is essentially pre-empting the job of the writer. Even if we called out to Text.Pandoc.Writers.HTML, Pandoc filters are not allowed to inspect the list of enabled extensions, so we'd have to fall back to the defaults. Even if we correctly guessed the enabled extensions, we'd be clobbering the AST that a filter later in the pipeline might have used.

So, for the problem at hand, we could be doing a better job of extracting the inner text from the block elements, but it would be a crude approximation at best.
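To make the "crude approximation" concrete, here is a minimal sketch of what extracting the inner text could look like. The types are toy stand-ins for the Pandoc AST, and `innerText`, `joinBlocks`, and `joinWith` are hypothetical names, not functions from pandoc-sidenote:

```haskell
import Data.List (intercalate)

-- Toy stand-ins for the Pandoc AST; not the real pandoc-types definitions.
data Inline = Str String | Space
  deriving (Eq, Show)

data Block
  = Para [Inline]
  | BlockQuote [Block]
  | BulletList [[Block]]
  | CodeBlock String
  deriving (Eq, Show)

-- Recursively pull inline text out of nested blocks, discarding structure.
innerText :: Block -> [Inline]
innerText (Para inlines)     = inlines
innerText (BlockQuote bs)    = joinBlocks bs
innerText (BulletList items) = joinWith (map joinBlocks items)
innerText (CodeBlock code)   = [Str code]

-- Extract the text of several blocks in sequence.
joinBlocks :: [Block] -> [Inline]
joinBlocks = joinWith . map innerText

-- Separate non-empty chunks with a single Space.
joinWith :: [[Inline]] -> [Inline]
joinWith = intercalate [Space] . filter (not . null)
```

Nothing is deleted this way, but a quote, its list items, and a code block all collapse into one undifferentiated run of text, which is why it would only ever be an approximation.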

lakonis commented 4 years ago

I don't want to reopen the issue, but I'm wondering whether a workaround has been found or proposed? I guess the issue is raised from time to time? Thank you @jez

gwern commented 4 years ago

My ultimate solution was to use a custom JavaScript library Said Achmiz wrote for me, which at runtime takes standard Pandoc HTML5 footnotes and lays them out dynamically to fit in the margin if the page is wide enough: https://www.gwern.net/static/js/sidenotes.js (This has an advantage over the standard Tufte-CSS static class approach in that it can handle pages with many very large sidenotes, and as jez points out, it's not obvious how to write a Pandoc filter which generates Tufte-CSS-like static footnotes, while sidenotes.js is drop-in on any Pandoc HTML5 file.)

lakonis commented 4 years ago

Thanks @gwern I will explore your script. I am quite happy with the static feature, but I will try your solution. Wondering whether it will work well with paged.js.

AB1908 commented 3 years ago

I have been scouring the web for decent implementations of footnotes and have run into @gwern's ideas time and time again. Imagine my surprise when I found that blockquotes were eaten up by the filter [1] and an issue had already been raised by him as early as 2016! I was unwilling to work with a JS solution for blockquotes and stubbornly tried to get it working with this filter. My result was the following:

[Three images omitted; the third shows, in red, how the class is attached in the Markdown source.]
span.blockquote{
    display:block;
    float:left;
    left:1rem;
    margin-top:-5%;
    margin-bottom:5%;
    clear:both;
    width:90%;
    position:relative;
    border-left:0.5rem solid #ccc;
    padding:0.5rem
}

My CSS knowledge is lacking so I foolishly copied over the styling for sidenotes and added a few tweaks to make it look like a blockquote. Admittedly, this is not the correct solution as we lose the semanticity (?) of actual Markdown (spans vs actual blockquotes) but it does get me where I want to be.

My website is a work in progress and here is a link to the current commit for posterity: https://github.com/AB1908/AB1908.github.io/commit/8f0af6bb9982cd902863c4615f6228a3c59f3b7f

[1]: Admittedly, I'm not that attentive a reader and did gloss over where Gwern pointed that out in his article.

andrewufrank commented 2 years ago

@AB1908 - I think your idea is brilliant, but I do not fully understand how you insert the .blockquote mark in the text. Could you explain? My problem is sourceCode, which gets lost as well, and I hoped that I could deal with it in a similar way. Thank you for the hint!

AB1908 commented 2 years ago

@andrewufrank The third image with the text in red indicates how I've done this. Pandoc allows you to add a CSS class with the []{} syntax. This is actually an inline span, for reference.

I'm afraid code as a sidenote is much trickier. I sadly don't have the skills to attempt it either.

slotThe commented 1 year ago

@AB1908 @andrewufrank in case you people are still interested in this, I have a non-JS prototype[^1] Sidenote.hs implementation, which—as I now realise—implements the "let's prematurely convert the sidenote to HTML" version that @jez talked about here. It sidesteps the writer issues mentioned by not attempting to be a standalone application, but only a library; i.e., the type signature for usingSidenotes is now WriterOptions -> Pandoc -> Pandoc. It should work for all types of blocks, not just blockquotes; I've written about it here.

Sadly, I don't think that this could easily become a PR to this repo, as it's a very backwards-incompatible change (plus, keeping the standalone application structure would mean we run into the writer issues again).

I also briefly thought about using a custom Writer, but this would have meant reimplementing a good chunk of Hakyll's writer functions, which seemed a bit overcomplicated for what I wanted to achieve.

[^1]: I've completely ignored margin notes and "real" footnotes, as I currently only have a need for sidenote support.
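The library-style signature described above can be sketched roughly as follows. This is a toy model under stated assumptions: `Opts` is a hypothetical stand-in for Pandoc's WriterOptions, `renderNotes` is a hypothetical name (it is not usingSidenotes), the tiny renderer does no HTML escaping, and the wrapper element is only a placeholder:

```haskell
-- Hypothetical stand-in for Pandoc's WriterOptions.
newtype Opts = Opts { codeClass :: String }

-- Toy stand-ins for the Pandoc AST; not the real pandoc-types definitions.
data Inline = Str String | Note [Block] | RawHtml String
  deriving (Eq, Show)

data Block = Para [Inline] | BlockQuote [Block] | CodeBlock String
  deriving (Eq, Show)

-- A tiny HTML renderer (no escaping) that consults the caller's options.
renderInline :: Opts -> Inline -> String
renderInline _    (Str s)     = s
renderInline _    (RawHtml h) = h
renderInline opts (Note bs)   =
  "<span class=\"sidenote\">" ++ concatMap (renderBlock opts) bs ++ "</span>"

renderBlock :: Opts -> Block -> String
renderBlock opts (Para is)       = "<p>" ++ concatMap (renderInline opts) is ++ "</p>"
renderBlock opts (BlockQuote bs) =
  "<blockquote>" ++ concatMap (renderBlock opts) bs ++ "</blockquote>"
renderBlock opts (CodeBlock c)   =
  "<pre class=\"" ++ codeClass opts ++ "\"><code>" ++ c ++ "</code></pre>"

-- Replace every note with pre-rendered HTML. Because the options arrive as an
-- argument, the caller decides how the note body is rendered.
renderNotes :: Opts -> [Block] -> [Block]
renderNotes opts = map goBlock
  where
    goBlock (Para is)       = Para (map goInline is)
    goBlock (BlockQuote bs) = BlockQuote (map goBlock bs)
    goBlock b               = b
    goInline n@(Note _)     = RawHtml (renderInline opts n)
    goInline i              = i
```

The point of the shape is that nothing has to be guessed: a standalone filter has no access to the writer's settings, whereas a library function simply receives them from the program that calls it.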

gwern commented 1 year ago

My immediate impression is that your CSS or something needs some work because it is cutoff and triggering horizontal scroll on both FF & Chromium:

[screenshot omitted]

Besides that, much of your post seems concerned with dealing with Span AST elements and their general unsuitability. It's true that Span nodes are often unsuited for doing anything interesting involving a Block or [Block], but that's precisely what Div is for: it's the Block-level equivalent of the Inline Span. And if you are rendering blocks into an HTML <span> wrapper, that would seem to be bad HTML practice: <span> is defined as being inline and containing inline stuff (which is why Pandoc makes it expressed as an Inline), in contrast to <div> block containers.
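The Span/Div distinction can be illustrated with a toy model of the two AST levels. This is only a sketch: the real pandoc-types constructors also carry an Attr, reduced here to a single class name, and `wrapNote` is a hypothetical helper:

```haskell
-- Toy model of the Inline/Block split; not the real pandoc-types definitions.
data Inline
  = Str String
  | Span String [Inline]   -- inline container: may hold only inlines
  deriving (Eq, Show)

data Block
  = Para [Inline]
  | BlockQuote [Block]
  | Div String [Block]     -- block container: may hold any blocks
  deriving (Eq, Show)

-- Wrapping a footnote body (a list of blocks) type-checks with Div.
wrapNote :: [Block] -> Block
wrapNote = Div "sidenote"

-- The Span version is rejected by the compiler, because [Block] is not [Inline]:
-- wrapNoteInline :: [Block] -> Inline
-- wrapNoteInline = Span "sidenote"
```

Rendering blocks into a raw HTML string sidesteps this check, which is how block content can end up inside a `<span>` without the type system objecting.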

slotThe commented 1 year ago

> My immediate impression is that your CSS or something needs some work because it is cutoff and triggering horizontal scroll on both FF & Chromium:

Huh, interesting. I tested it with a bunch of setups (even works on my portrait monitor at work), but it may well be that the cutoff when to disable sidenotes needs work. I played around with this a bit just now and pushed a change (hopefully a fix, even!). Could you try again?

> Besides that, much of your post seems concerned with dealing with Span AST elements and their general unsuitability. It's true that Span nodes are often unsuited for doing anything interesting involving a Block or [Block], but that's precisely what Div is for: it's the Block-level equivalent of the Inline Span. And if you are rendering blocks into an HTML <span> wrapper, that would seem to be bad HTML practice: <span> is defined as being inline and containing inline stuff (which is why Pandoc makes it expressed as an Inline), in contrast to <div> block containers.

My knowledge of HTML and CSS is pretty much non-existent (all I learned was essentially so I could get sidenotes working). I assumed that people had chosen to use <span>s for a good reason and that changing this out for a <div> would just cause havoc—I was wrong! Changing

<span class="sidenote">
  $body$
</span>

to

<div class="sidenote">
  $body$
</div>

seems to have… just worked? I've only tested this locally, but at least there I don't notice any difference in behaviour.

gwern commented 1 year ago

Looks better.

It will have differences in behavior because right now your HTML is very invalid (you are sticking Block where the types say only Inline is permitted): it renders as well as it does because browsers are being nice and trying to rescue your broken non-type-checking HTML and guessing what you meant. So other browsers or versions may 'break' your page or tools will break on it or changes may break it when browsers can no longer guess what you meant etc.

slotThe commented 1 year ago

After some deliberation, I decided that perhaps a separate module (ensuring we keep backwards compatibility) wouldn't be so bad: see #26