Open cderv opened 5 months ago
Unfortunately, I don't think we can generally remove whitespace from HTML documents, because it has layout implications. That is, we can't look at
```{=html}
And choose to remove that.
Let me remind the context of this issue, so that we are on the same page.
The discussion at #9681 ended up at your comment https://github.com/quarto-dev/quarto-cli/issues/9681#issuecomment-2117775452
Just sharing this in case another fix would be to trim blank lines around html content inside a rawblock before parsing the HTML in our processing;
This is a good idea.
I opened this issue to track this idea.
So the idea initially shared by me was that when we parse HTML table with a blank like at top or bottom of the raw block , we end up in the AST with
- t: "RawBlock"
format: "html"
text: "\n"
If we don't remove that (for non HTML format), then we rely on the fact that Pandoc writers do the right thing to ignore the RawBlock.
Though, it seems it was not correctly done in powerpoint, and this is what messed up Powerpoint output in #9681
It seems to me that trimming start and end of the HTML content of our parsed raw block was something we could do quite safely. But this is not something to do for all the RawBlock ! This idea here is only for our table parsing processing (handle_raw_html_as_table
) where we do create raw block with the remaining content
In the example above, a new line so
- t: "RawBlock"
format: "html"
text: "\n"
Anyhow, if this is not a good idea, I am happy to close. The issue is fixed for powerpoint. I just opened this following your comment in #9681 to try prevent any other issue like this by us producing this raw html block.
Sorry, I should have been clearer.
I'm trying to say that, by itself, we can't simply remove a rawblock that has a single newline, because it's possible that someone added that rawblock with a newline for a reason (since it can affect HTML output).
In the context of our own processing, I do think it makes sense to not emit a rawblock with a single newline in the process of splitting rawblocks that we know we want to treat.
In other words, the fix shouldn't be to add a separate filter that drops all rawblocks that only have whitespace; the fix is to change the table processing code directly.
In other words, the fix shouldn't be to add a separate filter that drops all rawblocks that only have whitespace; the fix is to change the table processing code directly.
Oh perfect. We are on the same page ! 💯 agree with that, that was my plan.
From #9681, issue was fixed by removing the rawblock - though I understand now why there was a new slides I think
In this parsed HTML tables
And it seems it lead to a new slide in Powerpoint.
Just sharing this in case another fix would be to trim blank lines around html content inside a rawblock before parsing the HTML in our processing;
Originally posted by @cderv in https://github.com/quarto-dev/quarto-cli/issues/9681#issuecomment-2117086665