py-pdf / fpdf2

Simple PDF generation for Python
https://py-pdf.github.io/fpdf2/
GNU Lesser General Public License v3.0

write_html: support <sub> & <sup> tags inside <table> #860

Open Tolker-KU opened 1 year ago

Tolker-KU commented 1 year ago

Hi,

Thanks for all the great work going into this project!

I wonder if you have considered supporting subscript/superscript in cell/multicell when styling text with markdown?

GitHub supports this in their markdown implementation using the HTML tags <sub>/<sup>. I imagine fpdf2 could do something similar.

If you think this is a good idea, I would be happy to take a crack at it. It seems that the machinery for this feature is already in place.

Lucas-C commented 1 year ago

Hi @Tolker-KU!

Thank you for your nice words 😊

I think this was implemented by @gmischler in https://github.com/PyFPDF/fpdf2/pull/520: https://pyfpdf.github.io/fpdf2/TextStyling.html#subscript-superscript-and-fractional-numbers

I think it should work for multi_cell(), but we currently only have unit tests for .write(), so extra unit tests covering multi_cell() would be welcome!

gmischler commented 1 year ago

I think this was implemented by @gmischler in #520:

#520 implements the general ability to render subscript and superscript text, as well as the <sub> and <sup> tags for write_html().

However, the feature is not currently supported by our version of markdown.

The reason for the latter was that I couldn't find a standard on which characters to use as markup. The most popular markdown variant, CommonMark, doesn't support them either, for reasons that aren't entirely clear. But then, since our own markdown variant is rather weird anyway (fundamentally incompatible with any others), we could theoretically choose whatever we want... I've seen ^x^ and ~x~ suggested most often; in our case it would probably make sense to double them, like ^^x^^ and ~~x~~, to match the style of the existing tags.

I'm not very comfortable with borrowing tags from HTML. Why not just use HTML in the first place then? GitHub accepting <sub> and <sup> HTML tags has little to do with markdown. It simply passes those through to the browser unchanged, just as it does with <b>, <i>, etc.

And while we're on the topic: Adding a conforming CommonMark implementation (possibly in parallel) should probably be the long-term goal.

Tolker-KU commented 1 year ago

Thanks for getting back to me this quickly.

I'm looking for a feature to render subscripts and superscript within cells. As far as I can figure out this is not quite achievable with .write_html. Or am I wrong here?

What do you think about adding the ^^ and ~~ tags to the markdown syntax, so one can do .cell(txt="H~~2~~O") -> H₂O or .cell(text="E=MC^^2^^") -> E=MC²?
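To make the proposal concrete, here is a minimal sketch of how such markup could be split into fragments with a vertical position. It is not part of fpdf2: the function name `split_vpos` is hypothetical, and the `^^`/`~~` markers are only the ones suggested in this thread. The "LINE"/"SUP"/"SUB" labels mirror the values fpdf2 accepts for `char_vpos`.

```python
import re

# Hypothetical markers from the proposal above: ^^superscript^^ and ~~subscript~~.
_MARKUP = re.compile(r"\^\^(.+?)\^\^|~~(.+?)~~")


def split_vpos(text):
    """Split text into (fragment, vpos) pairs; vpos is "LINE", "SUP" or "SUB"."""
    fragments = []
    pos = 0
    for match in _MARKUP.finditer(text):
        # Plain text between the previous marker and this one.
        if match.start() > pos:
            fragments.append((text[pos:match.start()], "LINE"))
        if match.group(1) is not None:
            fragments.append((match.group(1), "SUP"))
        else:
            fragments.append((match.group(2), "SUB"))
        pos = match.end()
    # Trailing plain text; unterminated markers are left untouched.
    if pos < len(text):
        fragments.append((text[pos:], "LINE"))
    return fragments


print(split_vpos("H~~2~~O"))    # [('H', 'LINE'), ('2', 'SUB'), ('O', 'LINE')]
print(split_vpos("E=MC^^2^^"))  # [('E=MC', 'LINE'), ('2', 'SUP')]
```

Nesting and escaping of the markers are deliberately not handled here; a real implementation would have to decide on both.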

Lucas-C commented 1 year ago

I'm looking for a feature to render subscripts and superscript within cells. As far as I can figure out this is not quite achievable with .write_html. Or am I wrong here?

No, you are right. fpdf2 currently does not support <sub> & <sup> tags inside <table>:

from fpdf import FPDF

pdf = FPDF()
pdf.set_font("Helvetica")
pdf.add_page()
pdf.write_html(
    """<table border="1"><thead><tr>
        <th width="33%">Name</th>
        <th width="66%">Formula</th>
    </tr></thead><tbody><tr>
        <td>Lucas-C</td><td>E = MC<sup>2</sup></td>
    </tr></tbody></table>""")
pdf.output("issue_860.pdf")

I agree that it would be nice if fpdf2 supported this usage! 😊 I would welcome a PR that implements this in HTML2FPDF: https://github.com/PyFPDF/fpdf2/blob/master/fpdf/html.py#L195


I also fully agree with you @gmischler on this:

And while we're on the topic: Adding a conforming CommonMark implementation (possibly in parallel) should probably be the long-term goal.

Ideally, we could support combining fpdf2 with markdown-it-py (https://github.com/executablebooks/markdown-it-py). But then, would the translation chain be Markdown -> HTML, followed by FPDF.write_html()? This is not ideal, as our HTML2PDF converter is very limited: https://pyfpdf.github.io/fpdf2/HTML.html

So I'm not really sure of the path forward regarding Markdown support...

Tolker-KU commented 1 year ago

Ideally, we could support combining fpdf2 with markdown-it-py (https://github.com/executablebooks/markdown-it-py). But then, would the translation chain be Markdown -> HTML, followed by FPDF.write_html()? This is not ideal, as our HTML2PDF converter is very limited: https://pyfpdf.github.io/fpdf2/HTML.html

So I'm not really sure of the path forward regarding Markdown support...

I think markdown-it-py parses markup to tokens before rendering to HTML. Maybe fpdf2 can render the tokens directly to PDF instead of using HTML as an intermediate step.

https://markdown-it-py.readthedocs.io/en/latest/using.html#the-token-stream
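As a rough illustration of the idea, the sketch below walks the children of a markdown-it-py "inline" token and collects (text, style) pairs that a PDF renderer could consume directly, with no HTML in between. The helper names `inline_to_fragments` and `markdown_to_fragments` are hypothetical and not part of fpdf2 or markdown-it-py; the style strings ("", "B", "I", "BI") are an assumption chosen to match what FPDF.set_font() accepts. Only bold and emphasis are handled.

```python
from types import SimpleNamespace


def inline_to_fragments(children):
    """Turn the children of an 'inline' token into (text, style) pairs."""
    bold = italic = False
    fragments = []
    for tok in children:
        if tok.type == "strong_open":
            bold = True
        elif tok.type == "strong_close":
            bold = False
        elif tok.type == "em_open":
            italic = True
        elif tok.type == "em_close":
            italic = False
        elif tok.type == "text" and tok.content:
            # Empty text tokens are skipped.
            style = ("B" if bold else "") + ("I" if italic else "")
            fragments.append((tok.content, style))
    return fragments


def markdown_to_fragments(source):
    """Parse Markdown with markdown-it-py (third-party) and flatten inline styling."""
    from markdown_it import MarkdownIt  # imported lazily: optional dependency

    fragments = []
    for tok in MarkdownIt().parse(source):
        if tok.type == "inline":
            fragments.extend(inline_to_fragments(tok.children or []))
    return fragments


# Hand-built tokens standing in for a parsed "E = **MC**" paragraph.
example = [
    SimpleNamespace(type="text", content="E = "),
    SimpleNamespace(type="strong_open", content=""),
    SimpleNamespace(type="text", content="MC"),
    SimpleNamespace(type="strong_close", content=""),
]
print(inline_to_fragments(example))  # [('E = ', ''), ('MC', 'B')]
```

Block-level tokens (headings, lists, tables) would each need their own handling, which is where most of the work in a dedicated converter would go.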

Lucas-C commented 1 year ago

I think markdown-it-py parses markup to tokens before rendering to HTML. Maybe fpdf2 can render the tokens directly to PDF instead of using HTML as an intermediate step.

Sure, we could do that! But then we will basically have to maintain a new "Markdown2PDF" class 😅

I'm not opposed to this, if someone is willing to contribute / initiate such a converter in this project, and if it is mostly compatible with / does not break too many existing behaviours of fpdf2.

Tolker-KU commented 1 year ago

I'm looking for a feature to render subscripts and superscript within cells. As far as I can figure out this is not quite achievable with .write_html. Or am I wrong here?

No, you are right. fpdf2 currently does not support <sub> & <sup> tags inside <table>:

from fpdf import FPDF

pdf = FPDF()
pdf.set_font("Helvetica")
pdf.add_page()
pdf.write_html(
    """<table border="1"><thead><tr>
        <th width="33%">Name</th>
        <th width="66%">Formula</th>
    </tr></thead><tbody><tr>
        <td>Lucas-C</td><td>E = MC<sup>2</sup></td>
    </tr></tbody></table>""")
pdf.output("issue_860.pdf")

I've been looking into how to solve this. It seems that cells in tables rendered from HTML call FPDF.multi_cell() (see https://github.com/PyFPDF/fpdf2/blob/54d2eb0266bd3b1ccbf4dc384ea46c9b0d6b718d/fpdf/table.py#L278-L293 ). As far as I can see, FPDF.multi_cell() has no ability to render text with mixed vpos. One idea is to expose something like _render_styled_text_line() on FPDF that takes a TextLine, which supports text fragments with different styling. Could that be a way forward?

gmischler commented 1 year ago

As far as I can see, FPDF.multi_cell() has no ability to render text with mixed vpos. One idea is to expose something like _render_styled_text_line() on FPDF that takes a TextLine, which supports text fragments with different styling. Could that be a way forward?

As you have correctly recognized, this is a fundamental limitation of multi_cell(). For formatting changes within a paragraph, there is the alternative write(), but that currently has the disadvantage that it can only create left-aligned text.

Fixing this cleanly requires some architectural changes to fpdf2. I have outlined a possible solution in #339, and have been working on-and-off on an actual implementation. I hope I'll find time again soon so I can actually show some more progress here.

Theoretically, write_html() could also get more low-level access to the fpdf.py internals as you suggest, but I think a more general high-level approach to text formatting is better in the long run. Several similar issues have been raised over the last year, which all correctly pointed at the same set of current limitations. I'm sorry to say that the necessary groundwork for a true and general solution will take a bit more time.
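For reference, a minimal sketch of the write()-based workaround mentioned above, outside of tables. It relies on the `char_vpos` property introduced by #520; the helper names `write_fragments` and `build_demo_pdf` are hypothetical, and the text is left-aligned only, as noted.

```python
def write_fragments(pdf, fragments, line_height=10):
    """Write (text, vpos) pairs inline, switching pdf.char_vpos per fragment."""
    for text, vpos in fragments:
        pdf.char_vpos = vpos  # "LINE", "SUP" or "SUB"
        pdf.write(line_height, text)
    pdf.char_vpos = "LINE"  # restore the default baseline position


def build_demo_pdf():
    """Render "E = MC" with a superscript 2 and return the PDF bytes."""
    from fpdf import FPDF  # imported here so the helper above has no hard dependency

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=16)
    write_fragments(pdf, [("E = MC", "LINE"), ("2", "SUP")])
    return pdf.output()
```

Calling build_demo_pdf() requires fpdf2 to be installed; the returned bytes can be written to a file such as issue_860.pdf.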

Lucas-C commented 1 year ago

By the way, I think that this other, older issue is related: https://github.com/PyFPDF/fpdf2/issues/151

Lucas-C commented 4 months ago

Regarding the initial question about Markdown, combining fpdf2 with mistletoe can be a good alternative approach: https://py-pdf.github.io/fpdf2/CombineWithMistletoeoToUseMarkdown.html

I renamed this issue to "write_html: support <sub> & <sup> tags inside <table>" in order to clarify what the current feature request is 🙂 For clarity, here again is the minimal code snippet that we are looking to support:

from fpdf import FPDF

pdf = FPDF()
pdf.set_font("Helvetica")
pdf.add_page()
pdf.write_html(
    """<table border="1"><thead><tr>
        <th width="33%">Name</th>
        <th width="66%">Formula</th>
    </tr></thead><tbody><tr>
        <td>Lucas-C</td><td>E = MC<sup>2</sup></td>
    </tr></tbody></table>""")
pdf.output("issue_860.pdf")

Since PR #897 by @gmischler, HTML2FPDF is better architected and now uses .text_columns() & paragraphs to render text. This should make this feature easier to implement.