py-pdf / fpdf2

Simple PDF generation for Python
https://py-pdf.github.io/fpdf2/
GNU Lesser General Public License v3.0

Enhancement suggestion - direct matplotlib figure save from within fpdf2 #789

Open LandyQuack opened 1 year ago

LandyQuack commented 1 year ago

Firstly - excellent library / thank you for all your hard work. Used it vs alternatives because of vector graphics support but was really surprised by (slow) speed on some matplotlib images (savefig to BytesIO as SVG to pdf.image) and wondered if MatPlotLib direct conversion was much faster - it is.

As an example (attached script reproduces), 3 matplotlib plots (1 of blood pressure, 1 the anatomy example and 1 an xkcd example) having timings like:

Generate figures: 102.45670797303319 ms <-- all 3

----------------------------------------------------------------------------------------

MatPlotLib PdfPages - fig 0: 33.42100000008941 ms <-- Blood Pressure
MatPlotLib PdfPages - fig 1: 89.3275830312632 ms <-- Anatomy
MatPlotLib PdfPages - fig 2: 38.10716699808836 ms <-- xkcd
MatPlotLib PdfPages - overall: 160.90095799881965 ms

----------------------------------------------------------------------------------------

Fpdf - fig 0: 276.9010409829207 ms <-- Blood Pressure
Fpdf - fig 1: 4885.383540997282 ms <-- Anatomy
Fpdf - fig 2: 646.598165971227 ms <-- xkcd
Fpdf - overall: 5808.975291030947 ms

----------------------------------------------------------------------------------------

So nearly 6,000 ms for Fpdf2 for 3 plots versus 160 ms for MatPlotLib to produce essentially the same PDF. Size wise they're within 1k of each other.
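To put the gap in proportion, the figures above can be turned into slowdown factors. This is only arithmetic on the reported timings (rounded to two decimals); the dictionary and function names are made up for illustration.

```python
# Timings in milliseconds, copied from the report above and rounded.
pdfpages_ms = {"Blood Pressure": 33.42, "Anatomy": 89.33, "xkcd": 38.11}
fpdf_svg_ms = {"Blood Pressure": 276.90, "Anatomy": 4885.38, "xkcd": 646.60}


def slowdown(slow_ms, fast_ms):
    """Return how many times slower `slow_ms` is than `fast_ms`."""
    return slow_ms / fast_ms


# Per-figure ratio of the SVG-based fpdf2 path to matplotlib's PdfPages.
ratios = {name: slowdown(fpdf_svg_ms[name], pdfpages_ms[name]) for name in pdfpages_ms}
overall = slowdown(5808.98, 160.90)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x slower")
print(f"overall: {overall:.1f}x slower")
```

The anatomy figure dominates the total, which suggests the cost grows with the number of SVG path elements rather than being a fixed per-figure overhead.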

Not sure how much of this is figure -> svg -> pdf vs figure -> pdf and how much is C vs Python but I started looking because a document with ~ 20 plots in Fpdf2 was taking a surprisingly long time to generate.

My question is about whether or not a feature might be considered to implement fpdf.savefig() or similar - perhaps by nabbing images direct from figure -> pdf -> fpdf2?

test1.txt

Lucas-C commented 1 year ago

Welcome @LandyQuack 🙂

> Firstly - excellent library / thank you for all your hard work.

Thank you!

> My question is about whether or not a feature might be considered to implement fpdf.savefig() or similar - perhaps by nabbing images direct from figure -> pdf -> fpdf2?

I have adapted your script using the FigureCanvas approach to embed figures, as described in our documentation: https://pyfpdf.github.io/fpdf2/Maths.html#using-matplotlib

issue_789.py.txt

The results are a lot better, performance-wise:

$ ./issue_789.py
Generate / append figures: 116.06030003167689 ms
#----------------------------------------------------------------------------------------
PdfPages - fig 0: 92.47630002209917 ms
PdfPages - fig 1: 175.60830002184957 ms
PdfPages - fig 2: 35.08909995434806 ms
PdfPages - overall: 303.32190002081916 ms
#----------------------------------------------------------------------------------------
Fpdf - fig 0: 101.3886000146158 ms
Fpdf - fig 1: 199.06949996948242 ms
Fpdf - fig 2: 48.978100006934255 ms
Fpdf - overall: 349.7093000332825 ms
#----------------------------------------------------------------------------------------

To me, there does not seem to be a need for much enhancement.

What do you think?

LandyQuack commented 1 year ago

I may be misreading your link but doesn't that create an image rather than anything vector based?

Lucas-C commented 1 year ago

> I may be misreading your link but doesn't that create an image rather than anything vector based?

Ah yes, sorry, I did not realize that you wanted vector graphics and not raster graphics 😅

LandyQuack commented 1 year ago

I spent a little bit of time this evening trying to see if I could snaffle the relevant bits from matplotlib/PDFPages/savefig and... I think what it's doing is translating the figure into PDF paths and then wrapping that up with the rest of the PDF essentials like fonts and metadata.

I guess what I was wondering is if there might be a way to use some of that existing code to turn a figure into whatever it looks like in a PDF and then put that in the right place in the pdf using fpdf2?

Lucas-C commented 1 year ago

I had a look myself at https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1939

I think we could subclass matplotlib.backends.backend_pdf.RendererPdf in order to render figures directly to a fpdf2.FPDF instance.

I won't have the time to tackle this interesting challenge myself, but this sure looks like a fun exercise, and I would welcome a Pull Request that provides that!

LandyQuack commented 1 year ago

Played around with that and got a little lost in the function calls but have something very simple (attached) which spits out entries like:

b'/DeviceRGB CS'
b'/DeviceRGB cs'
b'1 j'
b'1 g 0 j 0 w 1 G 1 g'
b'0 0 m\n460.8 0 l\n460.8 345.6 l\n0 345.6 l\nh\n'
b'f'
b'/A1 gs 0.9176470588 0.9176470588 0.9490196078 rg 0 G 0.9176470588\n0.9176470588 0.9490196078 rg'
b'57.6 38.016 m\n414.72 38.016 l\n414.72 304.128 l\n57.6 304.128 l\nh\n'
b'f'
b'q 57.6 38.016 357.12 266.112 re W n /A2 gs 1 J 1 j 0.8 w 1 G /DeviceRGB cs'
b'89.594157 38.016 m\n89.594157 304.128 l\n'
b'S'
b'Q q /A2 gs 0.15 g 1 j 1 w 0.15 G 0.15 g'
b'q'
b'1 0 -0 1 78.469156895 23.85975 cm'
b'BT'
b'/F1 10 Tf'
b'0 0 Td'
b'[ (2006) ] TJ'
b'ET'

which, looking at https://github.com/gendx/pdf-cheat-sheets/blob/master/pdf-graphics.clean.pdf, seem to be PDF drawing commands and there are recognisable year names and strings like

b'[ (Blood Pressure) ] TJ'

which are clearly from my test image.
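One way to get oriented in output like this is to pick out the text-showing operators and see which strings they carry. The sketch below does that for a few fragments copied from the dump above. The helper name `shown_text` is invented for illustration, and the regular expression is a simplification that ignores escaped or nested parentheses, which real PDF strings may contain.

```python
import re

# A few content-stream fragments copied from the output above.
chunks = [
    b'q',
    b'1 0 -0 1 78.469156895 23.85975 cm',
    b'BT',
    b'/F1 10 Tf',
    b'0 0 Td',
    b'[ (2006) ] TJ',
    b'ET',
    b'[ (Blood Pressure) ] TJ',
]

# Matches a literal string "(...)" inside a TJ array (no escapes, no nesting).
LITERAL = re.compile(rb'\(([^()]*)\)')


def shown_text(fragments):
    """Return the strings drawn by TJ operators, in order of appearance."""
    found = []
    for fragment in fragments:
        if fragment.rstrip().endswith(b'TJ'):
            parts = LITERAL.findall(fragment)
            found.append(b''.join(parts).decode('latin-1'))
    return found


print(shown_text(chunks))
```

Text is bracketed by `BT` and `ET`, so those two operators are a reasonable landmark when looking for where a label starts and ends in the stream.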

The code is trivial - basically two subclasses overriding init and 1 print statement in the output function of PdfFile.

Now... since PDF innards are a black art... does any of this look like it might move things towards a goal of taking a MatPlotLib figure and (quickly) turning it into FPDF2 usable content without the (relatively) slow SVG intermediate parse?

If it does, can anyone point me in the right direction for finding the start and end of the converted figure? If I know those, I can work to finding what's generating everything in between!

mpl1.txt

Lucas-C commented 1 year ago

Hi @LandyQuack!

This looks promising 👍

I'll try to give a closer look at your code whenever I have some free time this week.

Lucas-C commented 1 year ago

A quick analysis of the stuff in matplotlib.backends.backend_pdf:

Hence, the crux of the processing lies in those two last classes.

Here is how you can use subclasses of them:

from io import BytesIO

import matplotlib as mpl
from matplotlib.backends.backend_pdf import PdfFile, RendererPdf

class CustomPdfFile(PdfFile):
    pass

class CustomRendererPdf(RendererPdf):
    pass

mpl.rcParams['pdf.compression'] = False
mpl.rcParams['pdf.use14corefonts'] = True

# ... obtain a fig and then:
data = BytesIO()
width, height = fig.get_size_inches()
pdf_file = CustomPdfFile(data)
pdf_file.newPage(width, height)
renderer = CustomRendererPdf(pdf_file, fig.dpi, height, width)
print("PDF file initial content:")
for line in data.getvalue().split(b"\n"):
    print(line)
fig.draw(renderer)
pdf_file.finalize()
with open("issue-789-PdfFile.pdf", "wb") as out_file:
    out_file.write(data.getvalue())

This should help you to figure out when the figure rendering starts! 😊

LandyQuack commented 1 year ago

Lucas - that's been super helpful especially the compression and the font bits. I didn't quite use your code but used something like

class Pdf_Object (PdfFile):
    """
        The theory goes... everything about how the PDF is constructed happens in PdfFile so... if we can decipher it...
        then we can capture what we'll need for FPDF2 e.g. fonts and drawing instructions etc... and if we can do that
        then we should be able to do FPDF2.fonts.append (blah) and FPDF2.add_mpl_figure (PDF_Obj.blah) or whatever the
        function calls might be. Haven't looked yet but presumably the SVG parser must wrap up similar drawing primitives
        so that might be the way to test a proof of concept.

        Subclass https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L660 so we can
        log member function calls within pdf output. There are a couple of functions we can't log with PdfFile.output
        because it triggers a recursion level limit fault. We also skip what look like non output utility functions.
    """
    def __init__ (self, filename, metadata=None):
        super().__init__(filename, metadata=metadata)

    def newPage(self, width, height):
        """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L769 """
        self.output ('PdfFile.newPage')
        super().newPage(width, height)

    def newTextnote(self, text, positionRect=[-100, -100, 0, 0]):
        """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L798 """
        self.output ('PdfFile.newTextnote')
        super().newTextnote(text, positionRect)

and that's giving me output like

python3 mpl1.py
b'%PDF-1.4'
b'%\xac\xdc \xab\xba'
┌─────────────────────┐
│ PdfFile.writeObject │
└─────────────────────┘
b'1 0 obj'
b'<< /Type /Catalog /Pages 2 0 R >>'
b'endobj'
┌─────────────────────┐
│ PdfFile.writeObject │
└─────────────────────┘
b'8 0 obj'
b'<< /Font 3 0 R /XObject 7 0 R /ExtGState 4 0 R /Pattern 5 0 R'
b'/Shading 6 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >>'
b'endobj'
┌─────────────────┐
│ PdfFile.newPage │
└─────────────────┘
┌───────────────────┐
│ PdfFile.endStream │
└───────────────────┘
┌─────────────────────┐
│ PdfFile.writeObject │
└─────────────────────┘
b'11 0 obj'
b'<< /Type /Page /Parent 2 0 R /Resources 8 0 R'
b'/MediaBox [ 0 0 460.8 345.6 ] /Contents 9 0 R /Annots 10 0 R >>'
b'endobj'
┌─────────────────────┐
│ PdfFile.beginStream │
└─────────────────────┘
b'9 0 obj'
b'<< /Length 12 0 R >>'
b'stream'
b'/DeviceRGB CS'
b'/DeviceRGB cs'
b'1 j'
b'1 g 0 j 0 w 1 G 1 g'
┌───────────────────┐
│ PdfFile.writePath │
└───────────────────┘
b'0 0 m'
b'460.8 0 l'
b'460.8 345.6 l'
b'0 345.6 l'
b'h'
b''
b'f'
b'/A1 gs 0.9176470588 0.9176470588 0.9490196078 rg 0 G 0.9176470588'
b'0.9176470588 0.9490196078 rg'
┌───────────────────┐
│ PdfFile.writePath │
└───────────────────┘
b'57.6 38.016 m'
b'414.72 38.016 l'
b'414.72 304.128 l'
b'57.6 304.128 l'
b'h'
b''
b'f'
b'q 57.6 38.016 357.12 266.112 re W n /A2 gs 1 J 1 j 0.8 w 1 G /DeviceRGB cs'

and I can start to see where the figure is represented in the PDF.

I think I need to look at the SVG code next because I presume that the vector lines etc in the SVG become pdf drawing commands in the same way so... if I can see what that code does to say "insert these drawing commands here and magically FPDF2 shall find and incorporate them" (paths?) then I should be a bit further to something that says

pdf = FPDF()
pdf.add_mpl_figure (fig, w,h)

so it behaves like an svg or a png or whatever and can be put in table cells etc.

i'm thinking that FPDF will need / want some sort of PDF object (basically PdfFile without the file generation) that can be queried to say - give me your images and your font usage and your paths a bit like

for paths in pdf_obj.paths(): add in some clever fashion.
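As a rough illustration of what a queryable "paths" view could look like, the sketch below groups the `m` / `l` / `h` operators from the captured output into lists of points. It is not part of matplotlib or fpdf2; `parse_paths` is a hypothetical helper, and it deliberately ignores curves (`c`), rectangles (`re`) and all graphics-state operators.

```python
def parse_paths(lines):
    """Group 'm' / 'l' / 'h' path operators into (points, closed) tuples.

    Each input line is expected to hold one operator with its operands first,
    e.g. b'460.8 0 l'. Lines ending in any other operator are skipped.
    """
    paths, points, closed = [], [], False
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue
        op = tokens[-1]
        if op == b'm':
            # A move-to starts a new sub-path; store the previous one first.
            if points:
                paths.append((points, closed))
            points, closed = [(float(tokens[0]), float(tokens[1]))], False
        elif op == b'l' and points:
            points.append((float(tokens[0]), float(tokens[1])))
        elif op == b'h':
            closed = True
    if points:
        paths.append((points, closed))
    return paths


# Lines copied from the captured output above.
captured = [
    b'0 0 m', b'460.8 0 l', b'460.8 345.6 l', b'0 345.6 l', b'h', b'', b'f',
    b'/A1 gs 0.9176470588 0.9176470588 0.9490196078 rg 0 G 0.9176470588',
    b'57.6 38.016 m', b'414.72 38.016 l', b'414.72 304.128 l',
    b'57.6 304.128 l', b'h', b'', b'f',
]

for pts, is_closed in parse_paths(captured):
    print(is_closed, pts)
```

The first path is the page-sized background rectangle and the second is the axes area, which matches the `/MediaBox` and clip rectangle seen in the dump.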

Current code attached.

Iain

mpl1.txt

LandyQuack commented 1 year ago

Got this working to proof of concept level at least. After playing around with trying to reconstruct the pdf from the innards of the renderer (and at least getting something on screen), decided that the matplotlib pdf backend is perfectly capable of generating pdf content so...

subclassed PdfFile, captured output to a BytesIO and nabbed everything between stream and endstream and put it into FPDF using _out().
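The extraction step can be sketched on its own, without matplotlib. The snippet below assumes an uncompressed PDF (as produced with `pdf.compression` switched off) in which the first stream object is the page content. `first_stream` is an illustrative name, and the sample bytes are a shortened stand-in for real output.

```python
import re

# Body of the first stream object: the bytes between the "stream" keyword
# (followed by an end-of-line) and the "endstream" keyword.
STREAM = re.compile(rb'stream\r?\n(.*?)\r?\nendstream', re.DOTALL)


def first_stream(pdf_bytes):
    """Return the body of the first stream in `pdf_bytes`, or None."""
    match = STREAM.search(pdf_bytes)
    return match.group(1) if match else None


# Shortened stand-in for the in-memory PDF.
sample = (
    b'9 0 obj\n<< /Length 12 0 R >>\nstream\n'
    b'/DeviceRGB CS\n/DeviceRGB cs\n1 j\n'
    b'endstream\nendobj\n'
)

print(first_stream(sample))
```

Anchoring on the end-of-line after `stream` avoids having to hard-code an offset, but it still relies on the page content being the first stream in the file.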

Fonts were a bit harder as the reference in the stream has to match what FPDF is adding so replaced fontname.
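An alternative to intercepting the font name inside the renderer would be to rewrite the `Tf` operators after the fact. The sketch below shows that idea on a fragment from the earlier dump. The helper `rename_fonts` and the target name `F3` are hypothetical; which name the destination document actually expects depends on how its fonts were registered.

```python
import re

# "/Name <size> Tf" selects a font resource; capture the name and the rest.
TF = re.compile(rb'/([A-Za-z0-9_.+-]+)(\s+[-+]?\d*\.?\d+\s+Tf\b)')


def rename_fonts(stream, mapping):
    """Rewrite font resource names used by Tf operators.

    `mapping` maps names used in the source stream (e.g. b'F1') to the names
    the destination document uses. Names not in the mapping are left alone.
    """
    def swap(match):
        old = match.group(1)
        return b'/' + mapping.get(old, old) + match.group(2)

    return TF.sub(swap, stream)


fragment = b'BT\n/F1 10 Tf\n0 0 Td\n[ (2006) ] TJ\nET'
print(rename_fonts(fragment, {b'F1': b'F3'}))
```

Only names that are followed by a size and `Tf` are touched, so resources such as `/A1 gs` or `/DeviceRGB cs` pass through unchanged.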

It works in as far as I get my test MatPlotLib figure in a FPDF page at standard zoom and have embedded a vector graphic.

Needs work on scaling and positioning (to use in something like a cell) and Truetype fonts but, as a proof of concept, I'm happy with it so far.

Iain

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
#from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
from matplotlib.patches import Circle
from matplotlib.patheffects import withStroke
from matplotlib.ticker import AutoMinorLocator, MultipleLocator
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
import seaborn as sns
from io import BytesIO
from fpdf import FPDF, drawing
import logging

# For PDF export
from matplotlib.backends.backend_pdf import PdfPages, PdfFile, pdfRepr, _fill,FigureCanvasPdf, RendererPdf, Op
from matplotlib import cbook, _path
from matplotlib._pylab_helpers import Gcf
from matplotlib.backends.backend_mixed import MixedModeRenderer
from matplotlib.font_manager import fontManager as _fontManager, FontProperties
from pathlib import Path

#----------------------------------------------------------------------------------------
def W (txt):
    """ Wrap a string in a box using ascii line drawing characters - easier to see """
    s = '\u2500' * (len (txt) + 2)
    print (f"\u250c{s}\u2510\n\u2502 {txt} \u2502\n\u2514{s}\u2518")
#----------------------------------------------------------------------------------------

#----------------------------------------------------------------------------------------
def Draw_BP_Graph ():
    """ Draw simple floating bar graph of Blood Pressure """

    BP_data = [
    ('21/3/2005',142, 86),('13/2/2010', 131, 87),('2/6/2011', 141, 83),('27/2/2013', 180, 93),
    ('1/5/2017', 137, 65),('12/11/2018',151,68),('14/5/2022',155, 86)
    ]

    # Create the dataframe
    BP = pd.DataFrame (BP_data, columns=['When', 'Systolic', 'Diastolic'])

    # Convert dates in the When column - lose the time component
    BP['When'] = pd.to_datetime(BP['When'], dayfirst=True).dt.date

    # For a floating bar graph we need a height (systolic - diastolic) as the bar starts at diastolic and has a height
    BP['Height'] = BP['Systolic'] - BP['Diastolic']

    # Graph Blood Pressure - label things
    plt.title('Blood Pressure', fontsize=10)
    # plt.xlabel('Year', fontsize=14)
    # plt.ylabel('mm Hg', fontsize=14)

    # Plot bars from diastolic up to systolic in blue
    plt.bar (BP['When'], BP['Height'], bottom=BP['Diastolic'], width=40, color='blue')
    plt.grid(True)

    # Add lines at 140 & 90 in red - styles as per https://matplotlib.org/3.5.0/api/_as_gen/matplotlib.pyplot.axhline.html (: is subtle)
    ax = plt.gca()
    for y in (140,90): ax.axhline(y, color='red', linestyle=':')

    # Shift the y-axis down by 15 (looks prettier) and up by the same
    bottom, top = plt.ylim()  # return the current ylim
    plt.ylim((bottom-15, top+15))   # set the ylim to bottom, top

    # Return the figure
    return plt.gcf()

#----------------------------------------------------------------------------------------
class Custom_FPDF(FPDF):

    def MPL_Figure (self, fig):
        """ Try and save an MatPlotLib figure to a FPDF instance """

        fig.dpi = 72  # there are 72 pdf points to an inch
        width, height = fig.get_size_inches()

        # pdf_file is our in memory PDF generated'ish by MatPlotLib
        data = BytesIO()
        pdf_file = Pdf_Object(data,parent=self)

        # Have to figure out how to alter both position and size
        pdf_file.newPage(width,height)
        renderer = RendererPdf(pdf_file, fig.dpi, height, width)

        #renderer = MixedModeRenderer(fig, width, height, fig.dpi,renderer,bbox_inches_restore=bbox_inches_restore)
        renderer = MixedModeRenderer(fig, width, height, fig.dpi, renderer)

        fig.draw(renderer)
        renderer.finalize()

        pdf_file.finalize()

        # And the same for the XRef table - we may want to grab things from here
        #for i,x in enumerate(pdf_file.XRef()): print (f'Xref[{i}]: {x}')

        # Get the in memory PDF
        dv = data.getvalue()

        # Debug
        #for line in dv.split(b"\n"): print (line)

        # Look for output between b'stream' and b'endstream'
        idx1 = dv.find(b'stream')
        idx2 = dv.find(b'endstream')

        # and write that wholesale and unmodified into a FPDF page
        self._out (dv[idx1+7:idx2])

#----------------------------------------------------------------------------------------
# class RendererPdf2(RendererPdf):
#   _afm_font_dir = cbook._get_data_path("fonts/pdfcorefonts")
#   _use_afm_rc_name = "pdf.use14corefonts"

#   def __init__(self, file, image_dpi, height, width):
#       super().__init__(file, image_dpi, height, width)
#       self.file = file
#       self.gc = self.new_gc()
#       self.image_dpi = image_dpi

#   def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
#       print (f'draw_text: {s} @ {x},{y} - {prop}')
#       super().draw_text(gc, x, y, s, prop, angle, ismath, mtext)

#----------------------------------------------------------------------------------------
class Pdf_Object (PdfFile):
    """
        For now, we generate a PDF in memory and re-use anything between stream and endstream labels
        and can see a MatPlotLib figure rendered in an FPDF page. We need to sort font references
        next and if that works we can remove PDF building blocks we will never use. 
    """

    def __init__ (self, filename, metadata=None, parent=None ):
        super().__init__(filename, metadata=metadata)
        self.parent = parent

    def XRef(self):
        return self.xrefTable

    def fontName(self, fontprop):
        """
            Font names used in the rendered MatPlotLib Figure are references to a font table (key in a dictionary)
            e.g. sans\-serif:style=normal:variant=normal:weight=normal:stretch=normal:size=10.0 is "/F1"
            ----
            The generated figure->pdf has to reference the font name used internal to FPDF rather than the one
            from the MatPlotLib pdf rendering backend
        """

        print (f'FontProp: {fontprop}')

        # TTF? Needs work
        if isinstance(fontprop, str):
            self.parent.add_font(fname=fontprop)
            for k,v in self.parent.fonts.items():
                if str(v['ttffile']) == fontprop:
                    self.parent.set_font ('arial', size=10.0)
                    return (v['fontkey'])
        # Built in
        elif isinstance(fontprop, FontProperties):
            self.parent.set_font(fontprop.get_name(), size=fontprop.get_size())
            return self.parent.current_font['i']

#   def newPage(self, width, height):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L769 """
#       self.output ('PdfFile.newPage')
#       super().newPage(width, height)

#   def newTextnote(self, text, positionRect=[-100, -100, 0, 0]):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L798 """
#       self.output ('PdfFile.newTextnote')
#       super().newTextnote(text, positionRect)

#   def finalize(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L823 """
#       self.output ('PdfFile.finalize')
#       super().finalize ()

#   def close(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L856 """
#       self.output ('PdfFile.close')
#       super().close()

#   def beginStream(self, id, len, extra=None, png=None):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L877 """
#       self.output ('PdfFile.beginStream')
#       super().beginStream (id, len, extra=None, png=None)

#   def endStream(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L881 """
#       self.output ('PdfFile.endStream')
#       super().endStream()

#   def fontName(self, fontprop):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L895 """
#       self.output ('PdfFile.fontName')
#       super().fontName (fontprop)

#   def dviFontName(self, dvifont):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L926 """
#       self.output ('PdfFile.dviFontName')
#       super().dviFontName (dvifont)

#   def writeFonts(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L956 """
#       self.output ('PdfFile.writeFonts')
#       super().writeFonts ()

#   def _write_afm_font(self, filename):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L977 """
#       self.output ('PdfFile._write_afm_font')
#       super()._write_afm_font (filename)

#   def _embedTeXFont(self, fontinfo):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L989 """
#       self.output ('PdfFile._embedTeXFont')
#       super()._embedTeXFont (fontinfo)

#   def createType1Descriptor(self, t1font, fontfile):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1047 """
#       self.output ('PdfFile.createType1Descriptor')
#       super().createType1Descriptor (t1font, fontfile)

#   def embedTTF(self, filename, characters):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1138 """
#       self.output ('PdfFile.embedTTF')
#       super().embedTTF (filename, characters)
#
#   def writeExtGSTates(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1520 """
#       self.output ('PdfFile.writeExtGSTates')
#       super().writeExtGSTates ()

#   def _write_soft_mask_groups(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1529 """
#       self.output ('PdfFile._write_soft_mask_groups')
#       super()._write_soft_mask_groups ()

#   def writeHatches(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1553 """
#       self.output ('PdfFile.writeHatches')
#       super().writeHatches ()

#   def writeGouraudTriangles(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1614 """
#       self.output ('PdfFile.writeGouraudTriangles')
#       super().writeGouraudTriangles ()
#       
#   def _writePng(self, img):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1693 """
#       self.output ('PdfFile._writePng')
#       super()._writePng (img)

#   def _writeImg(self, data, id, smask=None):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1722 """
#       self.output ('PdfFile._writeImg')
#       super()._writeImg (data, id, smask)

#   def writeImages(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1781 """
#       self.output ('PdfFile.writeImages')
#       super().writeImages ()

#   def writeMarkers(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1820 """
#       self.output ('PdfFile.writeMarkers')
#       super().writeMarkers ()

#   def writePathCollectionTemplates(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1850 """
#       self.output ('PdfFile.writePathCollectionTemplates')
#       super().writePathCollectionTemplates ()

#   def writePath(self, path, transform, clip=False, sketch=None):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1880 """
#       self.output ('PdfFile.writePath')
#       if clip:
#           #print ('Clip')
#           clip = (0.0, 0.0, self.width * 72, self.height * 72)
#           simplify = path.should_simplify
#       else:
#           #print ('No Clip')
#           clip = None
#           simplify = False
#
#       cmds = self.pathOperations(path, transform, clip, simplify=simplify, sketch=sketch)
#       self.output(*cmds)
#
#       # Return the pdf draw command
#       return (cmds)
#       super().writePath (path, transform, clip, sketch)

#   def writeObject(self, object, contents):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1905 """
#       self.output ('PdfFile.writeObject')
#       super().writeObject (object, contents)

#   def writeXref(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1909 """
#       self.output ('PdfFile.writeXref')
#       super().writeXref ()
#       
#   def writeInfoDict(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1922 """
#       self.output ('PdfFile.writeInfoDict')
#       super().writeInfoDict ()

#   def writeTrailer(self):
#       """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1928 """
#       self.output ('PdfFile.writeTrailer')
#       super().writeTrailer ()

#   def savefig(self, figure=None, **kwargs):
#       """ Based on https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#LL2724C1-L2745C57 """
#       if not isinstance(figure, Figure):
#           if figure is None: manager = Gcf.get_active()
#           else: manager = Gcf.get_fig_manager(figure)
#       
#           if manager is None: raise ValueError(f"No figure {figure}")
#       
#           figure = manager.canvas.figure
#
#       # Force use of pdf backend, as PdfPages is tightly coupled with it.
#       with cbook._setattr_cm(figure, canvas=FigureCanvasPdf2(figure)): figure.savefig(self, format="pdf", **kwargs)
#       
#   def finalize(self, pdf):
#       self.output ('PdfFile.finalize')
#       super().finalize()

#----------------------------------------------------------------------------------------
def main():

    # Set Seaborn plot style
    sns.set_style("dark")

    # Hide a bunch of missing font messages (xkcd graph)
    logging.getLogger('matplotlib.font_manager').setLevel(logging.ERROR)

    # Switch off compression and simplify fonts
    mpl.rcParams['pdf.compression'] = False
    mpl.rcParams['pdf.use14corefonts'] = True

    # Simple 1 page PDF
    pdf = Custom_FPDF()
    pdf.add_page()
    pdf.set_draw_color (0,0,0)
    #pdf.set_line_width(20)

    # Crudely hacked out of  MatPlotLib multipage PDF
    fig = Draw_BP_Graph()

    # Output the figure using MatPlotLib
    with PdfPages('MatPlotLib_Output.pdf') as mpdf: mpdf.savefig ()

    # Prototype FPDF extension
    pdf.MPL_Figure (fig)

    # Output what we've got into FPDF2 so far
    pdf.output ('FPDF_Output.pdf')

#----------------------------------------------------------------------------------------
# Main runtime entry point
if __name__ == "__main__": main()

Lucas-C commented 1 year ago

Hi @LandyQuack!

Sorry for the delay, I have been a bit busy over the last 2 weeks.

Currently, when trying to run your latest script, I get this error:

  File "./issue_789c.py", line 163, in fontName
    self.parent.set_font(fontprop.get_name(), size=fontprop.get_size())
...
fpdf.errors.FPDFException: Undefined font: dejavu sans - Use built-in fonts or FPDF.add_font() beforehand

But I was able to solve this error by simply adding pdf.add_font("dejavu sans", fname="test/fonts/DejaVuSans.ttf") in main()

The resulting PDF is promising, but I see zero visible text. There might still be something wrong regarding font management.

Apart from that, I looked at the Custom_FPDF.MPL_Figure() method & Pdf_Object class you wrote. Dumping the whole content stream to FPDF._out() is very "raw"... Providing another implementation of the matplotlib.backends.backend_pdf.GraphicsContextPdf.commands could be a cleaner approach... There are only 9 commands there, that could all be implemented with calls to FPDF methods. Have you considered this option?

Also, what is your end goal? Would you like to contribute code to fpdf2? If so, I will be relatively strict on the code quality if you want to add public methods to the fpdf package, but this can be a very good learning exercise 😊 On the other hand, an autonomous script could be provided as part of our docs/ (maybe in https://pyfpdf.github.io/fpdf2/Maths.html?), and I would be less strict on the code quality then, as long as it's relatively short. And finally, you can of course choose not to share your code in fpdf2, which is totally fine 😅. In that case I'm still available to answer your questions, and just hope the solution you found solved your initial need!

LandyQuack commented 1 year ago

Hi Lucas - no worries at all at the delay.

Agree re "raw"ness of that approach - was more to get a handle on what was happening where in the code. Have done much as you suggest but subclassed PdfFile because it seemed easier to start with something which worked and then add diagnostics as and where I needed.

So... where is the code up to?

    # Our FPDF version
    fpdf =  MPL_FPDF()
    print ('FPDF')

    fpdf.add_font(fname='/Library/Fonts/Microsoft/Times New Roman.ttf')
    fpdf.add_font(fname='/Library/Fonts/Microsoft/Arial.ttf')
    fpdf.add_font(family='dejavu sans mono', fname='/Users/iain/Library/Fonts/DejaVuSansMono.ttf')
    fpdf.add_font(fname='/System/Library/Fonts/Supplemental/Courier New.ttf')
    #fpdf.set_font("Arial", size=10)

    for fig in figs:
        f = fig()
        fpdf.add_page()
        fpdf.savefig (figure=f, bbox_inches='tight')
        plt.close(f)

    # Output what we've got into FPDF2 so far
    fpdf.output ('Output_FPDF.pdf')

ends up in

class MPL_FPDF(FPDF):
    #----------------------------------------------------------------------------------------
    def savefig(self, figure=None, **kwargs):
        if not isinstance(figure, Figure):
            if figure is None: manager = Gcf.get_active()
            else: manager = Gcf.get_fig_manager(figure)
            if manager is None: raise ValueError(f"No figure {figure}")
            figure = manager.canvas.figure

        # Fpdf uses top left origin, matplotlib bottom left so... fix Y axis
        ax = figure.gca()
        ax.set_ylim(ax.get_ylim()[::-1])
        ax.xaxis.tick_top()  

        # Fix title position
        mpl.rcParams['axes.titley'] = -0.1

        # Force use of pdf backend, as PdfPages is tightly coupled with it.
        with cbook._setattr_cm(figure, canvas=FigureCanvasPdf2(figure, parent=self)):
            figure.savefig(self, format="pdf", **kwargs)

and taking draw_text as an example

    def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):

        #print (f'draw_text: {s} @ {x},{y} - {prop} @ {angle} degrees')

        if isinstance(prop, str):
            self.parent.add_font(fname=prop)
            for k,v in self.parent.fonts.items():
                if str(v['ttffile']) == prop:
                    print (f'Font: {prop}')
                    self.parent.set_font('Arial', size=10.0)
        # Built in
        elif isinstance(prop, FontProperties):
            self.parent.set_font(prop.get_name(), size=prop.get_size())

        x,y = self._trans.transform ((x,y))
        self.parent.text(x,y,s)

with self.parent.text being fpdf.text

So... I can draw a number of basic / standard matplotlib figures directly into fpdf :-) Fonts work but need to sort rotated text yet.

I need to (a) make it not subclass the existing pdf renderer from MatPlotLib because I don't think it needs to, (b) figure out how to fit the resulting output into an FPDF container (say a table cell) - more below, (c) figure out why the anatomy path with the markers on doesn't draw in MPL but does in my code and (d) do proper circles (think I just need to tell the renderer that we speak bezier).
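
On point (d): Matplotlib path codes include both quadratic (CURVE3) and cubic (CURVE4) Bezier segments, whereas PDF path construction only has cubic curve operators. A minimal sketch of the standard quadratic-to-cubic conversion follows. The function names are hypothetical and not part of fpdf2 or Matplotlib, and how the resulting control points get handed to fpdf2 is left open.

```python
def quad_to_cubic(p0, p1, p2):
    """Convert a quadratic Bezier (start p0, control p1, end p2) to a cubic.

    Returns the four cubic control points (p0, c1, c2, p2). The cubic
    traces exactly the same curve as the quadratic.
    """
    c1 = (p0[0] + 2.0 / 3.0 * (p1[0] - p0[0]),
          p0[1] + 2.0 / 3.0 * (p1[1] - p0[1]))
    c2 = (p2[0] + 2.0 / 3.0 * (p1[0] - p2[0]),
          p2[1] + 2.0 / 3.0 * (p1[1] - p2[1]))
    return p0, c1, c2, p2


def cubic_point(p0, c1, c2, p3, t):
    """Evaluate a cubic Bezier at parameter t in [0, 1] (handy for checks)."""
    u = 1.0 - t
    weights = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    points = (p0, c1, c2, p3)
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return x, y
```

CURVE4 segments already carry two control points plus an end point, so they would need no conversion.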

This is a screenshot of what my output (non "raw", drawing direct into fpdf using the existing drawing commands) looks like. I'm pleased with progress so far.

[Image: Screenshot 2023-06-05 at 21 22 43]

and for simpler plots it works out of the box and looks like MPL.

As above, need to figure out how to get what I'm generating into the right place / size on the screen. I'm currently doing this:

self._scale = scale        # scale = self._parent.epw / (width*self.figure.dpi)
self._origin = (2,2)

# Setup our transform
self._trans = Affine2D().scale(self._scale).translate(*self._origin)

so can size and position where needed but need to see what fpdf actually needs me to do.
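
For illustration, here is a dependency-free sketch of the same scale-then-translate idea, with the Y flip folded into the transform instead of reversing the axis limits. It is an alternative under stated assumptions, not what the attached code does. `make_page_transform` and its parameters are hypothetical names, figure coordinates are assumed to have a bottom-left origin, and page coordinates a top-left origin.

```python
def make_page_transform(fig_w, fig_h, box_x, box_y, box_w):
    """Build a mapping from figure coordinates to page coordinates.

    fig_w, fig_h : figure size in its own units (e.g. width * dpi)
    box_x, box_y : top-left corner of the target box on the page
    box_w        : width of the target box (e.g. the page's usable width)

    Returns (to_page, box_h), where box_h is the height the figure
    occupies on the page once scaled to box_w.
    """
    if fig_w <= 0 or fig_h <= 0:
        raise ValueError("figure size must be positive")
    scale = box_w / fig_w          # same idea as epw / (width * dpi)
    box_h = fig_h * scale          # aspect ratio is preserved

    def to_page(x, y):
        # Scale, flip Y (bottom-left origin -> top-left origin), then offset.
        return (box_x + x * scale, box_y + box_h - y * scale)

    return to_page, box_h


to_page, box_h = make_page_transform(640, 480, box_x=20, box_y=100, box_w=160)
print(to_page(0, 0), to_page(640, 480), box_h)
```

The returned `box_h` is what a surrounding layout (a table cell, say) would need to reserve.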

End goal... hmm, I'm a medic rather than a coder so for what I need/want it's tediously simple vector graphs in amongst text in a PDF (kinda what being lazy I'd have done with Word). Raster graphics would probably have been fine but the purist in me much prefer nice crisp vectors. I just thought I'd see what I could do in code because I enjoy it. Learned about affine transformations along the way.

If I can make something others can get use out of - even better. I get from the community so if I can give back, seems fair.

There will be 20 more optimal ways of doing some of what I've done so think I'll offer the final working version for someone who knows what they're doing to look at / use in whatever way they see fit :-) This is hobby stuff for me and the rest of life keeps me busy enough to not want to maintain code / debug esoteric corner cases.

Current code base attached.

Iain

mpl5.zip

LandyQuack commented 1 year ago

Interesting discovery last night - mpl.use

    pdf =  FPDF()
    pdf.set_font('Times')

    # Use our custom renderer
    mpl.use("module://fpdf_renderer")

    pdf.add_font(family='dejavu sans mono', fname='/Users/iain/Library/Fonts/DejaVuSansMono.ttf')

    for fig in figs:
        f = fig()
        origin = (20,100)
        scale = 0.3

        pdf.add_page()
        f.savefig (fname=None, fpdf=pdf, origin=origin, scale=scale, bbox_inches='tight')
        plt.close(f)

    # Output what we've got into FPDF 
    pdf.output ('Output_FPDF.pdf')

where fpdf_renderer.py looks like the code below. Needs quite a bit of work yet but got text and a grid in a pdf.

"""
    Based on https://github.com/matplotlib/matplotlib/blob/v3.7.1/lib/matplotlib/backends/backend_template.py

    Just need to tell MatPlotLib to use this renderer and then do fig.savefig.
"""

from matplotlib import _api
from matplotlib._pylab_helpers import Gcf
from matplotlib.backend_bases import (FigureCanvasBase, FigureManagerBase, GraphicsContextBase, RendererBase)
from matplotlib.figure import Figure
from matplotlib.transforms import Affine2D
import matplotlib as mpl

class RendererTemplate(RendererBase):
    """ Removed draw_markers, draw_path_collection and draw_quad_mesh - all optional, we can add later """

    def __init__(self, dpi, fpdf, transform):
        super().__init__()
        self.dpi = dpi
        print (f'FPDF: {fpdf}')
        self._fpdf = fpdf
        self._trans = transform

        # some safe defaults
        if fpdf:
            fpdf.set_draw_color(0,0,0)
            fpdf.set_fill_color(255,0,0)

            #       
    def draw_path(self, gc, path, transform, rgbFace=None):

        #self.check_gc(gc, rgbFace)
        gc.paint()

        # Unzip the path segments into 2 arrays - commands and vertices, the transform sorts scaling and positioning
        tran = transform + self._trans
        c,v = zip(*[(c,v.tolist()) for v,c in path.iter_segments(transform=tran)])

        p = self._fpdf

        with p.local_context():

            if rgbFace: p.set_draw_color (rgbFace[:3])

            #p.set_line_width (gc._linewidth*self._scale)

            match c:
                # Polygon - starts with moveto, end with closepoly - DF means draw and fill
                case [path.MOVETO, *_, path.CLOSEPOLY]:
                    p.polygon(v[:-1],style="DF")

                # Simple line
                case [path.MOVETO, path.LINETO]:
                    p.polyline(v)

                # Polyline - move then a set of lines
                case [path.MOVETO, *mid, path.LINETO] if all(e == path.LINETO for e in mid):
                    p.polyline (v)

                case _:
                    print (f'draw_path: Unmatched {c}')

    def draw_image(self, gc, x, y, im):
        pass

    def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
        print (f'[{x},{y}] {s}')
        x,y = self._trans.transform ((x,y))
        self._fpdf.text(x,y,s)

    def flipy(self):
        return True

    def get_canvas_width_height(self):
        return 100, 100

    def get_text_width_height_descent(self, s, prop, ismath):
        return 1, 1, 1

    def new_gc(self):
        return GraphicsContextTemplate()

    def points_to_pixels(self, points):
        # if backend doesn't have dpi, e.g., postscript or svg
        return points
        # elif backend assumes a value for pixels_per_inch
        # return points/72.0 * self.dpi.get() * pixels_per_inch/72.0
        # else
        # return points/72.0 * self.dpi.get()

class GraphicsContextTemplate(GraphicsContextBase):
    """
    The graphics context provides the color, line styles, etc.  See the cairo
    and postscript backends for examples of mapping the graphics context
    attributes (cap styles, join styles, line widths, colors) to a particular
    backend.  In cairo this is done by wrapping a cairo.Context object and
    forwarding the appropriate calls to it using a dictionary mapping styles
    to gdk constants.  In Postscript, all the work is done by the renderer,
    mapping line styles to postscript calls.

    If it's more appropriate to do the mapping at the renderer level (as in
    the postscript backend), you don't need to override any of the GC methods.
    If it's more appropriate to wrap an instance (as in the cairo backend) and
    do the mapping here, you'll need to override several of the setter
    methods.

    The base GraphicsContext stores colors as an RGB tuple on the unit
    interval, e.g., (0.5, 0.0, 1.0). You may need to map this to colors
    appropriate for your backend.
    """

########################################################################
#
# The following functions and classes are for pyplot and implement
# window/figure managers, etc.
#
########################################################################

class FigureManagerTemplate(FigureManagerBase):
    """
    Helper class for pyplot mode, wraps everything up into a neat bundle.

    For non-interactive backends, the base class is sufficient.  For
    interactive backends, see the documentation of the `.FigureManagerBase`
    class for the list of methods that can/should be overridden.
    """

class FigureCanvasTemplate(FigureCanvasBase):
    """
    The canvas the figure renders into.  Calls the draw and print fig
    methods, creates the renderers, etc.

    Note: GUI templates will want to connect events for button presses,
    mouse movements and key presses to functions that call the base
    class methods button_press_event, button_release_event,
    motion_notify_event, key_press_event, and key_release_event.  See the
    implementations of the interactive backends for examples.

    Attributes
    ----------
    figure : `matplotlib.figure.Figure`
        A high-level Figure instance
    """

    # The instantiated manager class.  For further customization,
    # ``FigureManager.create_with_canvas`` can also be overridden; see the
    # wx-based backends for an example.
    manager_class = FigureManagerTemplate

    def draw(self):
        """
        Draw the figure using the renderer.

        It is important that this method actually walk the artist tree
        even if no output is produced because this will trigger
        deferred work (like computing auto-limits and tick
        values) that users may want access to before saving to disk.
        """
        print (f'Draw: {self._fpdf}')

        renderer = RendererTemplate(self.figure.dpi, self._fpdf, self._trans)
        self.figure.draw(renderer)

    # You should provide a print_xxx function for every file format
    # you can write.

    # If the file type is not in the base set of filetypes,
    # you should add it to the class-scope filetypes dictionary as follows:
    filetypes = {**FigureCanvasBase.filetypes, 'fpdf': 'My magic FPDF format'}

    def print_fpdf(self, filename, **kwargs):
        self._fpdf = self._trans = origin = scale = None

        # if not isinstance(self.figure, Figure):
        #   if self.figure is None: manager = Gcf.get_active()
        #   else: manager = Gcf.get_fig_manager(figure)
        #   if manager is None: raise ValueError(f"No figure {self.figure}")
        #   figure = manager.canvas.figure

        # Fpdf uses top left origin, matplotlib bottom left so... fix Y axis
        ax = self.figure.gca()
        ax.set_ylim(ax.get_ylim()[::-1])

        # We pass scale, origin and a handle to the fpdf instance through here
        for k,v in kwargs.items():
            match (k):
                case 'fpdf': self._fpdf = v
                case 'origin': origin = v
                case 'scale': scale = v
                case _:
                    print (f'Unrecognised keyword {k} -> {v}')

        # Build our transformation to scale and offset the whole figure
        if origin and scale:
            print ('Transform')
            self._trans = Affine2D().scale(scale).translate(*origin)

        self.draw()

    def get_default_filetype(self):
        return 'fpdf'

########################################################################
#
# Now just provide the standard names that backend.__init__ is expecting
#
########################################################################

FigureCanvas = FigureCanvasTemplate
FigureManager = FigureManagerTemplate
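
Two things the renderer above leaves open are paths that fall through to the "Unmatched" branch (for example several sub-paths in one path) and colours. Matplotlib supplies colours as floats on the unit interval, while fpdf2's colour setters take 0-255 components. The sketch below is one possible pre-processing step and is not part of the attached code. The helper names are hypothetical, and the numeric codes are the values of `Path.MOVETO`, `Path.LINETO`, `Path.CURVE3`, `Path.CURVE4` and `Path.CLOSEPOLY`.

```python
# Values of matplotlib.path.Path.MOVETO, LINETO, CURVE3, CURVE4, CLOSEPOLY
MOVETO, LINETO, CURVE3, CURVE4, CLOSEPOLY = 1, 2, 3, 4, 79


def to_rgb255(rgb):
    """Map unit-interval RGB(A) floats to a tuple of 0-255 ints (alpha dropped)."""
    return tuple(max(0, min(255, round(c * 255))) for c in rgb[:3])


def split_subpaths(codes, vertices):
    """Split parallel code/vertex sequences into sub-paths at each MOVETO."""
    subpaths = []
    for code, vertex in zip(codes, vertices):
        if code == MOVETO or not subpaths:
            subpaths.append(([], []))
        subpaths[-1][0].append(code)
        subpaths[-1][1].append(vertex)
    return subpaths


def classify(codes):
    """Name the shape of one sub-path: polygon, polyline, curve or unknown."""
    codes = list(codes)
    if not codes or codes[0] != MOVETO:
        return "unknown"
    body = codes[1:]
    if body and body[-1] == CLOSEPOLY and all(c == LINETO for c in body[:-1]):
        return "polygon"
    if body and all(c == LINETO for c in body):
        return "polyline"
    if any(c in (CURVE3, CURVE4) for c in body):
        return "curve"
    return "unknown"
```

Each sub-path could then be dispatched to `polygon` or `polyline` as in `draw_path`, with "curve" left for when Bezier support is added.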

Lucas-C commented 1 year ago

Interesting!

You are really performing an in-depth research 😊

LandyQuack commented 1 year ago

Afternoon - one thing which is giving me a little difficulty is the text positioning. It looks like fpdf uses x,y as the origin (bottom left I think) for the text, whereas matplotlib seems to be using x,y as the centre of the text.

    def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
        print (f'RendererTemplate.draw_text - {s} at {x:.0f},{y:.0f} at angle {angle:.1f} with prop {prop} - {mtext}')
        #print (f'RendererTemplate.draw_text - {s} at {x:.0f},{y:.0f} - {mtext}')

        if isinstance(prop, str):
            raise ValueError (f'draw_text.prop is a string ({prop}) - add code to add font')

        # We're expecting a FontProperties instance
        elif isinstance(prop, FontProperties):
            g_fpdf.set_font(prop.get_name(), size=prop.get_size())

        # Transform our data point
        x,y = g_ttrans.transform ((x,y))
        #print (f'[{x:.0f},{y:.0f}] {s}')

        # Get text width to sort positioning - MPL centers on co-ordinate
        tw = g_fpdf.get_string_width(s)

        match angle:
            case 0:
                x -= (tw/2)
                g_fpdf.text(x,y,s)
            case 90 | 90.0:
                print (f'Rotate1 to "{angle}" {type(angle)}') 
                y += (tw/2)
                with g_fpdf.rotation(angle=angle, x=x, y=y):
                    g_fpdf.text(x,y,s)
            case _:
                print (f'Rotate to "{angle}" {type(angle)}') 
                with g_fpdf.rotation(angle=angle, x=x, y=y):
                    g_fpdf.text(x,y,s)

works reasonably but I couldn't see an equivalent to fpdf.get_string_width to give either a height or a bounding box. Am I just missing it or is it something obvious like font size 14 is a standard measurement tall?
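
The 0 and 90 degree cases in the match above can be folded into one formula that covers any angle. This is a sketch resting on the same working assumptions as the code: the incoming x,y is the centre of the text along its baseline, the page Y axis points down, and positive angles turn counter-clockwise. `centred_text_origin` is a hypothetical helper name.

```python
import math


def centred_text_origin(x, y, text_width, angle_deg=0.0):
    """Return where the text should start so it is centred on (x, y).

    Moves back half the text width along the (rotated) baseline.
    With Y pointing down, a counter-clockwise rotation makes the
    start point move down the page, hence the '+' on the Y term.
    """
    a = math.radians(angle_deg)
    half = text_width / 2.0
    return (x - half * math.cos(a), y + half * math.sin(a))


# angle 0  -> x shifts left by half the width
# angle 90 -> y shifts down by half the width
print(centred_text_origin(100.0, 50.0, 20.0, 0))
print(centred_text_origin(100.0, 50.0, 20.0, 90))
```

The returned point would serve both as the pivot for `g_fpdf.rotation(...)` and as the position passed to `g_fpdf.text(...)`, with `text_width` coming from `get_string_width`.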

Lucas-C commented 1 year ago

Hi @LandyQuack!

Are you still playing with this? 😊

works reasonably but I couldn't see an equivalent to fpdf.get_string_width to give either a height or a bounding box. Am I just missing it or is it something obvious like font size 14 is a standard measurement tall?

fpdf2 does not have a get_string_height function, but it's usually the opposite: when users call FPDF.cell() / FPDF.multi_cell() / FPDF.write(), they provide an h= parameter defining the line height.
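
On the "is font size 14 a standard measurement tall" part of the question: a font size in points is the nominal em height, and one point is 1/72 inch, so the size converts directly to page units. A rough sketch (`points_to_unit` is a hypothetical helper, not an fpdf2 function). Actual glyphs are generally shorter than the em height, so this is closer to an upper bound than a bounding box.

```python
def points_to_unit(size_pt, unit="mm"):
    """Convert a font size in points to a length in the given page unit."""
    per_inch = {"pt": 72.0, "mm": 25.4, "cm": 2.54, "in": 1.0}
    if unit not in per_inch:
        raise ValueError(f"unknown unit: {unit!r}")
    # 1 pt = 1/72 inch, then inches -> requested unit
    return size_pt / 72.0 * per_inch[unit]


# A 14 pt font has a nominal height of a little under 5 mm.
print(round(points_to_unit(14, "mm"), 2))
```

With the size that was passed to `set_font`, this gives a number to use for vertical centring in `draw_text` or as a starting point for `h=`.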

LandyQuack commented 1 year ago

Hi Lucas

Yes - still playing but some holiday, a long list of house jobs and scanning old photos got in the way.

Generally what I’ve got works well other than Legends which get all scrunched up. Text positioning looks ok I think now but will try your suggestion.

So.. as things stand - I can render a range of graphs convincingly in a variably sized/positioned box on a page. I haven’t looked at Beziers yet as not had need and Legends don’t work properly.

Haven’t yet looked at putting a graph in a table cell (where I want to take it next) but I don’t imagine that will be hard.

Will add to my task list to tidy up existing code and share as others may spot obvious things I’m missing or find graphs which break what I’ve tried so far.

Iain

Lucas-C commented 1 year ago

Thank you for the update Iain

Take your time, and enjoy the summer / your holidays 😊

I'll be happy to give you some feedback if at some point you want to submit a PR