Open LandyQuack opened 1 year ago
Welcome @LandyQuack!
Firstly - excellent library / thank you for all your hard work.
Thank you!
My question is about whether or not a feature might be considered to implement fpdf.savefig() or similar - perhaps by nabbing images direct from figure -> pdf -> fpdf2?
I have adapted your script using the FigureCanvas approach to embed figures, as described in our documentation:
https://pyfpdf.github.io/fpdf2/Maths.html#using-matplotlib
The results are a lot better, performance-wise:
$ ./issue_789.py
Generate / append figures: 116.06030003167689 ms
#----------------------------------------------------------------------------------------
PdfPages - fig 0: 92.47630002209917 ms
PdfPages - fig 1: 175.60830002184957 ms
PdfPages - fig 2: 35.08909995434806 ms
PdfPages - overall: 303.32190002081916 ms
#----------------------------------------------------------------------------------------
Fpdf - fig 0: 101.3886000146158 ms
Fpdf - fig 1: 199.06949996948242 ms
Fpdf - fig 2: 48.978100006934255 ms
Fpdf - overall: 349.7093000332825 ms
#----------------------------------------------------------------------------------------
To me, there does not seem to be a need for much enhancement.
What do you think?
I may be misreading your link but doesn't that create an image rather than anything vector based?
Ah yes, sorry, I did not realize that you wanted vector graphics and not raster graphics.
I spent a little bit of time this evening trying to see if I could snaffle the relevant bits from matplotlib/PDFPages/savefig and... I think what it's doing is translating the figure into PDF paths and then wrapping that up with the rest of the PDF essentials like fonts and metadata.
I guess what I was wondering is if there might be a way to use some of that existing code to turn a figure into whatever it looks like in a PDF and then put that in the right place in the pdf using fpdf2?
I had a look myself at https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1939
I think we could subclass matplotlib.backends.backend_pdf.RendererPdf in order to render figures directly to a fpdf2.FPDF instance.
I won't have the time to tackle this interesting challenge myself, but this sure looks like a fun exercise, and I would welcome a Pull Request that provides that!
Played around with that and got a little lost in the function calls but have something very simple (attached) which spits out entries like:
b'/DeviceRGB CS' b'/DeviceRGB cs' b'1 j' b'1 g 0 j 0 w 1 G 1 g' b'0 0 m\n460.8 0 l\n460.8 345.6 l\n0 345.6 l\nh\n' b'f' b'/A1 gs 0.9176470588 0.9176470588 0.9490196078 rg 0 G 0.9176470588\n0.9176470588 0.9490196078 rg' b'57.6 38.016 m\n414.72 38.016 l\n414.72 304.128 l\n57.6 304.128 l\nh\n' b'f' b'q 57.6 38.016 357.12 266.112 re W n /A2 gs 1 J 1 j 0.8 w 1 G /DeviceRGB cs' b'89.594157 38.016 m\n89.594157 304.128 l\n' b'S' b'Q q /A2 gs 0.15 g 1 j 1 w 0.15 G 0.15 g' b'q' b'1 0 -0 1 78.469156895 23.85975 cm' b'BT' b'/F1 10 Tf' b'0 0 Td' b'[ (2006) ] TJ' b'ET'
which, looking at https://github.com/gendx/pdf-cheat-sheets/blob/master/pdf-graphics.clean.pdf, seem to be PDF drawing commands and there are recognisable year names and strings like
b'[ (Blood Pressure) ] TJ'
which are clearly from my test image.
The code is trivial - basically two subclasses overriding init and 1 print statement in the output function of PdfFile.
Now... since PDF innards are a black art... does any of this look like it might move things towards a goal of taking a MatPlotLib figure and (quickly) turning it into FPDF2 usable content without the (relatively) slow SVG intermediate parse?
If it does, can anyone point me in the right direction for finding the start and end of the converted figure? If I know those, I can work to finding what's generating everything in between!
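For readers following along, the fragments above read as PDF path operators: `m` (move to), `l` (line to) and `h` (close subpath), followed by a painting operator such as `f` (fill) or `S` (stroke). Below is a minimal sketch, using only the standard library, of turning one such fragment into a list of points. `parse_simple_path` is a hypothetical helper name, and it deliberately handles only `m`, `l` and `h`, not curves or rectangles.

```python
def parse_simple_path(fragment: bytes):
    """Parse a PDF path fragment made only of m / l / h operators.

    Returns (points, closed). Any other token raises ValueError,
    since this is only an illustration.
    """
    points = []
    closed = False
    operands = []
    for token in fragment.split():
        if token in (b"m", b"l"):
            # Both operators consume an x and a y operand
            if len(operands) != 2:
                raise ValueError(f"{token!r} expects 2 operands, got {operands}")
            points.append((operands[0], operands[1]))
            operands = []
        elif token == b"h":
            closed = True
        else:
            try:
                operands.append(float(token.decode("ascii")))
            except ValueError:
                raise ValueError(f"unsupported token {token!r}") from None
    return points, closed


# One of the fragments printed above: the axes background rectangle
fragment = b"57.6 38.016 m\n414.72 38.016 l\n414.72 304.128 l\n57.6 304.128 l\nh\n"
points, closed = parse_simple_path(fragment)
print(points, closed)
```

A list of points plus a "closed" flag is roughly the shape of input that polygon / polyline style drawing calls expect, which is why the raw operators looked like a workable starting point.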
Hi @LandyQuack!
This looks promising!
I'll try to give a closer look at your code whenever I have some free time this week.
A quick analysis of the stuff in matplotlib.backends.backend_pdf:
- PdfPages uses FigureCanvasPdf (in its savefig() method)
- FigureCanvasPdf uses RendererPdf (in its print_pdf() method)
- RendererPdf uses PdfFile
Hence, the crux of the processing lies in those two last classes.
Here is how you can use subclasses of them:
from io import BytesIO

import matplotlib as mpl
from matplotlib.backends.backend_pdf import PdfFile, RendererPdf

class CustomPdfFile(PdfFile):
    pass

class CustomRendererPdf(RendererPdf):
    pass

# Uncompressed streams and core fonts make the generated PDF readable
mpl.rcParams['pdf.compression'] = False
mpl.rcParams['pdf.use14corefonts'] = True
# ... obtain a fig and then:
data = BytesIO()
width, height = fig.get_size_inches()
pdf_file = CustomPdfFile(data)
pdf_file.newPage(width, height)
renderer = CustomRendererPdf(pdf_file, fig.dpi, height, width)
print("PDF file initial content:")
for line in data.getvalue().split(b"\n"):
    print(line)
fig.draw(renderer)
pdf_file.finalize()
with open("issue-789-PdfFile.pdf", "wb") as out_file:
    out_file.write(data.getvalue())
This should help you to figure out when the figure rendering starts!
Lucas - that's been super helpful especially the compression and the font bits. I didn't quite use your code but used something like
class Pdf_Object (PdfFile):
    """
    The theory goes... everything about how the PDF is constructed happens in PdfFile so... if we can decipher it...
    then we can capture what we'll need for FPDF2 e.g. fonts and drawing instructions etc... and if we can do that
    then we should be able to do FPDF2.fonts.append (blah) and FPDF2.add_mpl_figure (PDF_Obj.blah) or whatever the
    function calls might be. Haven't looked yet but presumably the SVG parser must wrap up similar drawing primitives
    so that might be the way to test a proof of concept.
    Subclass https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L660 so we can
    log member function calls within pdf output. There are a couple of functions we can't log with PdfFile.output
    because it triggers a recursion level limit fault. We also skip what look like non output utility functions.
    """
    def __init__ (self, filename, metadata=None):
        # Pass the caller's metadata through instead of discarding it
        super().__init__(filename, metadata=metadata)

    def newPage(self, width, height):
        """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L769 """
        self.output ('PdfFile.newPage')
        super().newPage(width, height)

    def newTextnote(self, text, positionRect=[-100, -100, 0, 0]):
        """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L798 """
        self.output ('PdfFile.newTextnote')
        super().newTextnote(text, positionRect)
and that's giving me output like
python3 mpl1.py
b'%PDF-1.4'
b'%\xac\xdc \xab\xba'
┌─────────────────────┐
│ PdfFile.writeObject │
└─────────────────────┘
b'1 0 obj'
b'<< /Type /Catalog /Pages 2 0 R >>'
b'endobj'
┌─────────────────────┐
│ PdfFile.writeObject │
└─────────────────────┘
b'8 0 obj'
b'<< /Font 3 0 R /XObject 7 0 R /ExtGState 4 0 R /Pattern 5 0 R'
b'/Shading 6 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >>'
b'endobj'
┌─────────────────┐
│ PdfFile.newPage │
└─────────────────┘
┌───────────────────┐
│ PdfFile.endStream │
└───────────────────┘
┌─────────────────────┐
│ PdfFile.writeObject │
└─────────────────────┘
b'11 0 obj'
b'<< /Type /Page /Parent 2 0 R /Resources 8 0 R'
b'/MediaBox [ 0 0 460.8 345.6 ] /Contents 9 0 R /Annots 10 0 R >>'
b'endobj'
┌─────────────────────┐
│ PdfFile.beginStream │
└─────────────────────┘
b'9 0 obj'
b'<< /Length 12 0 R >>'
b'stream'
b'/DeviceRGB CS'
b'/DeviceRGB cs'
b'1 j'
b'1 g 0 j 0 w 1 G 1 g'
┌───────────────────┐
│ PdfFile.writePath │
└───────────────────┘
b'0 0 m'
b'460.8 0 l'
b'460.8 345.6 l'
b'0 345.6 l'
b'h'
b''
b'f'
b'/A1 gs 0.9176470588 0.9176470588 0.9490196078 rg 0 G 0.9176470588'
b'0.9176470588 0.9490196078 rg'
┌───────────────────┐
│ PdfFile.writePath │
└───────────────────┘
b'57.6 38.016 m'
b'414.72 38.016 l'
b'414.72 304.128 l'
b'57.6 304.128 l'
b'h'
b''
b'f'
b'q 57.6 38.016 357.12 266.112 re W n /A2 gs 1 J 1 j 0.8 w 1 G /DeviceRGB cs'
and I can start to see where the figure is represented in the PDF.
I think I need to look at the SVG code next because I presume that the vector lines etc in the SVG become pdf drawing commands in the same way so... if I can see what that code does to say "insert these drawing commands here and magically FPDF2 shall find and incorporate them" (paths?) then I should be a bit further to something that says
pdf = FPDF()
pdf.add_mpl_figure (fig, w,h)
so it behaves like an svg or a png or whatever and can be put in table cells etc.
I'm thinking that FPDF will need / want some sort of PDF object (basically PdfFile without the file generation) that can be queried to say - give me your images and your font usage and your paths, a bit like
for paths in pdf_obj.paths(): add in some clever fashion.
Current code attached.
Iain mpl1.txt
Got this working to proof of concept level at least. After playing around with trying to reconstruct the pdf from the innards of the renderer (and at least getting something on screen), decided that the matplotlib pdf backend is perfectly capable of generating pdf content so...
subclassed PdfFile, captured output to a BytesIO and nabbed everything between stream and endstream and put it into FPDF using _out().
Fonts were a bit harder as the reference in the stream has to match what FPDF is adding so replaced fontname.
It works in as far as I get my test MatPlotLib figure in a FPDF page at standard zoom and have embedded a vector graphic.
Needs work on scaling and positioning (to use in something like a cell) and Truetype fonts but, as a proof of concept, I'm happy with it so far.
Iain
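The two steps described above (grabbing the bytes between `stream` and `endstream`, then swapping the font resource name) can be sketched with the standard library alone. This is an illustration rather than the attached script: `extract_first_stream` and `rename_font` are hypothetical helpers, the sample bytes are hand-written rather than real matplotlib output, and the target name `F3` is made up. It assumes an uncompressed stream (hence `pdf.compression` switched off) and that the first stream in the file is the page content.

```python
import re


def extract_first_stream(pdf_bytes: bytes) -> bytes:
    """Return the bytes between the first 'stream' and the following 'endstream'."""
    start = pdf_bytes.find(b"stream")
    end = pdf_bytes.find(b"endstream", start)
    if start == -1 or end == -1:
        raise ValueError("no stream found")
    body = pdf_bytes[start + len(b"stream"):end]
    # Drop the end-of-line after 'stream' and the one before 'endstream'
    return body.strip(b"\r\n")


def rename_font(stream: bytes, old: bytes, new: bytes) -> bytes:
    """Rewrite '/old <size> Tf' operators so they use the resource name 'new'."""
    pattern = re.compile(rb"/" + re.escape(old) + rb"(\s+[\d.]+\s+Tf)")
    return pattern.sub(lambda m: b"/" + new + m.group(1), stream)


# Hand-written stand-in for a page object (assumption, not real output)
sample = (
    b"9 0 obj\n<< /Length 12 0 R >>\nstream\n"
    b"BT\n/F1 10 Tf\n0 0 Td\n[ (2006) ] TJ\nET\n"
    b"endstream\nendobj\n"
)
content = extract_first_stream(sample)
print(rename_font(content, b"F1", b"F3").decode("ascii"))
```

Requiring whitespace after the font name keeps `/F1` from matching inside `/F10`. A real document can hold several streams (images, fonts), so a sturdier version would locate the page's content object rather than trusting the first match.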
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
#from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
from matplotlib.patches import Circle
from matplotlib.patheffects import withStroke
from matplotlib.ticker import AutoMinorLocator, MultipleLocator
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
import seaborn as sns
from io import BytesIO
from fpdf import FPDF, drawing
import logging
# For PDF export
from matplotlib.backends.backend_pdf import PdfPages, PdfFile, pdfRepr, _fill,FigureCanvasPdf, RendererPdf, Op
from matplotlib import cbook, _path
from matplotlib._pylab_helpers import Gcf
from matplotlib.backends.backend_mixed import MixedModeRenderer
from matplotlib.font_manager import fontManager as _fontManager, FontProperties
from pathlib import Path
#----------------------------------------------------------------------------------------
def W (txt):
    """ Wrap a string in a box using Unicode box-drawing characters - easier to see """
    s = '\u2500' * (len (txt) + 2)
    print (f"\u250c{s}\u2510\n\u2502 {txt} \u2502\n\u2514{s}\u2518")
#----------------------------------------------------------------------------------------
#----------------------------------------------------------------------------------------
def Draw_BP_Graph ():
    """ Draw simple floating bar graph of Blood Pressure """
    BP_data = [
        ('21/3/2005',142, 86),('13/2/2010', 131, 87),('2/6/2011', 141, 83),('27/2/2013', 180, 93),
        ('1/5/2017', 137, 65),('12/11/2018',151,68),('14/5/2022',155, 86)
    ]
    # Create the dataframe
    BP = pd.DataFrame (BP_data, columns=['When', 'Systolic', 'Diastolic'])
    # Convert dates in the When column - lose the time component
    BP['When'] = pd.to_datetime(BP['When'], dayfirst=True).dt.date
    # For a floating bar graph we need a height (systolic - diastolic) as the bar starts at diastolic and has a height
    BP['Height'] = BP['Systolic'] - BP['Diastolic']
    # Graph Blood Pressure - label things
    plt.title('Blood Pressure', fontsize=10)
    # plt.xlabel('Year', fontsize=14)
    # plt.ylabel('mm Hg', fontsize=14)
    # Plot bars from diastolic up to systolic in blue
    plt.bar (BP['When'], BP['Height'], bottom=BP['Diastolic'], width=40, color='blue')
    plt.grid(True)
    # Add lines at 140 & 90 in red - styles as per https://matplotlib.org/3.5.0/api/_as_gen/matplotlib.pyplot.axhline.html (: is subtle)
    ax = plt.gca()
    for y in (140,90): ax.axhline(y, color='red', linestyle=':')
    # Extend the y-axis by 15 at both ends (looks prettier)
    bottom, top = plt.ylim() # return the current ylim
    plt.ylim((bottom-15, top+15)) # set the ylim to bottom, top
    # Return the figure
    return plt.gcf()
#----------------------------------------------------------------------------------------
class Custom_FPDF(FPDF):
    def MPL_Figure (self, fig):
        """ Try and save a MatPlotLib figure to a FPDF instance """
        fig.dpi = 72 # there are 72 pdf points to an inch
        width, height = fig.get_size_inches()
        # pdf_file is our in memory PDF generated'ish by MatPlotLib
        data = BytesIO()
        pdf_file = Pdf_Object(data,parent=self)
        # Have to figure out how to alter both position and size
        pdf_file.newPage(width,height)
        renderer = RendererPdf(pdf_file, fig.dpi, height, width)
        #renderer = MixedModeRenderer(fig, width, height, fig.dpi,renderer,bbox_inches_restore=bbox_inches_restore)
        renderer = MixedModeRenderer(fig, width, height, fig.dpi, renderer)
        fig.draw(renderer)
        renderer.finalize()
        pdf_file.finalize()
        # And the same for the XRef table - we may want to grab things from here
        #for i,x in enumerate(pdf_file.XRef()): print (f'Xref[{i}]: {x}')
        # Get the in memory PDF
        dv = data.getvalue()
        # Debug
        #for line in dv.split(b"\n"): print (line)
        # Look for output between b'stream' and b'endstream'
        idx1 = dv.find(b'stream')
        idx2 = dv.find(b'endstream')
        # and write that wholesale and unmodified into a FPDF page
        # (+7 skips the 'stream' keyword and the newline that follows it)
        self._out (dv[idx1+7:idx2])
#----------------------------------------------------------------------------------------
# class RendererPdf2(RendererPdf):
# _afm_font_dir = cbook._get_data_path("fonts/pdfcorefonts")
# _use_afm_rc_name = "pdf.use14corefonts"
# def __init__(self, file, image_dpi, height, width):
# super().__init__(file, image_dpi, height, width)
# self.file = file
# self.gc = self.new_gc()
# self.image_dpi = image_dpi
# def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
# print (f'draw_text: {s} @ {x},{y} - {prop}')
# super().draw_text(gc, x, y, s, prop, angle, ismath, mtext)
#----------------------------------------------------------------------------------------
class Pdf_Object (PdfFile):
    """
    For now, we generate a PDF in memory and re-use anything between stream and endstream labels
    and can see a MatPlotLib figure rendered in an FPDF page. We need to sort font references
    next and if that works we can remove PDF building blocks we will never use.
    """
    def __init__ (self, filename, metadata=None, parent=None ):
        # Pass the caller's metadata through instead of discarding it
        super().__init__(filename, metadata=metadata)
        self.parent = parent

    def XRef(self):
        return self.xrefTable

    def fontName(self, fontprop):
        """
        Font names used in the rendered MatPlotLib Figure are references to a font table (key in a dictionary)
        e.g. sans\-serif:style=normal:variant=normal:weight=normal:stretch=normal:size=10.0 is "/F1"
        ----
        The generated figure->pdf has to reference the font name used internal to FPDF rather than the one
        from the MatPlotLib pdf rendering backend
        """
        print (f'FontProp: {fontprop}')
        # TTF? Needs work
        if isinstance(fontprop, str):
            self.parent.add_font(fname=fontprop)
            for k,v in self.parent.fonts.items():
                if str(v['ttffile']) == fontprop:
                    self.parent.set_font ('arial', size=10.0)
                    return (v['fontkey'])
        # Built in
        elif isinstance(fontprop, FontProperties):
            self.parent.set_font(fontprop.get_name(), size=fontprop.get_size())
            return self.parent.current_font['i']
# def newPage(self, width, height):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L769 """
# self.output ('PdfFile.newPage')
# super().newPage(width, height)
# def newTextnote(self, text, positionRect=[-100, -100, 0, 0]):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L798 """
# self.output ('PdfFile.newTextnote')
# super().newTextnote(text, positionRect)
# def finalize(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L823 """
# self.output ('PdfFile.finalize')
# super().finalize ()
# def close(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L856 """
# self.output ('PdfFile.close')
# super().close()
# def beginStream(self, id, len, extra=None, png=None):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L877 """
# self.output ('PdfFile.beginStream')
# super().beginStream (id, len, extra=None, png=None)
# def endStream(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L881 """
# self.output ('PdfFile.endStream')
# super().endStream()
# def fontName(self, fontprop):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L895 """
# self.output ('PdfFile.fontName')
# super().fontName (fontprop)
# def dviFontName(self, dvifont):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L926 """
# self.output ('PdfFile.dviFontName')
# super().dviFontName (dvifont)
# def writeFonts(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L956 """
# self.output ('PdfFile.writeFonts')
# super().writeFonts ()
# def _write_afm_font(self, filename):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L977 """
# self.output ('PdfFile._write_afm_font')
# super()._write_afm_font (filename)
# def _embedTeXFont(self, fontinfo):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L989 """
# self.output ('PdfFile._embedTeXFont')
# super()._embedTeXFont (fontinfo)
# def createType1Descriptor(self, t1font, fontfile):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1047 """
# self.output ('PdfFile.createType1Descriptor')
# super().createType1Descriptor (fontinfo)
# def embedTTF(self, filename, characters):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1138 """
# self.output ('PdfFile.embedTTF')
# super().embedTTF (filename, characters)
#
# def writeExtGSTates(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1520 """
# self.output ('PdfFile.writeExtGSTates')
# super().writeExtGSTates ()
# def _write_soft_mask_groups(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1529 """
# self.output ('PdfFile._write_soft_mask_groups')
# super()._write_soft_mask_groups ()
# def writeHatches(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1553 """
# self.output ('PdfFile.writeHatches')
# super().writeHatches ()
# def writeGouraudTriangles(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1614 """
# self.output ('PdfFile.writeGouraudTriangles')
# super().writeGouraudTriangles ()
#
# def _writePng(self, img):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1693 """
# self.output ('PdfFile._writePng')
# super()._writePng (img)
# def _writeImg(self, data, id, smask=None):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1722 """
# self.output ('PdfFile._writeImg')
# super()._writePng (data, id, smask)
# def writeImages(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1781 """
# self.output ('PdfFile.writeImages')
# super().writeImages ()
# def writeMarkers(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1820 """
# self.output ('PdfFile.writeMarkers')
# super().writeMarkers ()
# def writePathCollectionTemplates(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1850 """
# self.output ('PdfFile.writePathCollectionTemplates')
# super().writePathCollectionTemplates ()
# def writePath(self, path, transform, clip=False, sketch=None):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1880 """
# self.output ('PdfFile.writePath')
# if clip:
# #print ('Clip')
# clip = (0.0, 0.0, self.width * 72, self.height * 72)
# simplify = path.should_simplify
# else:
# #print ('No Clip')
# clip = None
# simplify = False
#
# cmds = self.pathOperations(path, transform, clip, simplify=simplify, sketch=sketch)
# self.output(*cmds)
#
# # Return the pdf draw command
# return (cmds)
# super().writePath (path, transform, clip, sketch)
# def writeObject(self, object, contents):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1905 """
# self.output ('PdfFile.writeObject')
# super().writeObject (object, contents)
# def writeXref(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1909 """
# self.output ('PdfFile.writeXref')
# super().writeXref ()
#
# def writeInfoDict(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1922 """
# self.output ('PdfFile.writeInfoDict')
# super().writeInfoDict ()
# def writeTrailer(self):
# """ https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#L1928 """
# self.output ('PdfFile.writeTrailer')
# super().writeTrailer ()
# def savefig(self, figure=None, **kwargs):
# """ Based on https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/backends/backend_pdf.py#LL2724C1-L2745C57 """
# if not isinstance(figure, Figure):
# if figure is None: manager = Gcf.get_active()
# else: manager = Gcf.get_fig_manager(figure)
#
# if manager is None: raise ValueError(f"No figure {figure}")
#
# figure = manager.canvas.figure
#
# # Force use of pdf backend, as PdfPages is tightly coupled with it.
# with cbook._setattr_cm(figure, canvas=FigureCanvasPdf2(figure)): figure.savefig(self, format="pdf", **kwargs)
#
# def finalize(self, pdf):
# self.output ('PdfFile.finalize')
# super().finalize()
#----------------------------------------------------------------------------------------
def main():
    # Set Seaborn plot style
    sns.set_style("dark")
    # Hide a bunch of missing font messages (xkcd graph)
    logging.getLogger('matplotlib.font_manager').setLevel(logging.ERROR)
    # Switch off compression and simplify fonts
    mpl.rcParams['pdf.compression'] = False
    mpl.rcParams['pdf.use14corefonts'] = True
    # Simple 1 page PDF
    pdf = Custom_FPDF()
    pdf.add_page()
    pdf.set_draw_color (0,0,0)
    #pdf.set_line_width(20)
    # Crudely hacked out of MatPlotLib multipage PDF
    fig = Draw_BP_Graph()
    # Output the figure using MatPlotLib
    with PdfPages('MatPlotLib_Output.pdf') as mpdf: mpdf.savefig ()
    # Prototype FPDF extension
    pdf.MPL_Figure (fig)
    # Output what we've got into FPDF2 so far
    pdf.output ('FPDF_Output.pdf')
#----------------------------------------------------------------------------------------
# Main runtime entry point
if __name__ == "__main__": main()
Hi @LandyQuack!
Sorry for the delay, I have been a bit busy over the last 2 weeks.
Currently, when trying to run your latest script, I get this error:
File "./issue_789c.py", line 163, in fontName
self.parent.set_font(fontprop.get_name(), size=fontprop.get_size())
...
fpdf.errors.FPDFException: Undefined font: dejavu sans - Use built-in fonts or FPDF.add_font() beforehand
But I was able to solve this error by simply adding pdf.add_font("dejavu sans", fname="test/fonts/DejaVuSans.ttf") in main()
The resulting PDF is promising, but I see zero visible text. There might still be something wrong regarding font management.
Apart from that, I looked at the Custom_FPDF.MPL_Figure() method & Pdf_Object class you wrote.
Dumping the whole content stream to FPDF._out() is very "raw"...
Providing another implementation of the matplotlib.backends.backend_pdf.GraphicsContextPdf.commands could be a cleaner approach... There are only 9 commands there, that could all be implemented with calls to FPDF methods.
Have you considered this option?
Also, what is your end goal?
Would you like to contribute code to fpdf2?
If so, I will be relatively strict on the code quality if you want to add public methods to the fpdf package, but this can be a very good learning exercise.
On the other hand, an autonomous script could be provided as part of our docs/ (maybe in https://pyfpdf.github.io/fpdf2/Maths.html?), and I would be less strict on the code quality then, as long as it's relatively short.
And finally, you can of course choose not to share your code in fpdf2, which is totally fine. In that case I'm still available to answer your questions, and just hope the solution you found solved your initial need!
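To make the suggestion above a little more concrete: whichever backend hook is used, the graphics-context state has to be converted into what the FPDF setters take. matplotlib keeps colours as floats on the unit interval and line widths in points, while `FPDF.set_draw_color()` takes 0-255 components and `FPDF.set_line_width()` takes the document unit (millimetres by default). A minimal sketch of those two conversions follows. The helper names are hypothetical, and the `RecordingPdf` stand-in exists only so the snippet runs without fpdf2 installed. It covers just two pieces of state and does not claim to match the nine entries mentioned above.

```python
MM_PER_POINT = 25.4 / 72  # 72 points per inch, 25.4 mm per inch


def unit_rgb_to_255(rgb):
    """Map an (r, g, b[, a]) tuple of floats in [0, 1] to three ints in [0, 255]."""
    return tuple(round(max(0.0, min(1.0, c)) * 255) for c in rgb[:3])


def points_to_mm(points):
    """Convert a length in PDF points to millimetres."""
    return points * MM_PER_POINT


class RecordingPdf:
    """Stand-in that records calls; a real FPDF instance has same-named setters."""

    def __init__(self):
        self.calls = []

    def set_draw_color(self, r, g, b):
        self.calls.append(("set_draw_color", r, g, b))

    def set_line_width(self, width):
        self.calls.append(("set_line_width", width))


def apply_stroke_state(pdf, rgb, linewidth_pt):
    """Push a stroke colour and a line width (hypothetical helper)."""
    pdf.set_draw_color(*unit_rgb_to_255(rgb))
    pdf.set_line_width(points_to_mm(linewidth_pt))


pdf = RecordingPdf()
apply_stroke_state(pdf, (0.15, 0.15, 0.15, 1.0), 0.8)
print(pdf.calls)
```

Keeping the conversions as plain functions means they can be tested on their own, whether they end up being called from a renderer subclass or from overridden graphics-context setters.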
Hi Lucas - no worries at all at the delay.
Agree re "raw"ness of that approach - was more to get a handle on what was happening where in the code. Have done much as you suggest but subclassed PdfFile because it seemed easier to start with something which worked and then add diagnostics as and where I needed.
So... where is the code up to?
# Our FPDF version
fpdf = MPL_FPDF()
print ('FPDF')
fpdf.add_font(fname='/Library/Fonts/Microsoft/Times New Roman.ttf')
fpdf.add_font(fname='/Library/Fonts/Microsoft/Arial.ttf')
fpdf.add_font(family='dejavu sans mono', fname='/Users/iain/Library/Fonts/DejaVuSansMono.ttf')
fpdf.add_font(fname='/System/Library/Fonts/Supplemental/Courier New.ttf')
#fpdf.set_font("Arial", size=10)
for fig in figs:
    f = fig()
    fpdf.add_page()
    fpdf.savefig (figure=f, bbox_inches='tight')
    plt.close(f)
# Output what we've got into FPDF2 so far
fpdf.output ('Output_FPDF.pdf')
ends up in
class MPL_FPDF(FPDF):
    #----------------------------------------------------------------------------------------
    def savefig(self, figure=None, **kwargs):
        if not isinstance(figure, Figure):
            if figure is None: manager = Gcf.get_active()
            else: manager = Gcf.get_fig_manager(figure)
            if manager is None: raise ValueError(f"No figure {figure}")
            figure = manager.canvas.figure
        # Fpdf uses top left origin, matplotlib bottom left so... fix Y axis
        ax = figure.gca()
        ax.set_ylim(ax.get_ylim()[::-1])
        ax.xaxis.tick_top()
        # Fix title position
        mpl.rcParams['axes.titley'] = -0.1
        # Force use of pdf backend, as PdfPages is tightly coupled with it.
        with cbook._setattr_cm(figure, canvas=FigureCanvasPdf2(figure, parent=self)):
            figure.savefig(self, format="pdf", **kwargs)
and taking draw_text as an example
def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
    #print (f'draw_text: {s} @ {x},{y} - {prop} @ {angle} degrees')
    if isinstance(prop, str):
        self.parent.add_font(fname=prop)
        for k,v in self.parent.fonts.items():
            if str(v['ttffile']) == prop:
                print (f'Font: {prop}')
                self.parent.set_font('Arial', size=10.0)
    # Built in
    elif isinstance(prop, FontProperties):
        self.parent.set_font(prop.get_name(), size=prop.get_size())
    # Map figure coordinates to page coordinates before drawing
    x,y = self._trans.transform ((x,y))
    self.parent.text(x,y,s)
with self.parent.text being fpdf.text
So... I can draw a number of basic / standard matplotlib figures directly into fpdf :-) Fonts work but need to sort rotated text yet.
I need to (a) make it not subclass the existing pdf renderer from MatPlotLib because I don't think it needs to, (b) figure out how to fit the resulting output into an FPDF container (say a table cell) - more below, (c) figure out why the anatomy path with the markers on doesn't draw in MPL but does in my code and (d) do proper circles (think I just need to tell the renderer that we speak bezier).
This is a screenshot of what my output (non "raw" drawing direct into fpdf using the existing drawing commands) looks like. I'm pleased with progress so far.
and for simpler plots it works out of the box and looks like MPL.
As above, need to figure out how to get what I'm generating into the right place / size on the screen. I'm currently doing this:
self._scale = scale # scale = self._parent.epw / (width*self.figure.dpi)
self._origin = (2,2)
# Setup our transform
self._trans = Affine2D().scale(self._scale).translate(*self._origin)
so can size and position where needed but need to see what fpdf actually needs me to do.
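The sizing arithmetic above can be checked without matplotlib. A minimal sketch, assuming the figure is measured in points with a bottom-left origin and the page has a top-left origin. `fit_scale` and `figure_to_page` are hypothetical helpers, the available width of 190 is an assumed value (an A4 page with 10 mm margins), and the flip is done here on coordinates rather than by reversing the axis limits as in `savefig()` above.

```python
def fit_scale(available_width, fig_width_in, dpi):
    """Scale factor that makes the figure span the available page width."""
    return available_width / (fig_width_in * dpi)


def figure_to_page(x, y, fig_height, scale, origin):
    """Flip the y axis, scale, then translate to the page origin."""
    ox, oy = origin
    return (ox + scale * x, oy + scale * (fig_height - y))


# Figure size taken from the MediaBox printed earlier: 460.8 x 345.6 points,
# i.e. 6.4 x 4.8 inches at 72 dpi
scale = fit_scale(available_width=190.0, fig_width_in=6.4, dpi=72)

# Top-left and bottom-right corners of the figure, in page coordinates
print(figure_to_page(0.0, 345.6, 345.6, scale, origin=(2, 2)))
print(figure_to_page(460.8, 0.0, 345.6, scale, origin=(2, 2)))
```

Placing a figure inside a container such as a table cell would then come down to choosing `available_width` and `origin` from the cell's geometry.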
End goal... hmm, I'm a medic rather than a coder so for what I need/want it's tediously simple vector graphs in amongst text in a PDF (kinda what, being lazy, I'd have done with Word). Raster graphics would probably have been fine but the purist in me much prefers nice crisp vectors. I just thought I'd see what I could do in code because I enjoy it. Learned about affine transformations along the way.
If I can make something others can get use out of - even better. I get from the community so if I can give back, seems fair.
There will be 20 more optimal ways of doing some of what I've done so think I'll offer the final working version for someone who knows what they're doing to look at / use in whatever way they see fit :-) This is hobby stuff for me and the rest of life keeps me busy enough to not want to maintain code / debug esoteric corner cases.
Current code base attached.
Iain
Interesting discovery last night - mpl.use
pdf = FPDF()
pdf.set_font('Times')
# Use our custom renderer
mpl.use("module://fpdf_renderer")
pdf.add_font(family='dejavu sans mono', fname='/Users/iain/Library/Fonts/DejaVuSansMono.ttf')
for fig in figs:
    f = fig()
    origin = (20,100)
    scale = 0.3
    pdf.add_page()
    f.savefig (fname=None, fpdf=pdf, origin=origin, scale=scale, bbox_inches='tight')
    plt.close(f)
# Output what we've got into FPDF
pdf.output ('Output_FPDF.pdf')
where fpdf_renderer.py looks like the code below. Needs quite a bit of work yet but got text and a grid in a pdf.
"""
Based on https://github.com/matplotlib/matplotlib/blob/v3.7.1/lib/matplotlib/backends/backend_template.py
Just need to tell MatPlotLib to use this renderer and then do fig.savefig.
"""
from matplotlib import _api
from matplotlib._pylab_helpers import Gcf
from matplotlib.backend_bases import (FigureCanvasBase, FigureManagerBase, GraphicsContextBase, RendererBase)
from matplotlib.figure import Figure
from matplotlib.transforms import Affine2D
import matplotlib as mpl
class RendererTemplate(RendererBase):
    """ Removed draw_markers, draw_path_collection and draw_quad_mesh - all optional, we can add later """
    def __init__(self, dpi, fpdf, transform):
        super().__init__()
        self.dpi = dpi
        print (f'FPDF: {fpdf}')
        self._fpdf = fpdf
        self._trans = transform
        # some safe defaults
        if fpdf:
            fpdf.set_draw_color(0,0,0)
            fpdf.set_fill_color(255,0,0)
    #
    def draw_path(self, gc, path, transform, rgbFace=None):
        #self.check_gc(gc, rgbFace)
        gc.paint()
        # Unzip the path segments into 2 arrays - commands and vertices, the transform sorts scaling and positioning
        tran = transform + self._trans
        c,v = zip(*[(c,v.tolist()) for v,c in path.iter_segments(transform=tran)])
        p = self._fpdf
        with p.local_context():
            if rgbFace: p.set_draw_color (rgbFace[:3])
            #p.set_line_width (gc._linewidth*self._scale)
            match c:
                # Polygon - starts with moveto, end with closepoly - DF means draw and fill
                case [path.MOVETO, *_, path.CLOSEPOLY]:
                    p.polygon(v[:-1],style="DF")
                # Simple line
                case [path.MOVETO, path.LINETO]:
                    p.polyline(v)
                # Polyline - move then a set of lines
                case [path.MOVETO, *mid, path.LINETO] if all(e == path.LINETO for e in mid):
                    p.polyline (v)
                case _:
                    print (f'draw_path: Unmatched {c}')

    def draw_image(self, gc, x, y, im):
        pass

    def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
        print (f'[{x},{y}] {s}')
        x,y = self._trans.transform ((x,y))
        self._fpdf.text(x,y,s)

    def flipy(self):
        return True

    def get_canvas_width_height(self):
        return 100, 100

    def get_text_width_height_descent(self, s, prop, ismath):
        return 1, 1, 1

    def new_gc(self):
        return GraphicsContextTemplate()

    def points_to_pixels(self, points):
        # if backend doesn't have dpi, e.g., postscript or svg
        return points
        # elif backend assumes a value for pixels_per_inch
        # return points/72.0 * self.dpi.get() * pixels_per_inch/72.0
        # else
        # return points/72.0 * self.dpi.get()
class GraphicsContextTemplate(GraphicsContextBase):
    """
    The graphics context provides the color, line styles, etc. See the cairo
    and postscript backends for examples of mapping the graphics context
    attributes (cap styles, join styles, line widths, colors) to a particular
    backend. In cairo this is done by wrapping a cairo.Context object and
    forwarding the appropriate calls to it using a dictionary mapping styles
    to gdk constants. In Postscript, all the work is done by the renderer,
    mapping line styles to postscript calls.

    If it's more appropriate to do the mapping at the renderer level (as in
    the postscript backend), you don't need to override any of the GC methods.
    If it's more appropriate to wrap an instance (as in the cairo backend) and
    do the mapping here, you'll need to override several of the setter
    methods.

    The base GraphicsContext stores colors as an RGB tuple on the unit
    interval, e.g., (0.5, 0.0, 1.0). You may need to map this to colors
    appropriate for your backend.
    """
########################################################################
#
# The following functions and classes are for pyplot and implement
# window/figure managers, etc.
#
########################################################################
class FigureManagerTemplate(FigureManagerBase):
    """
    Helper class for pyplot mode, wraps everything up into a neat bundle.

    For non-interactive backends, the base class is sufficient. For
    interactive backends, see the documentation of the `.FigureManagerBase`
    class for the list of methods that can/should be overridden.
    """
class FigureCanvasTemplate(FigureCanvasBase):
    """
    The canvas the figure renders into. Calls the draw and print fig
    methods, creates the renderers, etc.

    Note: GUI templates will want to connect events for button presses,
    mouse movements and key presses to functions that call the base
    class methods button_press_event, button_release_event,
    motion_notify_event, key_press_event, and key_release_event. See the
    implementations of the interactive backends for examples.

    Attributes
    ----------
    figure : `matplotlib.figure.Figure`
        A high-level Figure instance
    """

    # The instantiated manager class. For further customization,
    # ``FigureManager.create_with_canvas`` can also be overridden; see the
    # wx-based backends for an example.
    manager_class = FigureManagerTemplate

    def draw(self):
        """
        Draw the figure using the renderer.

        It is important that this method actually walk the artist tree
        even if no output is produced because this will trigger
        deferred work (like computing auto-limits and tick
        values) that users may want access to before saving to disk.
        """
        print(f'Draw: {self._fpdf}')
        renderer = RendererTemplate(self.figure.dpi, self._fpdf, self._trans)
        self.figure.draw(renderer)

    # You should provide a print_xxx function for every file format
    # you can write.
    # If the file type is not in the base set of filetypes,
    # you should add it to the class-scope filetypes dictionary as follows:
    filetypes = {**FigureCanvasBase.filetypes, 'fpdf': 'My magic FPDF format'}

    def print_fpdf(self, filename, **kwargs):
        self._fpdf = self._trans = origin = scale = None
        # if not isinstance(self.figure, Figure):
        #     if self.figure is None: manager = Gcf.get_active()
        #     else: manager = Gcf.get_fig_manager(figure)
        #     if manager is None: raise ValueError(f"No figure {self.figure}")
        #     figure = manager.canvas.figure
        # Fpdf uses top left origin, matplotlib bottom left so... fix Y axis
        ax = self.figure.gca()
        ax.set_ylim(ax.get_ylim()[::-1])
        # We pass scale, origin and a handle to the fpdf instance through here
        for k, v in kwargs.items():
            match (k):
                case 'fpdf': self._fpdf = v
                case 'origin': origin = v
                case 'scale': scale = v
                case _:
                    print(f'Unrecognised keyword {k} -> {v}')
        # Build our transformation to scale and offset the whole figure
        if origin and scale:
            print('Transform')
            self._trans = Affine2D().scale(scale).translate(*origin)
        self.draw()

    def get_default_filetype(self):
        return 'fpdf'
########################################################################
#
# Now just provide the standard names that backend.__init__ is expecting
#
########################################################################
FigureCanvas = FigureCanvasTemplate
FigureManager = FigureManagerTemplate
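The scale-then-translate mapping that print_fpdf builds with Affine2D can be illustrated without matplotlib. What follows is only a sketch under assumptions: the helper name figure_to_page, the fig_height parameter and the example numbers are hypothetical and are not part of the backend above. It also shows one possible alternative to reversing the axis limits: flipping y inside the mapping itself, since fpdf2 measures y downward from the top left while matplotlib measures it upward from the bottom left.

```python
def figure_to_page(point, scale, origin, fig_height):
    """Map a figure point (origin bottom-left, y up) to page
    coordinates (origin top-left, y down).

    Hypothetical helper: it mirrors the idea of
    Affine2D().scale(scale).translate(*origin) plus a y flip.
    """
    x, y = point
    ox, oy = origin
    # Flip y first so that the top edge of the figure becomes y = 0 ...
    y_flipped = fig_height - y
    # ... then scale, and finally offset by the page origin.
    return (ox + x * scale, oy + y_flipped * scale)


# A 400 x 300 figure drawn at quarter size, top-left corner placed at (10, 20)
top_left = figure_to_page((0, 300), scale=0.25, origin=(10, 20), fig_height=300)
bottom_right = figure_to_page((400, 0), scale=0.25, origin=(10, 20), fig_height=300)
print(top_left, bottom_right)
```

The order matters: scaling before translating keeps the origin offset in page units, which matches the order used in print_fpdf.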
Interesting!
You are really performing in-depth research!
Afternoon - one thing which is giving me a little difficulty is the text positioning. It looks like fpdf uses x,y as the origin of the text (bottom left, I think), whereas matplotlib seems to use x,y as the centre of the text.
def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
    print(f'RendererTemplate.draw_text - {s} at {x:.0f},{y:.0f} at angle {angle:.1f} with prop {prop} - {mtext}')
    #print(f'RendererTemplate.draw_text - {s} at {x:.0f},{y:.0f} - {mtext}')
    if isinstance(prop, str):
        raise ValueError(f'draw_text.prop is a string ({prop}) - add code to add font')
    # We're expecting a FontProperties instance
    elif isinstance(prop, FontProperties):
        g_fpdf.set_font(prop.get_name(), size=prop.get_size())
    # Transform our data point
    x, y = g_ttrans.transform((x, y))
    #print(f'[{x:.0f},{y:.0f}] {s}')
    # Get text width to sort positioning - MPL centers on co-ordinate
    tw = g_fpdf.get_string_width(s)
    match angle:
        case 0:
            x -= (tw/2)
            g_fpdf.text(x, y, s)
        case 90 | 90.0:
            print(f'Rotate1 to "{angle}" {type(angle)}')
            y += (tw/2)
            with g_fpdf.rotation(angle=angle, x=x, y=y):
                g_fpdf.text(x, y, s)
        case _:
            print(f'Rotate to "{angle}" {type(angle)}')
            with g_fpdf.rotation(angle=angle, x=x, y=y):
                g_fpdf.text(x, y, s)
works reasonably but I couldn't see an equivalent to fpdf.get_string_width to give either a height or a bounding box. Am I just missing it or is it something obvious like font size 14 is a standard measurement tall?
Hi @LandyQuack!
Are you still playing with this?
works reasonably but I couldn't see an equivalent to fpdf.get_string_width to give either a height or a bounding box. Am I just missing it or is it something obvious like font size 14 is a standard measurement tall?
fpdf2 does not have a get_string_height function, but it's usually the opposite: when users call FPDF.cell() / FPDF.multi_cell() / FPDF.write(), they provide an h= parameter defining the line height.
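Since a width is available from get_string_width but no height counterpart exists, a nominal height can be derived from the font size itself: font sizes are given in points, and one point is 1/72 inch. The sketch below is an illustration only; the helper names font_height_mm and centred_to_left_origin are hypothetical, and the result is the nominal em height of the font rather than a tight bounding box of the glyphs actually drawn.

```python
MM_PER_POINT = 25.4 / 72  # one point is 1/72 inch, one inch is 25.4 mm


def font_height_mm(size_pt):
    """Nominal text height in millimetres for a font size in points.

    This is the em height of the font, not a tight bounding box.
    """
    return size_pt * MM_PER_POINT


def centred_to_left_origin(cx, cy, text_width):
    """Convert a horizontally centred anchor into the left-hand x
    position expected by a text call that starts at the left edge."""
    return (cx - text_width / 2, cy)


print(round(font_height_mm(14), 2))
print(centred_to_left_origin(100, 50, 30))
```

In the draw_text code above, text_width would be the value returned by get_string_width(s), which is expressed in the document unit (millimetres by default in fpdf2).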
Hi Lucas
Yes - still playing but some holiday, a long list of house jobs and scanning old photos got in the way.
Generally what I've got works well other than Legends which get all scrunched up. Text positioning looks ok I think now but will try your suggestion.
So.. as things stand - I can render a range of graphs convincingly in a variably sized/positioned box on a page. I haven't looked at Beziers yet as not had need and Legends don't work properly.
Haven't yet looked at putting a graph in a table cell (where I want to take it next) but I don't imagine that will be hard.
Will add to my task list to tidy up existing code and share as others may spot obvious things I'm missing or find graphs which break what I've tried so far.
Iain
On 2 Aug 2023, at 11:34, Lucas Cimon wrote:
Hi @LandyQuack!
Are you still playing with this?
works reasonably but I couldn't see an equivalent to fpdf.get_string_width to give either a height or a bounding box. Am I just missing it or is it something obvious like font size 14 is a standard measurement tall?
fpdf2 does not have a get_string_height function, but it's usually the opposite: when users call FPDF.cell() / FPDF.multi_cell() / FPDF.write(), they provide an h= parameter defining the line height.
Thank you for the update, Iain.
Take your time, and enjoy the summer / your holidays!
I'll be happy to give you some feedback if at some point you want to submit a PR.
Firstly - excellent library / thank you for all your hard work. Used it vs alternatives because of vector graphics support but was really surprised by (slow) speed on some matplotlib images (savefig to BytesIO as SVG to pdf.image) and wondered if MatPlotLib direct conversion was much faster - it is.
As an example (attached script reproduces), 3 matplotlib plots (1 of blood pressure, 1 the anatomy example and 1 an xkcd example) have timings like:
Generate figures: 102.45670797303319 ms <-- all 3
----------------------------------------------------------------------------------------
MatPlotLib PdfPages - fig 0: 33.42100000008941 ms <-- Blood Pressure
MatPlotLib PdfPages - fig 1: 89.3275830312632 ms <-- Anatomy
MatPlotLib PdfPages - fig 2: 38.10716699808836 ms <-- xkcd
MatPlotLib PdfPages - overall: 160.90095799881965 ms
----------------------------------------------------------------------------------------
Fpdf - fig 0: 276.9010409829207 ms <-- Blood Pressure
Fpdf - fig 1: 4885.383540997282 ms <-- Anatomy
Fpdf - fig 2: 646.598165971227 ms <-- xkcd
Fpdf - overall: 5808.975291030947 ms
----------------------------------------------------------------------------------------
So nearly 6,000 ms for Fpdf2 for 3 plots versus 160 ms for MatPlotLib to produce essentially the same PDF. Size wise they're within 1k of each other.
Not sure how much of this is figure -> svg -> pdf vs figure -> pdf and how much is C vs Python but I started looking because a document with ~ 20 plots in Fpdf2 was taking a surprisingly long time to generate.
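To put the timings quoted above into proportion, the per-figure and overall slowdown can be worked out directly from the reported numbers. This is only a small arithmetic sketch: the dictionary and function names are hypothetical, the values are copied from the figures above, and the overall ratio here is computed from the sums of the per-figure times, so it can differ slightly from a ratio of the two reported "overall" lines.

```python
# Timings in milliseconds, copied from the per-figure results quoted above
pdfpages_ms = {"fig 0": 33.42100000008941, "fig 1": 89.3275830312632, "fig 2": 38.10716699808836}
fpdf_ms = {"fig 0": 276.9010409829207, "fig 1": 4885.383540997282, "fig 2": 646.598165971227}


def slowdown(slow_ms, fast_ms):
    """How many times longer the slow path took than the fast one."""
    return slow_ms / fast_ms


per_figure = {name: slowdown(fpdf_ms[name], pdfpages_ms[name]) for name in pdfpages_ms}
overall = slowdown(sum(fpdf_ms.values()), sum(pdfpages_ms.values()))

for name, ratio in per_figure.items():
    print(f"{name}: {ratio:.1f}x slower")
print(f"overall: {overall:.1f}x slower")
```

The anatomy plot (fig 1) dominates: it accounts for most of the Fpdf time, which suggests the cost grows with the complexity of the SVG rather than being a fixed per-figure overhead.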
My question is about whether or not a feature might be considered to implement fpdf.savefig() or similar - perhaps by nabbing images direct from figure -> pdf -> fpdf2?
test1.txt