sciurius / perl-SVGPDF

SVG handler for PDF

Questions about font callback #6

Closed PhilterPaper closed 3 months ago

PhilterPaper commented 7 months ago

As SVGPDF seems to have settled down quite a bit, I'm starting design work on implementing SVG image processing in PDF::Builder, and have some questions about SVGPDF, especially with regard to the font definition callback. I want to be sure I understand your implementation before I go off and do something that will break it!

I want to make use of Builder's FontManager wherever I can, and am not really concerned what happens in PDF::API2 (but obviously I can't request any changes that would break API2). I'm hoping to come up with a nice neat package of a new image_svg() and being able to pass the resulting $xof array to the normal $grfx->image() or ->object() and handle it appropriately there (new code within to handle a new XObject type or array ref of hashes). Perhaps I can wrap an object around the array, so standard image methods such as width() will work. See (13) for more.

  1. Apparently I need to call new() for each SVG image, rather than just once, with multiple process() calls. Is that intentional, or a bug? It doesn't make all that much difference if I have to start each image with a new() (it will be hidden within image_svg()), but it wasn't documented.

  2. If I understand the POD, examples, and tests correctly, when I do SVGPDF->new() without a font callback, it will use your internal code to try to determine if the requested font is a core font (plus certain aliases, plus some undocumented music fonts, which need to be better documented), and will use that if found. Does it look elsewhere for any fonts (such as along a font path)? Is a "font-family" assumed to be a core font, with the core font name derived from the family, weight, and style? Otherwise, use core font Times-Roman size 12 as a fallback. If I only want to treat non-core fonts (as special cases) and fall through to the default in SVGPDF to handle core fonts, is that any problem? I probably will want to handle core fonts in the callback anyway, using FontManager, so that they will be globally cached. The callback seems to be consistent with Builder's FontManager, in that it expects a 'face' (e.g., Times), with separate style (italic on/off) and weight (bold on/off), rather than just a name like TimesBoldItalic. I shouldn't expect to ever see a font-family of "TimesBoldItalic" coming from the SVG, should I?

  3. In one of the examples (fontconfig.pl), I see mention of a fontcache. Is this unique to this example, or is it something global in SVGPDF? Builder's FontManager already caches fonts globally, so once you have created a font from OldeEnglishBlackLetter.otf, there is only one copy around that gets reused as requested. Is the same thing going on only in this example?

  4. What is the relationship between fontsize in the constructor (new) and process, and font-size passed in $style to the callback (presumably from the SVG source)? Is the former just to establish em and ex size, and the latter is to help the user's font callback select a font to open? Does font-size from the SVG ever override the fontsize parameter, or vice-versa? Some clarification would be good. I presume that somewhere, SVG's font-size is being used to set text size. Its availability in the font selection callback might be used how? Maybe someone might choose to use different font files for different ranges of size? If there is no real use for it at this time in the callback, I would just leave it there in case some real use arises.

  5. What is the best way to define and use .ttf files, etc.? If I can use my FontManager to select fonts, that would be great, as it includes both non-core fonts, and caching. Is that the intent of the callback? I only have to give a family name, and bold and italic flags, to select a font that it knows about, of any type. If an SVG provides a different font-family name than is defined to Builder's FontManager, it will have to be "translated" on-the-fly (in the callback) to get a face (font-family) that FontManager will recognize (and is defined). The samples I provided include oddities like "1" (not a real font-family; I didn't understand SVG at the time), "MS Shell Dlg 2" and "Microsoft YaHei" (I have no idea what these are; they are used by go.svg), and GNUplot seems to like using "Arial" (a Windows-only core font). I'm thinking of being able to specify that "1" maps to Helvetica, "Microsoft YaHei" maps to Courier, etc. A different set of SVG files might need an entirely different mapping. A "font-map" entry or user-defined flag in new() that could be passed to the callback would work well; otherwise I need to build new callbacks each time the mapping changes. Don't forget that the 3 additional Windows faces (Georgia, Trebuchet, and Verdana; and GNUplot seems to like Arial, something of a Helvetica clone) need to be substituted on non-Windows systems.

  6. SVGPDF also does something with @font-faces, which could use some clarification in the POD. Is that a font definition in the SVG file, or a mapping to a standard font, or something else? I can't quite tell what it's doing. Also, does SVG allow defining a list of font-family names (like in CSS), and does SVGPDF handle a list properly? This would not be a character-by-character fallback arrangement, but go down the list until a font is found. It might be good to discuss in the POD all the various things in font-family that SVGPDF can handle.

  7. If the user does specify a font selection callback, either as a single entry \&fcsubname (or as an anonymous array [ \&fcsub1, \&fcsub2, \&fcsub3 ], to try until success?), each callback is passed a fixed set of arguments (including $style). If a suitable font is found and opened successfully, it is returned from the callback. Otherwise, should it return undef (is a simple return; sufficient)? If it runs out of user-supplied callbacks without a successful font load, it acts like no callbacks were defined and uses its internal defaults? I just want to be sure that I understand this, and maybe the documentation could use some fleshing out.

  8. Users might supply SVGs with many different oddball font-family entries, and want to change the callback several times in one run, even to the point that a given font-family means different things at different times (dynamically, within a single run). What would be the best way to achieve that? Should they provide several subs (fc1, fc2, etc.) and then pick the proper one for new(fc => name)? I'm thinking of when the different callbacks might have a lot of code in common, and just a few differences (perhaps they could internally call a common code module). Would that be cleaner than adding an optional entry to the callback, accepting the $style hashref with a new optional $style->{'font-user'}? Provided by my code, it would be a scalar value that the user could use for any purpose within the callback, such as selecting a different mapping of font-family to real font name (or core). It could even be an arrayref or hashref -- it's up to the user what they want to do with it. If none is provided, the font-user would be undef. new() would be a good place to input such a map or other data to the callback (potentially several in an application), rather than having to supply completely different callback routines. If in fact only one new() is needed for multiple process() calls, it could be passed to process() instead. I would rather not create an entire sub on-the-fly to do different mappings for different SVG files, rather than defining a fixed one, but is there a better way?

  9. I would suggest the heading 'CONSTRUCTOR' be 'CONSTRUCTOR or new()', to minimize confusion among users looking for new. They may be more used to looking for something called new, as that's the Perl convention, as opposed to the class name, the C++ convention.

  10. In the Font Handler Callback, "don't touch [your]$self". :-) yeah, I saw what you did there!

  11. The POD example of a font callback only permits 'sans' (Helvetica) or Times-Roman... you might want to mention what it's doing and the lack of bold or italic, just as a caution to users who use this as a basis for their own callback.

  12. The documentation says that rounded corners on rectangles are not supported, yet your sample seems to use it (rx and ry). Please update if they are now supported.

  13. I'm thinking of wrapping an object around the returned $xof array (inside image_svg()), so it can be treated more like the other image objects. However, it sounds like each element of $xof is more or less an independent image (if there is more than one element) -- any thoughts on how best to "combine" them for an overall width(), height(), bbox(), etc.? Perhaps the bounding boxes can be "plotted" and the lowest and highest of x and y for each gives the overall figures? I have no idea if "real life" multielement cases will overlap fully or partially.
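The bounding-box idea in item 13 (take the lowest and highest x and y over all elements) can be sketched as below. This is only an illustration: `union_bbox` is a hypothetical helper, the sample data is made up, and the assumption that each element carries a `bbox` entry of the form [ llx, lly, urx, ury ] should be checked against the SVGPDF documentation.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: union of bounding boxes, each given as
# an array ref [ llx, lly, urx, ury ].
sub union_bbox {
    my @boxes = @_;
    return unless @boxes;
    return (
        min( map { $_->[0] } @boxes ),    # lowest x
        min( map { $_->[1] } @boxes ),    # lowest y
        max( map { $_->[2] } @boxes ),    # highest x
        max( map { $_->[3] } @boxes ),    # highest y
    );
}

# Made-up data standing in for the elements of $xof
# (assumption: each element has a 'bbox' entry).
my @xof = (
    { bbox => [ 0,  0,  200, 100 ] },
    { bbox => [ 50, 80, 300, 150 ] },
);

my ( $llx, $lly, $urx, $ury ) = union_bbox( map { $_->{bbox} } @xof );
printf "bbox: %g %g %g %g, width %g, height %g\n",
    $llx, $lly, $urx, $ury, $urx - $llx, $ury - $lly;
```

This treats all elements as sharing one coordinate system. If the elements are meant to be stacked or laid out one after another, the overall size would have to be accumulated differently.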

sciurius commented 7 months ago

This is going to take a while... Replying in parts.

  12. Rounded rectangles

Supported for a while, and the docs have been fixed recently (https://github.com/sciurius/perl-SVGPDF/commit/daaae0d356c18cbec3276bc55169971fd031f99f).

  1. Apparently I need to call new() for each SVG image, rather than just once, with multiple process()

It should not make a difference. However keep in mind that every call to process adds xobjects to the result. This is correct:

$p = SVGPDF->new;
for ( @svgfiles ) {
    $p->process( $_ );
}
for ( $p->xoforms ) {
    $gfx->object( $_, ... );
}

But this is not:

for ( @svgfiles ) {
    $p->process($_);
    for ( $p->xoforms ) {
        $gfx->object( $_, ... );
    }
}

sciurius commented 7 months ago

  13. I'm thinking of wrapping an object around the returned $xof array [...] any thoughts on how best to "combine" them for an overall width(),

Use the combine option in the process call?

sciurius commented 7 months ago

  11. The POD example of a font callback only permits 'sans' (Helvetica) or Times-Roman... you might want to mention what it's doing and the lack of bold or italic, just as a caution to users who use this as a basis for their own callback.

Ok. OTOH users who do not understand this from the example source should probably not write font handler callbacks☺.

sciurius commented 7 months ago

  3. In one of the examples (fontconfig.pl), I see mention of a fontcache. Is this unique to this example, or is it something global in SVGPDF? Builder's FontManager already caches fonts globally, so once you have created a font from OldeEnglishBlackLetter.otf, there is only one copy around that gets reused as requested. Is the same thing going on only in this example?

The built-in FontManager caches the mapping of family+style+weight to the resultant PDF font object. A subsequent request for the same family+style+weight combination will yield the same PDF font object. This prevents the same font from being included multiple times in the PDF file.

Does Builder return the same font object for each $pdf->font("SomeWeirdFont.ttf") call?

sciurius commented 7 months ago

  4. What is the relationship between fontsize in the constructor (new) and process, and font-size passed in $style to the callback (presumably from the SVG source)?

It is only used to set default values for ex and em dimensional units.

PhilterPaper commented 7 months ago

This is going to take a while... Replying in parts.

No problem. I would be surprised if you could answer everything in one go!

However keep in mind that every call to process adds xobjects to the result.

OK, you appear to be saying that I can call new() once, and process() multiple images against it? What I had tried before was like your second (incorrect) example, and got lots of errors. I'll have to experiment with this. I would like to call new() once, and process/display a series of images, but if I'm going to end up packaging new() and process() into image_svg() anyway, it may be a moot point. I don't want to "merge" a bunch of separate images (that I want to display separately) into one array of objects, as the first (correct) example seems to show.

Use the combine option in the process call?

I'll look at it again, but I'm not necessarily looking to merge individual images into one to be displayed in one go, just have a way to implement a single width(), height(), etc. for a set of images from one SVG file. Or is that what combine does, given one {'width'} etc. in one combined object?

Ok. OTOH users who do not understand this from the example source should probably not write font handler callbacks☺.

True enough, but it wouldn't hurt to remind them that the given example is so very limited that they should be cautious about trying to use it as a basis for their code. IOW, "This example callback will behave this way...".

Does Builder return the same font object for each $pdf->font("SomeWeirdFont.ttf") call?

Yes. For a given face, weight, and style it will open a font object, and hang on to it for reuse should the same face, weight, and style be requested later. It sounds like our FontManagers do the same thing. If I use mine, it will also cache any font defined outside of SVGPDF (and vice-versa).

It is only used to set default values for ex and em dimensional units.

If the SVG defines a font-size for some text, is that value used by SVGPDF for any em and ex usage? If SVG specifies text without giving a font-size for some reason, is it using the 12pt default, or will it use the fontsize from new() or process()? I'm just trying to make sure that what font size gets used (and where) is logical and consistent and predictable.

Looking forward to more answers!

sciurius commented 7 months ago

Use the combine option in the process call?

I'll look at it again, but I'm not necessarily looking to merge individual images into one to be displayed in one go,

The combine option is intended to combine the individual images resulting from a single process call.

For example, when processing an ABC score of two 'lines', I get one file with two svg images, one for every score line. I can render these individually, e.g. when the score needs to be broken across a page boundary, or as a single image.

Or is that what combine does, given one {'width'} etc. in one combined object?

Precisely. And it gives a single object with all individual objects combined.

Does Builder return the same font object for each $pdf->font("SomeWeirdFont.ttf") call?

Yes. For a given face, weight, and style it will open a font object, and hang on to it for reuse should the same face, weight, and style be requested later. It sounds like our FontManagers do the same thing. If I use mine, it will also cache any font defined outside of SVGPDF (and vice-versa).

Great. That is exactly what I wanted to achieve.

If the SVG defines a font-size for some text, is that value used by SVGPDF for any em and ex usage?

It should, I think.

If SVG specifies text without giving a font-size for some reason, is it using the 12pt default,

Yes.

I'm just trying to make sure that what font size gets used (and where) is logical and consistent and predictable.

In case of doubt, make a small svg and try that in different browsers. If the browsers agree and SVGPDF differs, file an issue so I can try to repair it.

Looking forward to more answers!

Asking more questions is a good way to get more answers ☺.

PhilterPaper commented 7 months ago

In a given SVG file or string, each (unnested) <svg> creates its own xobject in the output array?

OK, I think I will try the following:

  1. image_svg() takes a callback (if any) and passes it on to new() for one SVG image at a time. It will call process() and return an object (?) wrapping around the array element(s). If an optional 'combine' is specified, it will expect a single element combined object, otherwise an array of xobjects. width() etc. will go through all the xobjects in the array, and combine their {'width'} etc. values. I will have a default callback after any user-defined callback(s), which may only translate a font-family name into something else [I need to test if $style elements can be updated!], or could go ahead and create an image xobject, ending the chain of callbacks. There may be multiple user-defined callbacks (for one new(), or different ones per new()), which could share common code by calling a user-defined routine (not needing any extra action on the part of SVGPDF).
  2. A new method to break a multi-element xobject array into multiple separate xobjects, should a user need to split them across pages. I'll have to think about when this would be useful. Or maybe just a flag to have image_svg() return an array of SVG image objects in the first place?
  3. Standard graphics image() invokes standard object() to output an array of xobjects from SVG. From the programming standpoint, handling an SVG image appears to be just like handling one of the existing pixel formats.

Anything there sound unreasonable?

Asking more questions is a good way to get more answers ☺.

I did ask 6 other questions. Maybe I can now answer them myself:

  1. TL;DR if the callback(s) don't handle the requested font, your fallback code will assume it's a core font? I will still handle all fonts in my callback, so that my FontManager caches them. Only if something invalid is given should it fall through to your default handling (and finally core Times-Roman 12).
  2. My FontManager will know what to do with .ttf/.otf files, so your code should never see one (you expect them to be handled in a callback).
  3. Still could use something in the POD to explain @font-face.
  4. From your silence, I take it that I understand how your code works.
  5. If a user can specify callback(s) that come ahead of mine, and apply to only one SVG input (i.e., can be changed for different SVGs), they can call a common code core if they share a lot of the same code.
  6. Still seems like a good idea to [also] refer to the Constructor as "new()".

sciurius commented 7 months ago

In a given SVG file or string, each (unnested) <svg> creates its own xobject in the output array?

Yes.

  1. image_svg() takes a callback (if any) and passes it on to new() for one SVG image at a time. It will call process() and return an object (?) wrapping around the array element(s). If an optional 'combine' is specified, it will expect a single element combined object, otherwise an array of xobjects. width() etc. will go through all the xobjects in the array, and combine their {'width'} etc. values.

If you look at process, you'll see it produces an array of individual XObjects. With 'combine' this is then reduced to a single Object with all dimensions combined. You can call process without combine (so you have the array), and then call combine to obtain the combined image.

  2. A new method to break a multi-element xobject array into multiple separate xobjects, should a user need to split them across pages. I'll have to think about when this would be useful. Or maybe just a flag to have image_svg() return an array of SVG image objects in the first place?

See above.

  3. Standard graphics image() invokes standard object() to output an array of xobjects from SVG. From the programming standpoint, handling an SVG image appears to be just like handling one of the existing pixel formats.

Yes, but it's an XForm, not an XImage.

Anything there sound unreasonable?

Not at first sight. My suggestion is to start using it and see how the details work out.

I did ask 6 other questions. Maybe I can now answer them myself:

They're scheduled for answering.

  1. TL;DR if the callback(s) don't handle the requested font, your fallback code will assume it's a core font?

Correct. I'll remove music fonts, they're left over from the past.

  2. My FontManager will know what to do with .ttf/.otf files, so your code should never see one (you expect them to be handled in a callback).

Correct.

  3. Still could use something in the POD to explain @font-face.

@font-face is an evil facility. It can take many forms and do many things. SVGPDF only deals with a small subset.

For a given family/style/weight I first look if a corresponding @font-face exists. If so I use the src part to get at the font. The family can be a list of families, and the source a list of sources. Currently I go for the first matching family name, and the first source that I can understand. I should retry another name if there is no appropriate source but currently I do not. Worse, if there is a unicode-range specified, this whole lookup process must be repeated for each individual character to be processed. This will be painfully slow.
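One possible reading of the lookup described above, as a small pure-Perl model. It is not SVGPDF's actual code: the rule records, the `%understood` format table, and `lookup_font_face` are all hypothetical, and the unicode-range handling is left out entirely.

```perl
use strict;
use warnings;

# Hypothetical, simplified @font-face rules (not SVGPDF's internal format).
my @rules = (
    {
        family => 'Olde English',
        style  => 'normal',
        weight => 'normal',
        src    => [
            { format => 'woff2',    url => 'olde.woff2' },
            { format => 'truetype', url => 'olde.ttf' },
        ],
    },
);

# Source formats this sketch pretends to understand.
my %understood = map { $_ => 1 } qw(truetype opentype);

# Walk the requested family list, stop at the first family that has a
# matching rule, and take the first source in an understood format.
sub lookup_font_face {
    my ( $rules, $families, $style, $weight ) = @_;
    for my $name (@$families) {
        my ($rule) = grep {
                 lc( $_->{family} ) eq lc($name)
              && $_->{style} eq $style
              && $_->{weight} eq $weight
        } @$rules;
        next unless $rule;
        for my $src ( @{ $rule->{src} } ) {
            return $src->{url} if $understood{ $src->{format} };
        }
        return;    # matching family but no usable source: no retry with another name
    }
    return;
}

my $url = lookup_font_face( \@rules, [ 'Unknown', 'Olde English' ], 'normal', 'normal' );
print defined $url ? "use $url\n" : "no \@font-face match\n";
```

The early `return` after the source loop mirrors the "I should retry another name ... but currently I do not" remark. Replacing it with `next` would give the retry behaviour.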

  4. From your silence, I take it that I understand how your code works.

Actually it meant that I didn't get to answering these. But you're doing fine.

  5. If a user can specify callback(s) that come ahead of mine, and apply to only one SVG input (i.e., can be changed for different SVGs), they can call a common code core if they share a lot of the same code.

  6. Still seems like a good idea to [also] refer to the Constructor as "new()".

Ok. Classes can have many constructors that produce objects. new is just one of them.

BREAKING: I've decided to change the API for the fontmanager callback before it is too late. I think it is better to pass the arguments as key/value pairs since that allows adding additional arguments if needed. So old api:

    ( $self, $pdf, $style )

New api:

    ( $self, pdf => $pdf, style => $style )

I'll try to roll out a new release later tonight.
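A minimal sketch of a handler written against the key/value convention, called here by hand rather than by SVGPDF. The handler name, the placeholder return value, and the `extra` key are invented for illustration. A real handler would return a PDF font object.

```perl
use strict;
use warnings;

# Hypothetical font handler using the key/value calling convention.
sub sans_only_handler {
    my ( $self, %args ) = @_;
    my $pdf   = $args{pdf};      # PDF document object (unused in this sketch)
    my $style = $args{style};    # hash ref with font-family, font-style, ...

    my $family = lc( $style->{'font-family'} // '' );
    return unless $family eq 'sans';    # decline this request

    # Placeholder string; a real handler would return a PDF font object.
    return 'Helvetica';
}

# Simulated call. The 'extra' key is invented to show that arguments
# added later do not break a handler that unpacks its arguments this way.
my $font = sans_only_handler(
    undef,
    pdf   => undef,
    style => { 'font-family' => 'Sans' },
    extra => 'ignored',
);
print defined $font ? "handled: $font\n" : "declined\n";
```

Unpacking into a hash is what makes the new convention extensible: a handler written this way keeps working if more key/value pairs are passed in a later release.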

PhilterPaper commented 7 months ago

An idle thought... if you change a value in $style, but do not return a font from the callback, will that changed value be seen by the next callback in the list, or by your fallback code? I'm thinking in terms of a user callback changing a font-family (or weight, or size) to be a standard PDF core font, which your code could then use. Say, an SVG mistakenly uses "Times-Roman" for a font-family; could a callback correct this to "Times", permitting your code to work with it?

sciurius commented 7 months ago

There's a reasonable chance that this will indeed work...

sciurius commented 7 months ago

Two remarks. First, @font-face lookup has already taken place, so if you change e.g. the weight from heavy to bold, it will not match a @font-face with weight bold. Second, after the callbacks, only font names like sans, serif, mono, times are recognized. Not the full set of core font names. (Should I change that?)

PhilterPaper commented 7 months ago

There's a reasonable chance that this will indeed work...

That's good news that a callback could change an invalid/unsupported font-family to something supportable by core fonts (without having to process the font itself). For example, go.svg (and other samples derived from it) includes Microsoft YaHei, which could simply be changed to some reasonable core font. That way, it won't just default to Times-Roman.
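A sketch of such a remapping callback, built by a small factory so that a different map does not need a different hand-written sub. `make_family_mapper` and the sample map are hypothetical. The callback uses the key/value calling convention announced earlier in the thread, and whether SVGPDF's fallback really sees the modified $style is only "a reasonable chance" per the reply above.

```perl
use strict;
use warnings;

# Hypothetical factory: returns a callback that rewrites font-family
# according to the given map and then declines to supply a font.
sub make_family_mapper {
    my (%map) = @_;
    return sub {
        my ( $self, %args ) = @_;
        my $style = $args{style};
        my $key   = lc( $style->{'font-family'} // '' );
        $style->{'font-family'} = $map{$key} if exists $map{$key};
        return;    # no font returned: later callbacks or the fallback take over
    };
}

# Map oddball names to generic names the fallback is said to recognize.
my $mapper = make_family_mapper(
    '1'               => 'sans',
    'ms shell dlg 2'  => 'sans',
    'microsoft yahei' => 'mono',
);

# Simulated call with a made-up style hash.
my $style = { 'font-family' => 'Microsoft YaHei', 'font-weight' => 'bold' };
$mapper->( undef, pdf => undef, style => $style );
print "font-family is now $style->{'font-family'}\n";
```

A different set of SVG files would only need another call to the factory with a different map.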

First, @font-face lookup has already taken place, so if you change e.g. the weight from heavy to bold, it will not match a @font-face with weight bold.

If @font-face lookup can't/shouldn't be delayed until the callbacks are done, that should at least be documented as a caution. I am not familiar with the use of @font-face, but it sounds like it's expecting certain attributes such as 'bold' (and not 'heavy').

Second, after the callbacks, only font names like sans, serif, mono, times are recognized. Not the full set of core font names. (Should I change that?)

I see 'sans', 'sans-serif', and 'helvetica' (case-insensitive) mapping to various Helvetica flavors; and 'serif' and 'times' mapping to various 'Times' flavors. 'Courier' by itself doesn't appear to be recognized, but 'text', 'mono', and 'mono[-]space' map to it. Symbol and ZapfDingbats seem to be left out in the cold, along with Windows-only font families (Georgia, Trebuchet, Verdana, Webdings, Wingdings). I would be consistent and recognize at least Courier, Symbol, and ZapfDingbats. I don't know if it is reasonable to recognize that you're on a Windows machine and recognize their unique families (the Reader might not be on Windows). Perhaps an option in new() to allow Windows font-family members?

BTW,

IMPORTANT: With the standard corefonts only characters of the ISO-8859-1 set (Latin-1) can be used. No greek, no chinese, no cyrillic. You have been warned.

Actually, any single-byte encoding should be usable, provided you give the encode option* to font(). If you don't already have a means to provide the encoding, perhaps you should add it? I don't think SVG worries about such things, so it would probably be in the new() as an option. Anyway, Greek and Cyrillic should probably then be OK, but CJK will be multibyte and off-limits to core fonts.

* encode appears to have been supported all-along by font(), but missing from its documentation. I will update the POD.

sciurius commented 7 months ago

I would be consistent and recognize at least Courier, Symbol, and ZapfDingbats.

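These are the 14 PDF standard fonts: Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique, Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique, Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic, Symbol, and ZapfDingbats.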

$pdf->font accepts these 14 and these 14 only as core fonts.

$pdf->corefont accepts these 14. It also accepts Arial, CourierNew, TimesNewRoman (and variants) and Times as alternative names. No problem since this will use the corresponding corefont name.

This method also accepts Georgia, Trebuchet, Verdana (and variants), plus BankGothic, WebDings and WingDings. These fonts are not corefonts. Metrics (subset?) are provided but the actual glyphs will depend on the viewer. Personally, I think this is bad and probably one of the reasons that PDF::API2 advocates $pdf->font in favour of $pdf->corefont.

I don't know if it is reasonable to recognize that you're on a Windows machine and recognize their unique families (the Reader might not be on Windows). Perhaps an option in new() to allow Windows font-family members?

I prefer to stick to the core. I'll augment the recognized fonts with the core fonts.

PhilterPaper commented 7 months ago

Did you put the latest changes in GitHub? I went to look to see if you now supported all 14 core fonts (plus aliases) in FontManager, and it didn't appear to have been updated lately.

BTW, I see that CPAN still points to the RT system for tickets, and there is still an open one for the long double problem (which I think you've fixed). You may want to define the ticket system to be GitHub, and while you're at it (updating the META.* files and maybe Makefile.PL) also point to the code repository on GitHub.

It's not critical to me to have SVGPDF support the Windows "core" extensions, so if you want to stick to the official core (by using font()), no problem. I can always define a callback to recognize and open the Windows extensions.

sciurius commented 7 months ago

Latest changes are in GH now.

I see that CPAN still points to the RT system for tickets

Yes, that is a problem I discovered recently. Will be fixed soon.

I can always define a callback to ...

You get the idea ;).

PhilterPaper commented 4 months ago

Back to my number 7 question, your latest POD leaves me a little confused. If the user has specified a single callback, say, sub CB1, you show it as 'fc' => \&CB1. However, you also talk about an array of callbacks -- are they 'fc' => [ \&CB1, \&CB2 ] (ref() eq 'ARRAY') or some other format? If the user has specified one or more callbacks, I need to be able to insert one (or more) myself, such as for Windows "core" extensions, creating either a single entry (if the user had specified no callbacks), or some sort of array of callbacks (expanding on the original callback(s)). It would be helpful to expand the POD with an example of giving an array of callbacks. I assume that I can then figure out how to detect whether the user gave a single callback or an array.

Just to make sure I understand this, SVGPDF will first try to satisfy a font via font-face, and if not found, visit the callback(s) from the first to the last, until one returns a font object. If none do, it tries to treat it as a standard core font (essentially a built-in callback?). Finally, what happens if a font is still not found -- does it fall back to some default, or does it fail?

I take it that I should never see a font such as Times-BoldItalic, but rather, Times with style 'bold' and 'italic' set (rather than 'normal')? Even Times-Roman appears to show up as Times 'normal', 'normal'. Correct? Can I count on this for any SVG that some random utility has produced? If not, my Windows "core" callback will need to handle all the special cases, as I presume your built-in core handler would.

Do you handle extended style entries for font weights (e.g., 'demi bold' or 500) and slanting that might show up in an SVG, or does SVG restrict entries to just 'normal' and 'italic' and 'bold'? I just want to get some idea of what I may need to handle at some point.

sciurius commented 4 months ago

However, you also talk about an array of callbacks

SVGPDF/FontManager.pm, lines 165-168.

If the user has specified one or more callbacks, I need to be able to insert one (or more) myself,

I assume the user calls PDF::Builder functions, and PDF::Builder calls SVGPDF functions. So you are always in between.

Finally, what happens if a font is still not found

SVGPDF/FontManager.pm, lines 223-228

Can I count on this for any SVG that some random utility has produced?

No. SVG (more precisely, CSS) has font-family, font-style, font-weight etc. If the 'random utility' produces font-family="Times-BoldItalic" that is what you are going to see...

Do you handle extended style entries for font weights

No. (Not an SVG restriction, I just have no plans to implement this. I'm still hoping for a decent font managing system to be available on all major platforms. Similar to what the major browsers do.)

PhilterPaper commented 4 months ago

OK, so it sounds like I properly understood the array of callbacks. I still think the documentation should be improved with an example. And it sounds like the ultimate fallback font is 'Times' (core), obeying the italic and weight settings, if any.
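The lookup order as understood here (callbacks in order, then the built-in core handling, then Times) can be modelled in plain Perl. This is not SVGPDF's code: `callback_list`, `builtin_font`, `find_font`, and the string "font objects" are hypothetical stand-ins, and the @font-face step that comes first is omitted.

```perl
use strict;
use warnings;

# Normalize an 'fc' setting: undef, one code ref, or an array ref of code refs.
sub callback_list {
    my ($fc) = @_;
    return ()   unless defined $fc;
    return @$fc if ref($fc) eq 'ARRAY';
    return ($fc);
}

# Stand-in for the built-in handler: generic names map to core families,
# anything else falls back to Times, keeping weight and style.
my %generic = ( sans => 'Helvetica', serif => 'Times', times => 'Times', mono => 'Courier' );

sub builtin_font {
    my ($style) = @_;
    my $family = $generic{ lc( $style->{'font-family'} // '' ) } // 'Times';
    return join ' ', $family,
        $style->{'font-weight'} // 'normal',
        $style->{'font-style'}  // 'normal';
}

# Try each callback in order; the first defined result wins.
sub find_font {
    my ( $fc, $style ) = @_;
    for my $cb ( callback_list($fc) ) {
        my $font = $cb->( undef, pdf => undef, style => $style );
        return $font if defined $font;
    }
    return builtin_font($style);
}

# A wrapper such as image_svg() could append its own handler to the user's.
my $user_fc = sub { return };    # user handler that always declines
my $windows = sub {
    my ( $self, %args ) = @_;
    my $family = lc( $args{style}{'font-family'} // '' );
    return $family eq 'verdana' ? 'Verdana font object' : undef;
};
my $chain = [ callback_list($user_fc), $windows ];

print find_font( $chain, { 'font-family' => 'Verdana' } ), "\n";
print find_font( $chain, { 'font-family' => 'Microsoft YaHei', 'font-weight' => 'bold' } ), "\n";
```

Normalizing with `callback_list` first means the wrapper does not have to care whether the user supplied nothing, a single code ref, or an array ref.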

If the 'random utility' produces font-family="Times-BoldItalic" that is what you are going to see...

OK, I'll have to see about handling any font in a callback that might be a combined family+weight+italic (font file name). Some day.
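One way a callback might pick apart such combined names, as a rough sketch. `split_font_name` is hypothetical, only recognizes the Bold, Italic, Oblique, Roman, and Regular suffixes, and treats Oblique as italic. Real-world font names are far messier than this.

```perl
use strict;
use warnings;

# Hypothetical helper: split "Times-BoldItalic" or "TimesBoldItalic"
# into ( family, weight, style ).
sub split_font_name {
    my ($name) = @_;
    my ( $weight, $style ) = ( 'normal', 'normal' );
    my $family = $name;

    # Strip trailing Bold/Italic/Oblique words one at a time. The
    # lookbehind keeps at least one character for the family part.
    while ( $family =~ s/(?<=.)[-\s]?(Bold|Italic|Oblique)$//i ) {
        if ( lc($1) eq 'bold' ) { $weight = 'bold' }
        else                    { $style  = 'italic' }
    }

    # "-Roman" and "-Regular" are only stripped when hyphenated,
    # so that names such as TimesNewRoman stay intact.
    $family =~ s/-(?:Roman|Regular)$//i;

    return ( $family, $weight, $style );
}

for my $name (qw( Times-BoldItalic TimesBoldItalic Times-Roman Helvetica-Oblique TimesNewRoman )) {
    printf "%-18s => %s\n", $name, join ', ', split_font_name($name);
}
```

The resulting family, weight, and style could then be written back into $style or passed to a font manager that expects them separately.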

PhilterPaper commented 3 months ago

PDF::Builder SVG support released (PhilterPaper/Perl-PDF-Builder/issues/89).