sciurius / perl-Text-Layout

Pango style markup formatting for PDF::API2, Markdown, Cairo and more

Roadmap #4

Closed PhilterPaper closed 6 months ago

PhilterPaper commented 4 years ago

Happy holidays, Johan, and everyone else working with Text::Layout!

Do you have any sort of road map outlining where you plan to go with Text::Layout? Is it going to be strictly font selection and limited HTML-style text markup, or do you plan to go further with it? I see hints in the documentation about paragraph shaping and such. As you know from our earlier conversations, I'd really like to see Pango/Harfbuzz support for complex scripts, but that sounds like it might be out of range of Text::Layout (and might be better as a separate package). Also, the installation of such support libraries does not appear to be a simple issue, particularly with Windows and Strawberry Perl.

PhilterPaper commented 4 years ago

@terefang has poked PhilterPaper/Perl-PDF-Builder#56, suggesting the use of unifont() for font fallbacks (a glyph doesn't exist in the current font, but might in one of a list of alternative fonts). I said that unifont doesn't appear to me to be useful for that purpose, but it got me wondering whether Text::Layout could handle it (as an extension). Text::Layout knows the current font in use (as well as attributes like size, bold, italic) and might be extended to look down a list of alternative font families if a glyph is missing from the current one (is this already in the capability list?). It might further be extended to give different font lists (like unifont does) for different ranges of code points (not necessarily just Unicode). Just some thoughts for the Roadmap.
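A minimal sketch of the fallback idea described above, written in Python purely for illustration (Text::Layout itself is Perl, and nothing here is its API). The font names, the `FONT_COVERAGE` table, and both functions are hypothetical; a real implementation would read glyph coverage from the font files.

```python
# Hypothetical coverage table: font name -> set of code points it has glyphs for.
FONT_COVERAGE = {
    "Serif": set(range(0x20, 0x7F)),        # basic ASCII only
    "GreekFont": set(range(0x370, 0x400)),  # Greek block
}


def pick_font(ch, current, fallbacks):
    """Return the first font (current first, then fallbacks) that covers ch."""
    for name in [current, *fallbacks]:
        if ord(ch) in FONT_COVERAGE.get(name, set()):
            return name
    return None  # no font in the list has this glyph


def split_into_runs(text, current, fallbacks):
    """Split text into (font, substring) runs, switching font only when needed."""
    runs = []
    for ch in text:
        font = pick_font(ch, current, fallbacks)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((font, ch))              # start a new run
    return runs
```

Per-range font lists could be layered on top by choosing `fallbacks` from the code point's range before calling `pick_font`; that part is left out of the sketch.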

sciurius commented 4 years ago

Good point. I'll put it on the TODO list for Text::Layout.

sciurius commented 4 years ago

As for the other points (your comment of 29 days ago; it's a stupid habit to use 'friendly' dates for timestamping): I want to add support for paragraph handling at some point in the future (i.e., when I need it). The markup is intended to be Pango-compatible, so it will remain "limited HTML-style".

PhilterPaper commented 2 years ago

You might want to read this document: http://pdfapi2.sourceforge.net/pdfapi2_for_fun_and_profit_APW2005.pdf . It doesn't say who wrote it, and sounds a bit old (the SF page claims unmaintained since 2008), but it has some interesting "future directions" including Pango-compatible markup. It might be interesting to see if the author made any progress, although the SourceForge page doesn't seem to have links to any of the code mentioned in this document.

terefang commented 2 years ago

lol ... that was a presentation i gave at Austrian Perl Week 2005

PhilterPaper commented 2 years ago

Ah, I hope it can give some ideas to Johan and others!

sciurius commented 2 years ago

Well, it gives the impression that much of the work I spent on Text::Layout was already done for a (hypothetical?) module PDF::Layout...

PhilterPaper commented 2 years ago

I'm still contemplating what direction to take PDF::Builder in, and how Text::Layout might fit into that.

I think Text::Layout would best fit into this if it can just return a paragraph string split up into substrings, with font (and baseline/leading) information for each substring (face, size, etc.) and not try to do any line splitting or rendering (ink on paper). I would guess that already largely exists somewhere inside of Text::Layout, before it gets to line folding and rendering. One question would be, "how much PDF information does Text::Layout need to know, such as available fonts?"

The ultimate intent is to have input in an HTML/Pango markup (and possibly other markups), along with inline directives, that produces a beautifully formatted document. This may be something that wraps around PDF::Builder, reserving that package for the low-level primitives.

sciurius commented 2 years ago

Decomposing a marked-up string into an array of substrings with information is basically what set_markup does. After calling set_markup you can inspect the _content field of the layout. It should be relatively easy to add a new API call that calculates dimensions and adds them to the elements of the _content array, and/or returns this information to the caller in a user-friendly way.

For the calculation of dimensions only Font::TTF is necessary, but for practical reasons it is better to use the PDF::API2/Builder routines. And we will still be limited by the font bounding box information.
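To make the "array of substrings with information" idea concrete, here is a rough Python sketch of that kind of decomposition. It is not how set_markup is implemented and does not reproduce the layout of the _content field; the `RunCollector` class, the `decompose` function, the default style, and the treatment of `size` as a plain point value are all simplifying assumptions.

```python
from html.parser import HTMLParser


class RunCollector(HTMLParser):
    """Collect text runs, each tagged with the style in effect at that point."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Assumed default style; each open tag pushes a modified copy.
        self.stack = [{"bold": False, "italic": False, "size": 12.0}]
        self.runs = []

    def handle_starttag(self, tag, attrs):
        style = dict(self.stack[-1])
        if tag == "b":
            style["bold"] = True
        elif tag == "i":
            style["italic"] = True
        elif tag == "span":
            for name, value in attrs:
                if name == "size" and value:
                    style["size"] = float(value)  # simplification: points
        self.stack.append(style)  # unknown tags push an unchanged copy

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.runs.append({"text": data, **self.stack[-1]})


def decompose(markup):
    """Return a list of dicts, one per styled substring of the markup."""
    parser = RunCollector()
    parser.feed(markup)
    parser.close()
    return parser.runs
```

For example, `decompose('Plain <b>bold <i>both</i></b>')` yields three runs, the last one with both `bold` and `italic` set. Width and height per run would be added in a later pass that has access to font metrics.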

PhilterPaper commented 2 years ago

Since Text::Layout already processes SGML (HTML) style markup, I was wondering about how flexible it would be to extend its "language" to the things I listed a few months ago (lists, tables, etc.) as well as custom low-level markup commands. These could include commands to use or suppress ligatures, choose alternative glyphs (e.g., swashes), with the "true" character given somewhere so that the PDF Reader can do searches, etc. Paragraphs, blockquotes, etc. come to mind, as do footnoting and indexing facilities. Equations could be handed off to another package when <eqn> is seen. Other custom commands could include markup engine control such as (re)setting a page number or doing conditional output based on remaining page space. I don't want Text::Layout actually trying to put ink on paper; just acting as a preprocessor of some sort to prepare text for further processing.

This would, of course, be very application-specific beyond simple font changes, and you probably would be reluctant to build in such things. I would suggest some sort of callback mechanism where the application would register a tag and the callback routine to use (and pass various attribute strings to). That way, your product could be kept clean and lightweight, while still being expandable to handle something approaching full HTML-style markup. It would natively only handle various font-selection actions, leaving heavier stuff to its caller. If an encountered tag isn't registered, and it's not one of your native tags, I guess you could just give an error indication and ignore the tag.
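The callback mechanism suggested above might look roughly like the following Python sketch. It is only an illustration of the registration and dispatch pattern: `TagRegistry`, `NATIVE_TAGS`, and the handler signature are hypothetical and do not exist in Text::Layout.

```python
# Hypothetical set of tags the markup engine would handle by itself.
NATIVE_TAGS = {"b", "i", "span"}


class TagRegistry:
    """Map application-defined tags to callbacks; ignore unknown tags."""

    def __init__(self):
        self._handlers = {}
        self.warnings = []  # error indications for unregistered tags

    def register(self, tag, handler):
        """Register a callable taking a dict of attribute strings."""
        self._handlers[tag] = handler

    def dispatch(self, tag, attrs):
        if tag in NATIVE_TAGS:
            return f"native:{tag}"  # stand-in for built-in font handling
        handler = self._handlers.get(tag)
        if handler is None:
            self.warnings.append(f"unknown tag <{tag}> ignored")
            return None
        return handler(attrs)
```

An application would call something like `registry.register("eqn", render_equation)` once at startup, and the engine would call `dispatch` for every tag it meets; anything neither native nor registered is recorded in `warnings` and skipped.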

Another thing would be CSS-style class, id, and style attributes, so that actual modern HTML could be used. Since you would need to handle some of this for <span> tags anyway, perhaps Text::Layout should handle parsing a style attribute and even processing classes and ids so that the proper CSS properties could be looked up? That's getting pretty close to a full-fledged HTML browser engine (except for JavaScript), so it's understandable if you don't want to go in that direction, but it's something that would be very useful.
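Parsing a style attribute into property/value pairs is the easy part of this; a naive Python sketch is below. The `parse_style` function is hypothetical, and it deliberately ignores the harder cases (semicolons inside quoted strings or `url(...)`, and the cascade needed for class and id lookup).

```python
def parse_style(style):
    """Split 'name: value; name: value' into a dict (naive, no quoting rules)."""
    props = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep and name.strip():
            props[name.strip().lower()] = value.strip()
    return props
```

With this, `parse_style("font-size: 12pt; Color:red;")` returns `{"font-size": "12pt", "color": "red"}`.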

If you use angle brackets < and > for "tags", I guess you will need to think about HTML entities such as &lt; and &gt; at a minimum; perhaps a full suite of entities, or at least a way to add a list of them and their Unicode equivalents. I seem to recall seeing such a CPAN package floating around somewhere.
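As an illustration of entity handling with an application-supplied list, here is a short Python sketch. It relies on Python's standard `html.unescape` for the usual named and numeric entities; the `expand_entities` function and its `extra` table are hypothetical, and a Perl implementation would use a different library.

```python
import html
import re


def expand_entities(text, extra=None):
    """Expand application-defined entities first, then the standard HTML ones."""
    extra = extra or {}

    def replace_custom(match):
        # Leave anything not in the custom table for html.unescape to handle.
        return extra.get(match.group(1), match.group(0))

    text = re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace_custom, text)
    return html.unescape(text)
```

For example, `expand_entities("&logo; &lt;", {"logo": "ACME"})` returns `"ACME <"`. This step would run on text content only, after tags have been recognized, so that `&lt;` can never be mistaken for the start of a tag.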

Last Wednesday it was 27 °C. The latest forecast is for up to 30 cm of snow tomorrow night! April really is the cruelest month.

PhilterPaper commented 7 months ago

Per your PR PDF::API2 #76, bookmark https://www.catskilltech.com/Documentation/PDF/Builder/Resource/Font/CoreFont.html#Supported-typefaces .

PhilterPaper commented 6 months ago

Doing a little cleanup today on open tickets. As it doesn't look like I'll be using Text::Layout in PDF::Builder (the column() system has this kind of processing now built in), I'll go ahead and close this ticket. Feel free to re-open it if there's any discussion here that you want to "keep warm" for possible future work.

PhilterPaper commented 6 months ago

Oops, I did say I was going to close it!