In some situations .pn directives generate invalid <span> elements

Wouldn't it be simpler to just generate it in its own <div> when we encounter the .li?

This is the first solution that came to mind. However, I have a feeling there may be some situation(s) where inserting a

may cause unwanted side effects / breakage in the HTML (middle of a table? ...). I can't really think of a specific example, but since .li is so flexible there must be a case like this.

Something else to consider, IIRC just enclosing a pageno span within a

has a minor side effect of inserting some extra vertical space. In my first PP project I used this technique and vaguely remember there being a small side effect (not 100% sure if/what).

wf49670 commented 10 years ago

On 10/10/2014 7:08 PM, davem2 wrote:

Wouldn't it be simpler to just generate it in its own <div> when
we encounter the .li?
This is the first solution that came to mind. However, I have a feeling there may be some situation(s) where inserting a
or
may cause unwanted side effects / breakage in the HTML (middle of a table? ...). I can't really think of a specific example, but since .li is so flexible there must be a case like this.

You can already have .pn in the middle of a table, and as I recall it works correctly. (At least, it did for the cases I tried previously.)

Something else to consider, IIRC just enclosing a pageno span within a

or
has a minor side effect of inserting some extra vertical space. In my first PP project I used this technique and vaguely remember there being a small side effect (not 100% sure if/what).

That's certainly a consideration, but it should be controllable with appropriate CSS specifications on that div.

If a simple solution using a <div> as we encounter that .pn doesn't work for some reason, I think it would be better to issue a warning message when the .li is encountered, saying that the HTML is unlikely to validate, and suggesting that the PPer move the .pn or the .li.

The idea of post-scanning the HTML to find page-number specifications that are not enclosed, while having to ignore HTML inserted via .li, seems very complex to me, and not worth the effort for a simple case where we could warn the PPer to move a bit of code around.

So if simply using a <div> doesn't work, I think ppgen should just issue a warning and be done with it.

Walt

ghost commented 10 years ago

I worked on coding this a lot today trying several different approaches. I was unsuccessful.

If the PPer puts a .pn before a literal block, that block might be a paragraph of text. The PPer rightly assumes that the page number will show on the first line of the paragraph, so it needs to go into the paragraph as a span. But if the first thing in the .li isn't a block, then it still needs to go in but as a div. Deciphering what the user has put in the literal block has stymied my efforts today.

I don't want to give up just yet. Just saying "Warning: this isn't going to validate" isn't the solution I'm hoping for.

ghost commented 10 years ago

No success again through the night.

If the user chooses to escape to HTML, then they are ultimately responsible for integrating their code with ppgen's code. That's always been the case. Perhaps it's time to just accept that. This is just one case where that general rule would apply.

For this particular case, I would expect the user to include the page number in their literal block and not have the .pn +1 before it at all. Then, to keep things in sync, the next page break after the literal block the user would use an explicit ".pn 72" or whatever the next page is (or they could do a ".pn +2").

I will put in a warning if ppgen encounters a ".li" when trying to place a page number unless someone jumps in and has a better idea.
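For illustration only, the warning idea might look something like the sketch below. This is not ppgen's actual code: the function name `li_warnings` and the rule that any ordinary text line "absorbs" a pending page number are assumptions made for the example. The only ppgen details relied on are the ones mentioned in this thread (`.pn`, `.li`, and `.li-` as the end of the literal block).

```python
def li_warnings(lines):
    """Return 1-based line numbers of .li blocks reached while a .pn is still unplaced."""
    flagged = []
    pn_pending = False   # a .pn was seen and no ordinary text has absorbed it yet
    in_literal = False   # True while between .li and .li-
    for number, line in enumerate(lines, start=1):
        text = line.rstrip()
        if in_literal:
            # Literal HTML is passed through untouched; only look for the end marker.
            if text == ".li-":
                in_literal = False
            continue
        if text == ".li":
            if pn_pending:
                flagged.append(number)
            pn_pending = False
            in_literal = True
        elif text.startswith(".pn "):
            pn_pending = True
        elif text and not text.startswith("."):
            # Assumption: ordinary text gives the page number a home.
            pn_pending = False
    return flagged


source = [
    ".pn +1",
    ".li",
    "<table><tr><td>x</td></tr></table>",
    ".li-",
    ".pn +1",
    "Ordinary paragraph text.",
    ".li",
    "<hr>",
    ".li-",
]
for line_number in li_warnings(source):
    print(f"Warning: .li at line {line_number} follows a .pn; HTML may not validate")
```

Only the first `.li` is reported here, because the second `.pn` is followed by ordinary text before the next literal block.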

wf49670 commented 10 years ago

On 10/11/2014 12:40 AM, rfrank wrote:

If the PPer puts a .pn before a literal block, that block might be a paragraph of text. The PPer rightly assumes that the page number will show on the first line of the paragraph, so it needs to go into the paragraph as a span. But if the first thing in the .li isn't a block, then it still needs to go in but as a div. Deciphering what the user has put in the literal block has stymied my efforts today.

I started to say that if the PPer wanted a paragraph of text then it's not necessary to use a .li block, but then I thought that maybe the PPer wants to style that paragraph of text without needing to figure out the cxxx class name that ppgen will assign to it.

Even worse, the .li might be generating a complex table, and ppgen might need to apply the page number to it.

Here's a different approach. Suppose we just provide a way for the PPer to tell ppgen a safe location to emit the page number span within the .li block. We would need ppgen to scan the code within the .li block, but it would have something specific to look for.

Some possibilities: (1) Rather than placing the .pn command before the .li block, the PPer might place it inside the block, still starting in column 1. The .pn command would then be the only thing that ppgen looks for within the .li block (except .li-, of course), and it would be the PPer's responsibility to ensure that it is placed where it will be properly enclosed by something. After emitting that page-number span, ppgen would continue to scan in case the .li block generates more than one page.

(2) We might allow a new trigger string within a .li block, perhaps something like (with any operand allowed by .pn), which the PPer would place where the number belongs. PPgen would scan the .li block looking for that construct, and emit the page-number span there. As with (1), ppgen would keep scanning in case there are multiple pages generated by the .li block.

Of these, I think (1) is sufficient, because line breaks are not significant in HTML. So if the PPer is generating something such as a paragraph (or even a table cell) and needs to specify the page number at some specific location, he can simply insert a line break, the .pn command, and another line break, and then continue with that paragraph or table cell without any noticeable effect. It should also be simpler code than for (2), and it keeps us with one mechanism for specifying page numbers, rather than introducing a new one.
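A minimal sketch of option (1), assuming numeric page numbers only. The span markup in `PAGE_SPAN` and the helper names are hypothetical, chosen for the example; they are not taken from ppgen's source.

```python
# Hypothetical markup; ppgen's real page-number span may look different.
PAGE_SPAN = "<span class='pageno' id='Page_{n}'>{n}</span>"


def next_page(current, operand):
    """Apply a .pn operand: '+N' is relative, a bare number is absolute."""
    operand = operand.strip()
    if operand.startswith("+"):
        return current + int(operand[1:])
    return int(operand)


def expand_literal(block_lines, current_page):
    """Copy a .li block through, replacing column-1 .pn lines with page spans."""
    out = []
    for line in block_lines:
        if line.startswith(".pn "):
            current_page = next_page(current_page, line[4:])
            out.append(PAGE_SPAN.format(n=current_page))
        else:
            out.append(line)  # everything else stays literal
    return out, current_page


block = [
    "<p class='special'>First words of the paragraph",
    ".pn +1",
    "and the rest of it.</p>",
]
html, page = expand_literal(block, 70)
print("\n".join(html))
```

Because the scan continues to the end of the block, a literal block spanning several pages could contain several `.pn` lines, and the counter handed back stays in step for whatever follows.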

Walt

ghost commented 10 years ago

Clever, indeed. I would vote against it, though. I don't want the PPer to have to learn anything new here; ppgen is complex enough. I want the PPer to know that their literal block is to contain standard HTML and that they are responsible for getting it right. That would apply here and for all the other situations we have yet to encounter.

Both your solution and mine will generate the same code, so there is no PPV implication. So it goes back to Dave, who has this scenario (and I think one other slightly different, based on his test file) in a real book. I'm not clutching this to my chest; which solution we use is still open. Dave, your thoughts?

ghost commented 10 years ago

One other thought. In your solution, Walt, I think it presumes that there will be a place to put in a pageno span. But it's possible, perhaps even common, for the .li to contain no block elements. So aren't we right back to having to determine programmatically if it's a div or a span?


wf49670 commented 10 years ago

On 10/11/2014 10:07 AM, rfrank wrote:

One other thought. In your solution, Walt, I think it presumes that there will be a place to put in a pageno span. But it's possible, perhaps even common, for the .li to contain no block elements. So aren't we right back to having to determine programmatically if it's a div or a span?

Good point, Roger, but I don't think so.

With my suggestion, I think you would still need the warning if ppgen encounters .pn then a .li block before it has found a spot to place the page-number span. Presumably, then, the PPer will either have to delete the .pn and incorporate the page-number span inside the HTML he's generating, or move either the .pn or the .li block.

What the PPer should have done (if we go with my suggestion) is to move the .pn inside the .li block at the proper spot. Or, if there is no proper spot, then he's back to moving either the .pn or the .li block to eliminate the warning. Note that in this case there is no place for him to put the page-number span within the block, either.

I am somewhat concerned, with your approach, of two things: (1) The PPer needs to learn the format of the page-number spans that ppgen generates, so he can incorporate one within his .li block if that is the proper spot for the page number. Yes, someone using .li blocks for HTML needs to know HTML, but this approach forces him to also know details of what ppgen does, which conceivably might change (as they have in this area once already). So for each new release of ppgen that he uses, the PPer will need to examine the ppgen-created HTML and see what the spans look like. Usually they will be the same as for the last project, but the PPer will need to check.

(2) Conceivably, the PPer will discover that something is wrong with his page numbering somewhere in the book, possibly before the .li block(s) in which he has embedded the page-number span(s). If the PPer corrects that by adjusting his .pn commands, he then has to remember to find the code he incorporated within his .li blocks and adjust it, too. I expect this situation to be rare, but it's a trap just waiting to catch the unwary.

I don't think it's much extra burden for someone who already knows enough HTML that he can use .li blocks to also learn where he can safely place a .pn command. And it does give the PPer one method of setting page numbers (.pn) whether he's inside or outside a .li block. So it's less learning, as I see it.

Walt

windymilla commented 10 years ago

I think I agree with Roger's suggestion that the PPer needs to consider page numbers themselves if they escape to literal HTML. There's a danger and additional complexity if ".li" no longer means "literal".

It is not like previously discussed issues such as trying to override the CSS for a particular item. Then, it was undesirable because you might need to generate an HTML with ppgen, find out the internal class used, then modify it, with a risk of the class name changing when you made subsequent edits earlier in the file and regenerating, etc.

In this case, the PPer knows the page number (from the scan) before they've ever run ppgen and so they can specify it manually, both inside the literal block for as many pages as necessary, and also afterwards to get the page counter back in sync.

I think the situation is that ppgen can already be used by experts to create any HTML they want. What we need to avoid is making it offputting to non-experts. I think as technically knowledgeable PPers we have a greater responsibility to the newcomer and the nervous than we do to people like ourselves.

Nigel

windymilla commented 10 years ago

My post crossed with Walt's.

I can see your point (1), Walt. However, I think having to specify a pagenum inside a literal will be a rare occurrence anyway, unless someone is doing so much literal coding that one might question whether ppgen is the best tool for that project.

Regarding point (2), I don't agree. The PPer will know to specify "page 72" in their literal code by looking at the scan and seeing that it is page 72. The only way page numbering could conceivably go wrong would be if you had the wrong number of ".pn +1" type directives and so ppgen got out of sync with the scans. If that was the case, then you would need to fix it regardless of use of literal, and your page numbers from your literal section onwards would automatically be brought into sync by using either ".pn 73" or ".pn +2" without any further adjustment needed to the literal block.
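The resync arithmetic can be checked with a toy counter. This is only a sketch: `next_page` is a hypothetical helper, not a ppgen function, and it handles plain numeric operands only.

```python
def next_page(current, operand):
    """Apply a .pn operand: '+N' is relative, a bare number is absolute."""
    operand = operand.strip()
    if operand.startswith("+"):
        return current + int(operand[1:])
    return int(operand)


counter = 71  # last page ppgen itself numbered before the literal block
# Page 72 is written by hand inside the .li block, so ppgen's counter stays at 71.
relative = next_page(counter, "+2")   # ".pn +2" after the literal block
absolute = next_page(counter, "73")   # ".pn 73" after the literal block
print(relative, absolute)
```

Both forms land on page 73. The absolute form has the extra property that it does not depend on the counter being right beforehand.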

wf49670 commented 10 years ago

Truth be told, I'm not unhappy with simply warning the PPer, and it was (iirc) one of my early suggestions.

And it does keep things simpler in the ppgen code.

Walt

ghost commented 10 years ago

Great. Then I'll take your suggestion, Walt, and make it a warning.

ghost commented 10 years ago

Done and merged with develop.

wf49670 commented 10 years ago

Hi, Roger,

I've coded an enhancement to ppgen, based on the discussion in the forums from the PPer who wanted ppgen to translate Footnote and Illustration to his book's language. That was handled with ".nr Footnote" and ".nr Illustration", but he also wanted ppgen to generate "[Footnote 1 : ...]" with a space before the ":".

I'm not sure I agree with doing that, but it bothered me a bit that if he is going to do it he'll need to manually edit the text output files after ppgen creates them. With that after-ppgen step there's always the possibility that the PPer will neglect to do it before finally submitting the book.

So, for fun, I implemented a .sr directive.

The .sr directive gathers search/replace regular expressions that
will be applied during the postprocessing phase of ppgen to make changes to the generated output.

Syntax: .sr <which> /search/replace/

Arguments:
  which is a string containing some combination of
      ulth (UTF-8, Latin-1, Text, HTML)
  search is a reg-ex search string
  replace is a reg-ex replace string
  / is any character not contained within either search or replace

The s/r strings are gathered during preprocessCommon and saved for use
during post-processing. Messages are issued telling how many lines
contained each search string, and how many times the replacement was applied.

Example:
  .sr t /(Footnote \d+):/\1 :/
     This will apply to UTF-8 and Latin-1 text generation, and will
     replace the string Footnote <number>:
                   with Footnote <number> :

Note: The user must understand Python reg-ex syntax (e.g., \1 not
$1), but the strings are treated as raw strings (no Python-specific escapes are needed).

The -dd command line option will supply some additional debugging
information if needed.

I'm not sure what uses it might have other than that one. But it should handle any legitimate search and replace strings that re.search and re.sub(n) understand. One needs to be careful using this, of course, especially against the HTML output files.
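For readers who want to see the mechanics, here is a rough sketch of how a directive of this shape could be parsed and applied with `re.subn`. It is not the code in the PostGenSR branch; `parse_sr` and `apply_sr` are hypothetical names, and the handling of the `which` argument (choosing output formats) is left out.

```python
import re


def parse_sr(directive):
    """Split '.sr <which> /search/replace/' into (which, search, replace)."""
    parts = directive.split(None, 2)
    if len(parts) != 3 or parts[0] != ".sr":
        raise ValueError("expected: .sr <which> /search/replace/")
    which, spec = parts[1], parts[2].strip()
    delim = spec[0]  # any character not used inside search or replace
    fields = spec.split(delim)
    # A well-formed spec splits into ['', search, replace, ''].
    if len(fields) != 4 or fields[0] or fields[3]:
        raise ValueError("malformed search/replace specification")
    return which, fields[1], fields[2]


def apply_sr(lines, search, replace):
    """Return (new_lines, lines_matched, replacements_made)."""
    pattern = re.compile(search)
    new_lines, lines_matched, replacements = [], 0, 0
    for line in lines:
        new_line, count = pattern.subn(replace, line)
        if count:
            lines_matched += 1
        replacements += count
        new_lines.append(new_line)
    return new_lines, lines_matched, replacements


which, search, replace = parse_sr(r".sr t /(Footnote \d+):/\1 :/")
text = ["[Footnote 1: First.]", "Body text.", "[Footnote 2: a] [Footnote 3: b]"]
result, lines_matched, replacements = apply_sr(text, search, replace)
print(result[0])
print(f"{lines_matched} lines matched, {replacements} replacements")
```

The two counts returned by `apply_sr` correspond to the two figures the messages are described as reporting: lines containing the search string, and replacements applied.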

I have not given you a pull request for this yet, as it's something totally unexpected that I wasn't sure you'd want to integrate. But it was an interesting learning exercise, even if you don't integrate it into the main branch of ppgen, and it has given me some ideas for another tool.

If interested, you can view it at https://github.com/wf49670/ppgen/tree/PostGenSR

Regards, (and thanks, again, for ppgen!) Walt

windymilla commented 10 years ago

I'm very disappointed it doesn't have the \C...\E feature of Guiguts regexps, in order to allow execution of arbitrary python code. :)

Seriously, it looks interesting - another of those advanced features which means ppgen could do almost anything, but that new PPers would not need to know about.

Splitting the ppgen documentation into beginner/advanced would be good for newcomers. Several of the existing sections, and several parts of some commands could be moved into the "advanced" area. Just looking down the sections, the following would be candidates, IMHO: Centering text, Conditionals, Comments (form 2), Cover image, Division, Drop caps, Emdashes(?), Greek, some of Illustrations, Macros, Mapping, Named Registers, Temporary Indent.

While looking through I also saw the .dt command under Special Situations. I think it is in fact required for every book. The comment about only for PPing convenience isn't true at DP, I don't think.

Nigel

wf49670 commented 10 years ago

Thanks for the comments, Nigel.

I will look into adding \C...\E and the other extensions; I may know a way to do it, but it will take some experimentation. I'm not sure how much it should really be needed, though. That canonical example in the GG manual seems to be increasing or decreasing page numbers by some amount in the HTML, but with ppgen wouldn't it be better to use regexes in the editor you're using to maintain the program and change the .pn statements appropriately, then regenerate?

Being run as they are, they affect what ppgen has generated, not the source file, and most such things are probably better changed in the source, aren't they?

Discussion probably belongs elsewhere, not in the long-closed issue. It only accidentally ended up here when I misdirected an email. If Roger is interested in this at all I suppose we could discuss it in the team thread. That's also a good spot to discuss any revamping of the doc. I'm not sure whether it's better to change the reference manual or do some more simple tutorials. Tutorials, along with better workflow and auxiliary tool suggestions may be more appropriate.

(But one tool that I'm starting to work on is something to apply the GG scanno regexes.)

ghost commented 10 years ago

Walt, I think Nigel was joking about \C...\E. I hope so, at least. More to the point, your .sr mod looks interesting but in your opinion do we need it? I think you see it for fixing one specific situation for the wording used in a LOTE footnote. Would that be more naturally served by a named register? Or just editing the file after generation. Ppgen is not meant to be a master format, after all. I'm struggling to justify adding a general-purpose regex at runtime based on what I know now. It's quite different from GG regexs because they change the source permanently, as in any editor. It's a different thing. Can you tell me more why we need it? I want to support what you do for ppgen, but I know complexity, even if completely optional, is scary to many. I guess what I am saying is that if you want the .sr dot command in there and are willing to document it, then I'll put it in. I don't have a strong reason not to. Just a fear of creeping featurism. You've paid your dues: you get to make this call. Let me know.

Related: Nigel suggests a user-friendly manual of the most common ppgen directives and constructions. I get the feeling that what we have is pretty overwhelming.

windymilla commented 10 years ago

I'm very sorry Walt. I was joking about \C...\E. I should know better than to post a joking message without flagging it much more explicitly - sorry for the confusion.

I think I agree with Roger. We could keep the code changes safely tucked away somewhere, in case other situations that need it arise, but I think it does add an additional air of technical complexity to ppgen. Anything we can do to open ppgen up to less confident PPers is worth doing.

On documentation, I wondered about transclusion (though I'm not really sure if/how it works) as a way of having two manuals based on one set of documentation. Each command on a separate wiki page (some commands split into basic/advanced features on separate pages). The basic manual just transcludes basic commands. The full manual transcludes all.

wf49670 commented 10 years ago

On 10/20/2014 11:04 PM, rfrank wrote:

Walt, I think Nigel was joking about \C...\E. I hope so, at least.

Ah, yes, I seem to have missed the smiley and irony. Nonetheless, while avoiding the ability to execute arbitrary code, it's an interesting challenge that I'll still probably take on :)

More to the point, your .sr mod looks interesting but in your opinion do we need it? I think you see it for fixing one specific situation for the wording used in a LOTE footnote. Would that be more naturally served by a named register? Or just editing the file after generation. Ppgen is not meant to be a master format, after all. I'm struggling to justify adding a general-purpose regex at runtime based on what I know now. It's quite different from GG regexs because they change the source permanently, as in any editor. It's a different thing. Can you tell me more why we need it? I want to support what you do for ppgen, but I know complexity, even if completely optional, is scary to many. I guess what I am saying is that if you want the .sr dot command in there and are willing to document it, then I'll put it in. I don't have a strong reason not to. Just a fear of creeping featurism. You've paid your dues: you get to make this call. Let me know.

I'm not convinced it's needed. (I'm also not convinced that a separate .nr is needed for this.)

I guess I do tend to think of ppgen as providing a master format, and I suspect that others might also view it that way. But at this point we don't have a good use case showing this is needed, so I'm fine with treating it as an educational exercise and leaving it on the shelf until a need is found. It's separate enough that integrating it later, if it turns out to be needed, should be simple to do.

Related: Nigel suggests a user-friendly manual of the most common ppgen directives and constructions. I get the feeling that what we have is pretty overwhelming.

Yes, he could be right about that. And transclusion may be a reasonable approach to having two manuals. The difficulty might be figuring out what is truly basic and what is advanced.

Walt

wf49670 / ppgen

In some situations .pn directives generate invalid <span> elements #21