wf49670 / ppgen

Post-processing generator for DP
6 stars 4 forks source link

Handle inline .sn commands (inside paragraphs) #78

Closed davem2 closed 9 years ago

davem2 commented 9 years ago

This change allows .sn directives to occur inside of paragraphs (inline).

For text formats inline sidenotes are output in the form [Sidenote: Inline note] inline with the paragraph text (like Charlie's example). If needed in the future, we could add a way to change this behavior so that inline sidenotes are printed on their own line or relocated to the top of the paragraph (as Tom requested).

For HTML formats inline sidenotes follow the example found at http://www.pgdp.net/wiki/Sidenotes.

It was a little tricky finding the best place for sidenote processing to happen. I ended up handline inline sidenotes seperately from "normal" ones to avoid issues with paragraph wrapping. Processing of .nr commands had to the moved to preprocess() routine since inline sidenote processing depends on self.nregs["Sidenote"].

wf49670 commented 9 years ago

I'm not sure what that last comment about @media handheld means, David. Do you mean that it should be: @media handheld {.sidenote, { float: left; clear: none; font-weight: bold; }} rather than what you currently have, which includes .sni? @media handheld {.sidenote, .sni { float: left; clear: none; font-weight: bold; }}

wf49670 commented 9 years ago

I've looked at your code, and while it should work, I don't like the different way of handling the .sn within a paragraph between the text and html versions, nor the different way of handling .nr between the two versions. So I think I'd like to rewrite that. What you did for html is good, and the text could (and should, imo) be handled the same way, with transforming the .sn into "⓫{}⓫".format(m.group(1)) and then transforming it later inside doPara into the [Sidenote: ...]. That also avoids the need to change how .nr is handled.

I do like the idea of keeping the standalone .sn separate as you have it. It means that for text the [Sidenote: ...] is not affected by .in or .ll, which I think is appropriate.

davem2 commented 9 years ago

I'm not sure what that last comment about @media handheld means, David After submitting the pull request I noticed a small mistake in the comment on line 5748

# 1501 @media handheld

should be

# 1501 @media handheld .sidenote
wf49670 commented 9 years ago

Oh, on the comment in the table. Thanks.

davem2 commented 9 years ago

The .nr change bothered me as well, but I was worried that handling text like you're suggesting could cause line wrap issues. I was planning on transforming "⓫{}⓫" in postprocess(), and doPara() happens in process() so depending on the length of "Sidenote" translation, line wrap would be off. Handling it in doPara() should work (though it doesn't feel like that behavior belongs there).

wf49670 commented 9 years ago

On 2/17/2015 11:01 PM, davem2 wrote:

The .nr change bothered me as well, but I was worried that handling text like you're suggesting could cause line wrap issues. I was planning on transforming "⓫{}⓫" in postprocess(), and doPara() happens in process() so depending on the length of "Sidenote" translation, line wrap would be off. Handling it in doPara() should work (though it doesn't feel like that behavior belongs there).

To me, the processing fits much better in doPara(0 than in postprocess(). Expanding the sidenote notation needs to be done before wrapping in text, as you mention. It doesn't belong in pre-process, and it doesn't belong in post-process, so that puts it in process(), and the best spot in process() is in doPara.

I'd like to include both cases (stand-alone and inside-the-paragraph) in doPara(), but though it's simple for text it's more complex for HTML, and I'm not yet sure the complexity is worth it just to get a bit more consistent design.

Walt

wf49670 commented 9 years ago

Have you considered whether we should support sidenotes in tables or in .nf blocks?

I think I see some ways we could do that in the HTML output, but I'm not at all sure how they should look in the text output. I'm also not sure they're truly needed. I think they might be less likely to occur in tables, but it seems reasonable that they would occur within poetry, for example.

Regards, Walt

wf49670 commented 9 years ago

I've done some rework on the sidenote processing: (a) .nr processing happens where it always did before

(b) Standalone sidenote processing happens in doSidenote, as you had it.

(c) In preprocesscommon() a scan is made for sidenotes within a paragraph, and all are encoded as you had the html ones encoded.

(d) In doPara (Text version) the encoded sidenotes are detected and converted to external format, so the wrapping works properly.

(e) In postprocess (HTML version) any encoded sidenotes are detected and changed to spans, as you had it.

Thus, basically, changes from your code are in (a) to revert your change, in (c) to handle embedded (inline) sidenotes for both text and HTML as that seems cleaner, and in (d) to avoid the wrapping problems.

I also made some changes to the code in (c) to better detect some boundary conditions. For example, we do not require blank lines between dot commands, so we should recognize blank line .sn ... .ll 66 as a standalone sidenote. Also, I don't think your code would have properly handled the edge conditions of .sn at the very top or bottom of the file. It would not blow up, but logically if the .sn is the first line then the line above it should be considered blank. The same applies if .sn is the last line: the line after it should be considered blank.

I'm also unsure about one aspect of how this code (either version) is working.

Given: blank line .sn something paragraph text we treat the sidenote as standalone. Would it be more reasonable, perhaps, to treat it as part of the paragraph? The same thing happens with: paragraph text .sn something blank line and again, I wonder if the sidenote should be treated as part of the paragraph in that case.

What do you think?

Thanks for all your work on this, David. It's been very useful.

If you want to look at my version you'll find it here: https://github.com/wf49670/ppgen/archive/Sidenotewf.zip

Regards, Walt

wf49670 commented 9 years ago

I found something we missed in HTML sidenote handling, which I've added to doSidenote (HTML version) in my version of the code at https://github.com/wf49670/ppgen/archive/Sidenotewf.zip

We need to account for pending vertical space when processing a sidenote that is outside of a paragraph. So I added a check similar to, and output similar to, that done in doLit (HTML version).

Regards, Walt

davem2 commented 9 years ago

For example, we do not require blank lines between dot commands, so we should recognize blank line .sn ... .ll 66 as a standalone sidenote.

I used this logic when determining paragraphs in an earlier implementation but ran into some issues. The root of the problem is that some dot commands cause a paragraph break while others do not (.pn, .bn, .if, .li, .ig, .sn, ..others?). Relocating the sidenote processing to a point where .pn and .bn commands have already been processed and converted to @150@ style markup gets around the issue for .pn and .bn (not the best solution). Most (all?) of the others become non issues as well since they are handled very early on in processing.

The case that caused me to change logic was consecutive .sn commands inside of a paragraph:

paragraph .sn inline sidenote 1 .sn inline sidenote 2 paragraph continued

Using the treat dot command as blank line logic would cause a paragraph break where there should not be one. You're going to need to put in a special case for this if you want to use the new logic (or ignore it, I'm not sure this actually happens in the wild). I chose to require the blank line before or after as it gets rid of any ambiguity and seemed cleaner.

Another solution I considered was using markup for inline and .sn for normal.

Given: blank line .sn something paragraph text we treat the sidenote as standalone. Would it be more reasonable, perhaps, to treat it as part of the paragraph? The same thing happens with: paragraph text .sn something blank line and again, I wonder if the sidenote should be treated as part of the paragraph in that case. What do you think?

I'm not sure. It would simplify implementation, though you would have to watch out for "naked" (I remembered to escape this time :smile:).

I tried to match the guidelines here http://www.pgdp.net/wiki/Sidenotes which state:

"For the HTML, if the sidenote is outside a paragraph and should be on its own lines, use a div:" "If the sidenote needs to be mid-paragraph, use a span:"

I'm not sure exactly why this is recommended, but it looks like some amount of testing on a variety of e-reader devices was done and assume that this method gives the best compatibility.

wf49670 commented 9 years ago

You raise some interesting points, David.

For example, we do not require blank lines between dot commands,
so we should recognize blank line .sn ... .ll 66 as a standalone
sidenote.

I used this logic when determining paragraphs in an earlier implementation but ran into some issues. The root of the problem is that some dot commands cause a paragraph break while others do not (.pn, .bn, .if, .li, .ig, .sn, ..others?). Relocating the sidenote processing to a point where .pn and .bn commands have already been processed and converted to @150@ style markup gets around the issue for .pn and .bn (not the best solution). Most (all?) of the others become non issues as well since they are handled very early on in processing.

Good point.

We handle .if and .ig and .bn earlier in preprocesscommon, so they won't be a problem. Unfortunately .pn is handled later, in preprocess, as it's handled differently for text and HTML. So my approach would need to exempt .pn.

.li is also handled later (during process). It does cause a paragraph break, as do all other dot commands today.

The case that caused me to change logic was consecutive .sn commands inside of a paragraph:

paragraph .sn inline sidenote 1 .sn inline sidenote 2 paragraph continued

Using the treat dot command as blank line logic would cause a paragraph break where there should not be one. You're going to need to put in a special case for this if you want to use the new logic (or ignore it, I'm not sure this actually happens in the wild). I chose to require the blank line before or after as it gets rid of any ambiguity and seemed cleaner.

Another good point. I'm tempted, at the moment, to keep it as I rearranged it but to exempt .sn and .pn, but it will need further thought.

Another solution I considered was using <sn></sn> markup for inline and .sn for normal.

Given: blank line .sn something paragraph text we treat the
sidenote as standalone. Would it be more reasonable, perhaps, to
treat it as part of the paragraph? The same thing happens with:
paragraph text .sn something blank line and again, I wonder if the
sidenote should be treated as part of the paragraph in that case.
What do you think?

I'm not sure. It would simplify implementation, though you would have to watch out for "naked" <spans> (I remembered to escape this time).

If we accurately determine that it's part of a paragraph we can't end up with a naked span. The user would provide blank .sn sidenote-text paragraph text starts and we would turn the .sn into an encoded version with ⓫sidenote-text⓫ and then later (in process()) that will trigger doPara which will put a
<p> in front of it and otherwise ignore it. Then still later, in post-process, we'll see the ⓫ and generate the <span> to enclose the sidenote text.

I tried to match the guidelines here http://www.pgdp.net/wiki/Sidenotes which state:

"For the HTML, if the sidenote is outside a paragraph and should be on its own lines, use a div:" "If the sidenote needs to be mid-paragraph, use a span:"

I'm not sure exactly why this is recommended, but it looks like some amount of testing on a variety of e-reader devices was done and assume that this method gives the best compatibility.

Yes, that complex span structure was determined to work on the greatest number of ereaders, including Kindle and those that handle epub.

Thanks, David.

(By the way, Charlie gave me a version of the text he used to develop that wiki article, and he and I did some retroconversion to get it back closer to original text, and I further tweaked it to make it ppgen-compatible, and ran it through. You can find a .zip of it with output files here https://dl.dropboxusercontent.com/u/3229599/ppgt138b-src.zip if you want to take a look.)

Walt

wf49670 commented 9 years ago

How about this version: https://github.com/wf49670/ppgen/archive/Sidenotewf.zip

Init, doSidenote() handles any sidenotes that come before a paragraph starts. In the HTML these are done with divs.

Once a paragraph starts, doPara() (both text and HTML versions) will recognize and allow .sn directives as part of the paragraph text. In the text version of doPara() the .sn is transformed directly into the [Sidenote: ...] so it will wrap properly. In the HTML version of doPara() the .sn is encoded using the ⓫text⓫ style. In the HTML those will be translated to sidenote spans during post-processing.

This approach keeps the processing very similar in doSidenote and doPara. It eliminates the special scan for .sn during preprocessing.

The one oddity: If you have a sequence like blank line or dot directive .sn text beginning of paragraph text .sn text blank line then the .sn before the paragraph text will be standalone, but the .sn after the paragraph text will be part of the paragraph. If the PPer doesn't want that one to be part of the paragraph he can simply put a blank line after the paragraph.

It would be possible to force the one before the paragraph into the paragraph, too.

But that will still leave the situation where given: .sn text1 .sn text2 paragraph text we will make .sn text1 standalone but put .sn text2 into the paragraph.

Thoughts?

Thanks, Walt

wf49670 commented 9 years ago

I still wasn't happy with that inconsistency above, and with how much more work would be needed if we were to decide that we needed to support inline sidenotes in other spots.

So I thought some more about your comment about perhaps using <sn>...</sn> for the inline ones and implemented that in this version: https://github.com/wf49670/ppgen/archive/Sidenotewf2.zip

I am much happier with this one. And it extends naturally to having sidenotes in, for example, chapter headings, footnotes, .nf blocks, and tables. Sidenotes in .nf blocks and tables have some oddities the PPer would need to work out using .de, but possibly I will do a bit more work and allow <sn class='classname'>text

Please have a look, and let me know what you think.

Note: It does not validate that ... actually happens inside other text, so in theory the PPer could have: paragraph text --blank line--

sidenote text

--blank line-- and that would generate a naked <span> for the "inline" sidenote. But I can code around that, as we do for naked pagenumber spans.

wf49670 commented 9 years ago

I have enhanced the <sn> tag with a class="classname" (or class='classname') operand so that individual inline sidenote spans can be tailored more easily via .de.

I also tweaked the basic CSS a bit (wider margin-right, for example, which seems to avoid some cases of overlap better, and set martin-top and -bottom to 0 to get better vertical alignment with the other text.).

The default values may need further tweaking, and PPers will definitly have to tweak things via .de when they use inline sidenotes in settings like inside tables, or within .nf blocks.

But I think it's ready to be turned loose for use by others. Please play with it some and see what you think. I'm also going to ask Nigel to participate in a bit of playing, if he has time.

Thanks, Walt

wf49670 commented 9 years ago

One of my more complex test cases is here: https://dl.dropboxusercontent.com/u/3229599/ppgt138e-src.txt

Another interesting test (not included in that one) is something like: .in 4 .ti -4 Paragraph with a sidenote text in various places, and with various length paragraphs.

davem2 commented 9 years ago

Looks good :). The format works much better for inline markup.

On a related note, I noticed a CSS issue when viewing on iBooks (iPad).

Long sidenotes are displayed on one line instead of wrapping at max-width (both div and span). Modifying the CSS to set width fixed the problem (though I can't be certain the change won't cause a problem on another reader). When I get access to a nook I'm going to see how it looks there.

So, instead of just min-width, max-width

.sidenote, .sni {min-width: 5em; max-width: 5em;}

set width as well

.sidenote, .sni {width: 5em; min-width: 5em; max-width: 5em;}
windymilla commented 9 years ago

I noticed the max-width and min-width thing too. You could probably get rid of them and just use width. Nigel

On 23 February 2015 at 19:32, davem2 notifications@github.com wrote:

Looks good :). The format works much better for inline markup.

On a related note, I noticed a CSS issue when viewing on iBooks (iPad).

Long sidenotes are displayed on one line instead of wrapping at max-width (both div and span). Modifying the CSS to set width fixed the problem (though I can't be certain the change won't cause a problem on another reader). When I get access to a nook I'm going to see how it looks there.

So, instead of just min-width, max-width

.sidenote, .sni {min-width: 5em; max-width: 5em;}

set width as well

.sidenote, .sni {width: 5em; min-width: 5em; max-width: 5em;}

— Reply to this email directly or view it on GitHub https://github.com/wf49670/ppgen/pull/78#issuecomment-75614276.

davem2 commented 9 years ago

I also tweaked the basic CSS a bit (wider margin-right, for example, which seems to avoid some cases of overlap better, and set martin-top and -bottom to 0 to get better vertical alignment with the other text.).

Personally I prefer the original styling. The overlap cases can (and should in my opinion) be solved by adjusting the width to accommodate the longest word. Having the top and bottom margins set to 0 causes consecutive sidenotes to have no padding between them.

wf49670 commented 9 years ago

Sorry, David, but I have to strongly disagree with having the top and bottom margins > 0. To me, the visual mis-alignment of the sidenote from the paragraph text is too jarring. (And in my tests, in a browser at least, consecutive sidenotes do not run together. There is a small gap between them. (Of course, that may be because my browser window is not the same color as we've specified for the background box for the sidenote span/div. So I can clearly see the separation as there's a small, but visible, stripe of a different color between them.)

On the other hand, with .de the PPer can do whatever s/he wants, so it might not matter too much, but it would be good to have a clear set of examples and documentation about the kind of options .de will provide.

wf49670 commented 9 years ago

As David used Charlie Howard's Sidenotes DP Wiki page to develop the CSS using min/maxwidth, I decided to ask Charlie if he had more info. He offered this to add to our discussion (with my emphasis in the next-to-last paragraph):

To refresh my foggy memory, I just tested it both ways, and see why I used min/max:

With Chrome, both min/max and simple width yield identical results: all of the sidenote boxes are the same width, which is what I wanted them to be, because it looks neater and more professional that way (at least, to me). Long SN's use multiple lines, while very short SN's are in boxes with some empty space on the right.

With mobi (Kindle Paperwhite and Fire), the widths of the sidenote boxes vary based on the text, regardless of which way width is specified, which is not what I want, but apparently cannot control. Long SN's use one line with either method.

Epub (both ADE and Nook) behave like a Browser: the widths of the sidenote boxes are variable width when "width" is used, so a short SN appears in a narrow box; but the boxes always are the same width when min/max is used. And multiple lines are used when needed with both methods.

Finally, epub on my iPad with iBooks behaves just like mobi (one line of varying length), regardless of whether min/max or simple fixed width is used. This is different from the results Dave got.

For these tests, I used a local version of epubmaker, NOT the online ebookmaker, because it was much quicker. Perhaps ebookmaker does something different for this css. (When working on current projects, I test with local epubmaker, but use online ebookmaker for the final test and for the SR people).

davem2 commented 9 years ago

Here are the specific versions of software I used to get my results:

Screenshot with only max-width and min-width set (default CSS): img_0016

Screenshot with max-width, min-width and width set (added .de .sidenote, .sni { width: 11em; min-width: 11em; max-width: 11em; }) img_0017

davem2 commented 9 years ago

I came across a bug. In situations like the example below, sidenote text comes out italicized:

"<i>Less than half the families in America are propertyless;
nevertheless, seven-eighths of the families
<sn>HALF THE NATION.</sn>
hold but one-eighth of the national
wealth</i>," and vice versa. "<i>While one
per cent of the families hold more</i>
(wealth) <i>than the remaining ninety-nine</i>," says Dr.
C. B. Spahr.[19]

If is within , or other inline markup it will inherit the parent CSS (italics in this case) unless those styles are explicitly overridden by . So, adding font-style: normal; to the .sidenote, .sni CSS would fix the issue above with italics.

Here's a full list of what needs to be added:

font-style: normal; /* <i> */
font-weight: normal;  /* <b> */
font-variant: normal;  /* <sc> */
letter-spacing: 0em;  /* <g> */
text-decoration: none;  /* <u> */
font-size: small; /* <l>, <xl>, <xxl>, <s>, <xs>, <xxs>, <fs> (currently set to smaller which is dynamic based on parent style) */

And an example demonstrating the fix:

.de .sidenote, .sni { font-style: normal; font-weight: normal; font-variant: normal; letter-spacing: 0em; text-decoration: none; font-size: small; }

<u><xxl><g><sc><b><i>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed doeiusmod 
tempor incididunt ut labore et dolore magna aliqua. Ut enimad minim veniam, 
<sn>SIDENOTE TEXT.</sn>
quis nostrud exercitation ullamco laboris nisi utaliquip ex ea commodo 
consequat. Duis aute irure dolor inreprehenderit in voluptate velit esse 
cillum dolore eu fugiat nullapariatur. Excepteur sint occaecat cupidatat 
non proident, sunt inculpa qui officia deserunt mollit anim id est laborum.
</i></b></sc></g></xxl></u>

<sn>SIDENOTE TEXT</sn>
wf49670 commented 9 years ago

Good catch. I'll update the code after breakfast.

Though it's clear that .sni needs that, it's not obvious to me that the .sidenote needs it. But I suppose it won't hurt :)

wf49670 commented 9 years ago

I've updated branch sidenotewf2 to have that fix.

I also confirmed that it it is still possible after the change to use styling tags within either kind of sidenote.

wf49670 commented 9 years ago

I've rolled sidenotewf2 into Develop and pushed it out to Github. I'll leave the branch around in case we need more experimenting, for awhile.

wf49670 commented 9 years ago

I also added "width" to the CSS in Develop (not in sidenotewf2) though we seem to need more experimentation with that.

windymilla commented 9 years ago

I really think that width should be sufficient. I can't see how max & min width could also be needed, but as you say, it should be checked. Nigel

On 24 February 2015 at 16:04, wf49670 notifications@github.com wrote:

I also added "width" to the CSS in Develop (not in sidenotewf2) though we seem to need more experimentation with that.

— Reply to this email directly or view it on GitHub https://github.com/wf49670/ppgen/pull/78#issuecomment-75784683.