wf49670 / ppgen

Post-processing generator for DP
6 stars 4 forks source link

Incorrect HTML generated if <lang> tag spans paragraphs. #19

Closed wf49670 closed 10 years ago

wf49670 commented 10 years ago

Given the following input:

<lang=en>Here is a paragraph.

And here is another one.</lang>

we get the following HTML from ppgen:

<p class='c002'><span lang="en" xml:lang="en">Here is a paragraph.</p>

<p class='c002'>And here is another one.</span></p>

The <span> should be closed before the end of the first paragraph, and reopened/closed inside the next one.

(Or, alternatively, ppgen should issue an error message. But it is more useful to allow the spanning of paragraphs, I believe. There might be additional complexities, though, such as spanning across a table, that could be harder to handle.)

-- Walt

ghost commented 10 years ago

This is a bug. I don't want an error message. It needs to work. I am assigning it to myself.

wf49670 commented 10 years ago

Additional comment: It applies to other tags, too. For example, <i> is allowed to span paragraphs, and generates incorrectly nested HTML tagging, too. The only place ppgen handles this right now is within .nf blocks, I think.

ghost commented 10 years ago

I'm not sure it is a bug now that I look at it closer. I never coded it with that construction in mind. I expected two consecutive paragraphs that were both italic to each have the starting and ending . Thinking about the text version, wouldn't each paragraph have the opening and closing "_" to mark it as italic?

wf49670 commented 10 years ago

That's a good question, Roger. I suspect that it would confuse the reader less if the _ markup were related to the specific paragraphs.

And thinking further, our DP markup rules require that markup cannot span paragraphs. So having tags spanning paragraphs breaks that rule. So that's a bad example, as ppgen should never see it.

I suppose we could say the same for the <lang> case, too. The PPer will simply have to use multiple tags, one per paragraph, if he has a long block of text in a different language. It's slightly different, as it's not tagging coming from the rounds, but inserted by the PPer. And as such, if we can't support it at least a warning might be nicer than generating invalid HTML.

ghost commented 10 years ago

On Oct 8, 2014, at 9:50 AM, wf49670 notifications@github.com wrote:

That's a good question, Roger. I suspect that it would confuse the reader less if the _ markup were related to the specific paragraphs.

And thinking further, our DP markup rules require that markup cannot span paragraphs. So having tags spanning paragraphs breaks that rule. So that's a bad example, as ppgen should never see it.

I suppose we could say the same for the case, too. The PPer will simply have to use multiple tags, one per paragraph, if he has a long block of text in a different language. It's slightly different, as it's not tagging coming from the rounds, but inserted by the PPer. And as such, if we can't support it at least a warning might be nicer than generating invalid HTML.

— Reply to this email directly or view it on GitHub.

(Trying “reply directly” to see if it goes into the issue thread.)

I think it should be a warning and consistent with markup. The tag should not be expected to span paragraphs. Do you want me to code the warning or do you want to do it? Let me know.

wf49670 commented 10 years ago

I'll try coding it.

Thanks for trying the "reply" method; I had wondered about that, too.

Pronovost commented 10 years ago

I can live with the markup not spanning more than one para. It just asks for more cut and paste.

ghost commented 10 years ago

Just an update. <i>...</i> is not allowed to span paragraphs so we are being consistent with that by not allowing <lang>..</lang> to span paragraphs. However, if a user tries that we want to issue a warning as a courtesy. Walt has volunteered to code this warning and provide a pull request.

ghost commented 10 years ago

The more I think about this, the less convinced I am that we need to warn the user if they use a <lang> ... </lang> that spans paragraphs. It would get caught later in the process by not validating. If we have defined the use of our <lang> tag in the manual such that they use it properly, seems that's enough. It becomes an question of how much do we want ppgen to try to catch? It would be easy to start migrating more and more checks into ppgen and that's not what it is meant for. The only checks it really needs to have are those that would not be caught later but the tools that the post-processors are supposed to use.

I'll leave this open in case you (Walt) still intend to put in a warning for this situation. But an acceptable alternative is to just document how to use it and let the PPer's complete process, not just ppgen, handle incorrect usage.

wf49670 commented 10 years ago

I think that leaving it for validation to catch is a valid approach, Roger.

My only concern would be that someone might have a significant amount of fixup to do in a long book by the time they get around to validating it. But it's not much more than the work they would have had to do in the first place to do it correctly. So as long as the documentation is clear I'm happy with it.

I will check the documentation.

Regarding the idea that ppgen should provide a way to denote a longer section of text as being in language "x", that can be handled easily with literal code and a macro, for most cases.

Actually, it should be possible to handle it, but there are several bugs preventing it. I'll work up a fix, unless you'd rather do it. To summarize, the problems are in code around line 2380, and affect HTML processing: (1) Any use of lang= in the actual text of a book throws a fatal error. (2) Any use of LANG in the actual text of a book becomes lang instead. (3) Legitimate uses of lang= within literal (.li) code throw fatal errors.

Walt

ghost commented 10 years ago

I would be okay leaving (1) and (2) as extemely improbable. I would handle them as exceptions instead of adding code just because at least for me there are much higher coding priorities (including some having nothing to do with ppgen or even DP). That third one, yeah, I can see where that could bite us. If you could code at least that it would be great; low priority, though. I'll close this issue and trust you will review and revise, if necessary, the applicable documentation. Thanks.