Blank line rendering in text templates

KosmicTask commented 11 years ago

Consider a text template:

{{% CONTENT_TYPE: TEXT }}
{{ statement-1 }}
{{ statement-2 }}
{{ statement-3 }}
{{ statement-4 }}
{{ statement-5 }}

With data:

{
     "statement-1": "Did she leave?",
     "statement-3": "Yes.",
     "statement-5": "So it's time then to start up the interfabric particle degenerator."
}

This generates:

Did she leave?

Yes.

So it's time then to start up the interfabric particle degenerator.

However, we would like lines that render as NULL to be consumed, producing:

Did she leave?
Yes.
So it's time then to start up the interfabric particle degenerator.

My thought was that a line-render pragma could be defined:

{{% LINE_RENDER: CONSUME_BLANK }}

This means: if a processed line (i.e., a line containing a Mustache tag) evaluates to blank (as defined by the standard library), then consume the line (i.e., output neither the empty content nor the end-of-line character).
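To make the proposed CONSUME_BLANK semantics concrete, here is a minimal Python sketch (not GRMustache code). It assumes a simplified syntax with only `{{ key }}` variable tags, treats a missing key as rendering to the empty string, and approximates "blank" as "only spaces and tabs"; the function name `render_consume_blank` is hypothetical.

```python
import re

# A simple variable tag such as {{ statement-1 }}; sections, partials and
# other tag types are out of scope for this sketch.
TAG = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


def render_consume_blank(template: str, data: dict) -> str:
    """Render `template` line by line, consuming every line that contains a
    tag and renders blank (only spaces and tabs, or nothing at all)."""
    out = []
    for line in template.split("\n"):
        rendered = TAG.sub(lambda m: str(data.get(m.group(1), "")), line)
        # Only processed lines (lines with a tag) may be consumed; blank lines
        # written literally in the template are kept as-is.
        if TAG.search(line) and rendered.strip(" \t") == "":
            continue  # drop both the empty content and its end of line
        out.append(rendered)
    return "\n".join(out)


template = "\n".join("{{ statement-%d }}" % i for i in range(1, 6))
data = {
    "statement-1": "Did she leave?",
    "statement-3": "Yes.",
    "statement-5": "So it's time then to start up the interfabric particle degenerator.",
}
print(render_consume_blank(template, data))
```

Blank lines written on purpose in the template survive, because only lines that contain a tag are candidates for consumption.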

groue commented 11 years ago

Thanks Jonathan. This will feed my thoughts about #45.

KosmicTask commented 11 years ago

Perhaps LINE_RENDER is too generic a pragma. Perhaps LINE_OUTPUT with values of OUTPUT_ALL | CONSUME_BLANK.

groue commented 11 years ago

I'm planning to process lines using a "margin" concept. Each tag is surrounded by margins made of white space and newlines (at least one newline): one margin before and one margin after each tag:

{{A}}
{{B}}

{{C}}


{{D}}

A has no margin before and no margin after. B has no margin before and a one-line margin after. C has a one-line margin before and a two-line margin after. D has a two-line margin before and no margin after.

When a tag does not render, its margins do not render either.

When two margins are rendered one after the other, they are coalesced, and the longer one is rendered.

If the key C were missing, the rendering would be:

A
B


D

Because D wants two lines before itself, and B only one after itself.

If A and D were missing, we would have:

B

C

Because B has no margin before, and C two lines after.
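The margin rules can be checked with a small model. In this hypothetical Python sketch, each tag is reduced to its name plus the number of blank lines before and after it, using the values stated for A, B, C and D; a missing key means the tag does not render. Rendering the margin before the first tag and after the last one is an assumption based on the "C two lines after" remark.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    name: str
    before: int  # blank lines in the margin before the tag
    after: int   # blank lines in the margin after the tag


def render_with_margins(tags, data):
    """Return the output lines. Tags whose key is missing do not render, and
    neither do their margins; two margins that meet are coalesced and the
    longer one wins."""
    shown = [t for t in tags if t.name in data]
    if not shown:
        return []
    lines = [""] * shown[0].before              # margin before the first tag
    for previous, tag in zip([None] + shown[:-1], shown):
        if previous is not None:
            lines += [""] * max(previous.after, tag.before)
        lines.append(data[tag.name])
    lines += [""] * shown[-1].after             # margin after the last tag
    return lines


# Margins as described for the A/B/C/D template.
TAGS = [Tag("A", 0, 0), Tag("B", 0, 1), Tag("C", 1, 2), Tag("D", 2, 0)]

print(render_with_margins(TAGS, {"A": "A", "B": "B", "D": "D"}))
# ['A', 'B', '', '', 'D']
```

Under this model, removing C leaves two blank lines between B and D, because D's two-line margin beats B's one-line margin.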

I'll try that and see where it leads me.

KosmicTask commented 11 years ago

Sounds like a plan. Tabs would also likely need to be included in the whitespace list.

However, this sounds like it might get complex. How does a tag know about its margins? How does B know where its bottom margin ends and C's top margin begins?

groue commented 11 years ago

If I cannot express it in clear English, and if the rules do not easily enter and stick in the user's mind, it will be a failure. Moreover, since nobody reads manuals, any solution had better avoid surprising behavior.

I was planning to have the parser look for blank lines (made of zero or more spaces and tabs), and tell each tag: "buddy, you have N blank lines before you, and M after."

I was planning to be conservative: margins would be rendered as they appear in the template (preserving the white space characters of a blank line). However, when two margins are coalesced, the one with more lines would be chosen. When two margins with the same number of lines are coalesced, one of them would win.
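Counting the margins is the simple part. The following Python sketch (a hypothetical `tag_margins` helper) scans the template line by line, treats a line made of zero or more spaces and tabs as blank, and reports N and M for every other line. For illustration it assumes each non-blank line holds a single tag, and it only counts lines: it does not preserve the blank lines' own whitespace, as the conservative approach would.

```python
import re

BLANK = re.compile(r"[ \t]*")  # a blank line: zero or more spaces and tabs


def tag_margins(template: str):
    """Return (line, blank_lines_before, blank_lines_after) for each
    non-blank line of `template`."""
    lines = template.split("\n")
    content = [i for i, line in enumerate(lines) if not BLANK.fullmatch(line)]
    margins = []
    for k, i in enumerate(content):
        # Neighbouring non-blank lines, or the template edges.
        prev_i = content[k - 1] if k > 0 else -1
        next_i = content[k + 1] if k + 1 < len(content) else len(lines)
        margins.append((lines[i], i - prev_i - 1, next_i - i - 1))
    return margins


for entry in tag_margins("{{A}}\n{{B}}\n\n{{C}}\n\n\n{{D}}"):
    print(entry)
```

On the A/B/C/D template this reports the margins listed earlier in the thread: (0, 0) for A, (0, 1) for B, (1, 2) for C and (2, 0) for D.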

This is still rather experimental. I have no running code yet, so I can't check it against real-life scenarios. Testing against HTML, C, Python, and Markdown templates will give me a wide panel of real languages with varied attitudes toward white space. I hope margins are a good approximation of how a human being brings "air", "tao", into their writing :-)

There's a high risk of being too clever by half here; I'm aware of that :-)

KosmicTask commented 11 years ago

If it feels good to go with this idea then I would. Intuition isn't an infallible guide but it is a worthwhile one.

KosmicTask commented 11 years ago

Is there any merit in a tag that enables explicit blank line consumption, such as:

{{< my-header }}

This says: if I render as a blank line, consume me (the < implies pulling the line out of the document flow).

groue commented 11 years ago

{{< ... }} is already taken for "overridable partials" aka "layout" aka "inheritable templates". I'm (slowly) coming up with something. It'll take a few days.

groue commented 11 years ago

@mugginsoft This issue and #45 are much harder than I expected. The difficulty lies in the fact that all the layout metadata about the template (white space, blank lines, etc.) gets lost during rendering. Rendering is deeply raw-string-based, and even lets end users hook into the rendering and inject their own raw strings (without any metadata, of course; see Rendering Objects). I'm not sure how I can escape this situation.

KosmicTask commented 11 years ago

I haven't looked at the render code in great detail, but it seems to me that when rendering a value, the render code has no knowledge of the value's location within a document object model (similar to, say, the common HTML DOM found in a browser). Is this the case?

If a DOM were accessible, then render code could:

- query it for the location of a value
- extract the necessary metadata
- render as required
- update the DOM

If what you are saying is that GRMustache currently works on a raw string replacement model, then it may be too much of a stretch to implement this feature without a DOM.

Thanks for all your effort. I know how time consuming these requests can be.

groue commented 11 years ago

> I haven't looked at the render code in great detail, but it seems to me that when rendering a value, the render code has no knowledge of the value's location within a document object model (similar to, say, the common HTML DOM found in a browser). Is this the case?

Yes, GRMustache has a DOM, and no, DOM elements have no context when they render.

For instance, the following template:

{{ statement-1 }}
{{ statement-2 }}
{{ statement-3 }}

...gets translated into a DOM (an AST, in the library's internal vocabulary) that I could describe as an s-expression:

(template
    (variable "statement-1")    ; A
    (text "\n")                 ; B
    (variable "statement-2")    ; C
    (text "\n")                 ; D
    (variable "statement-3")    ; E
    (text "\n"))                ; F

A template's rendering is the concatenation of the renderings of its inner elements. Each element is rendered "absolutely", without knowing anything about its environment.

In particular, when element C (statement-2) is empty, we wish that either the B or the D newline would not render.

This can be resolved by having elements render not into a blind NSMutableString, but into a dedicated object called a "mustache buffer", which would keep track of the nature of the rendered strings and swallow some of them. For instance, the buffer notices that a line has started with C, and can swallow D, since the line got no content. Nothing a state machine could not handle.
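A buffer of that kind can be sketched as a small state machine. The Python below is illustrative only (it is not GRMustache's implementation, and `MustacheBuffer`, `append_text`, `append_tag` and `render` are hypothetical names): it holds the leading blanks of the current line, remembers whether the line contained a tag and whether it received any other output, and swallows the end of line when the line held only tags that rendered nothing. The `DOM` list transcribes the s-expression above.

```python
class MustacheBuffer:
    """Receives rendered fragments in order and swallows the end of line of
    any line that contained a tag but received no other output."""

    def __init__(self):
        self._out = []             # committed output fragments
        self._pending = ""         # leading blanks of the current line
        self._has_content = False  # line received output besides leading blanks
        self._has_tag = False      # line contained at least one tag

    def _emit(self, s):
        self._out.append(self._pending + s)
        self._pending = ""
        self._has_content = True

    def append_tag(self, value):
        self._has_tag = True
        if value:
            self._emit(value)

    def append_text(self, text):
        for ch in text:
            if ch == "\n":
                if not (self._has_tag and not self._has_content):
                    self._out.append(self._pending + "\n")
                # otherwise the line only held empty tags: drop it entirely
                self._pending = ""
                self._has_content = self._has_tag = False
            elif ch in " \t" and not self._has_content:
                self._pending += ch  # hold blanks until we know the line survives
            else:
                self._emit(ch)

    def result(self):
        if self._has_tag and not self._has_content:
            return "".join(self._out)
        return "".join(self._out) + self._pending


DOM = [("variable", "statement-1"), ("text", "\n"),
       ("variable", "statement-2"), ("text", "\n"),
       ("variable", "statement-3"), ("text", "\n")]


def render(dom, data):
    buf = MustacheBuffer()
    for kind, payload in dom:
        if kind == "variable":
            buf.append_tag(str(data.get(payload, "")))
        else:
            buf.append_text(payload)
    return buf.result()


print(render(DOM, {"statement-1": "Did she leave?", "statement-3": "Yes."}))
```

With statement-2 missing, the D newline is swallowed and the output is "Did she leave?\nYes.\n". Note that this works only because each element reaches the buffer separately; a section that has already been flattened into a single string would defeat it.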

So, in a way, this very issue was almost solved.

Nastiness came later, when I tried to go further. Writing it down helps me realize that this is actually where my error lies. Let me tell you the full story:

My problems lie in Mustache sections:

{{# script }}               {{! line 1 }}
    {{ statement-1 }}       {{! line 2 }}
    {{ statement-2 }}       {{! line 3 }}
{{/ script }}               {{! line 4 }}

Its DOM:

(template
    (section "script"
        (text "\n    ")
        (variable "statement-1")
        (text "\n    ")
        (variable "statement-2")
        (text "\n")))

I was trying to never render lines 1 and 4 (because they contain no actual data), and to render none of the four lines at all if there were no statements.

Because of the GRMustacheRendering protocol, which lets users provide their own custom rendering code for tags, a tag does not fill a buffer. Instead, it returns a string. This allows writing very useful code snippets.

As a consequence, by the time it enters the mustache buffer, the section has already been turned into a string: all its inner structure has been lost, and it's now quite hard to provide white-space processing.

> Thanks for all your effort. I know how time consuming these requests can be.

Yes :-) Let's try again :-)

groue commented 11 years ago

OK. I have pushed the current state of the work to the white_space branch. It solves this issue, and only this one :)

Would you mind checking this release candidate?

I don't know how you embed GRMustache: whether you use CocoaPods or the static library, or compile the raw sources. Let me know if you have trouble using this branch.

KosmicTask commented 11 years ago

I will try and take a look this evening. Failing that, I will report back tomorrow. I compile from source and then link the lib.

Thanks for keeping going!

KosmicTask commented 11 years ago

I have updated my local copy of the GRMustachio toy to use the white_space branch. So:

Data:

{ "item1" : "Alice" , "item2" : "Bob ", "item3" : "Clarisse" }

Template:

{{ item0 }}
We know that {{ item2 }} loves {{ item1 }}
{{ item4 }}
It is presumed that {{ item1 }} loathes {{ item2 }}
{{ item5 }}
Later that afternoon {{ item3 }} declares her affections for them both.
{{ item6 }}

This renders as desired:

We know that Bob  loves Alice
It is presumed that Alice loathes Bob 
Later that afternoon Clarisse declares her affections for them both.

However note the two scenarios below.

Modified template 1:

{{ item0 }}
We know that {{ item2 }} loves {{ item1 }}
                     {{ item4 }}
It is presumed that {{ item1 }} loathes {{ item2 }}
                            {{ item5 }}
Later that afternoon {{ item3 }} declares her affections for them both.
{{ item6 }}

The whitespace (tabs, etc.) preceding item4 and item5 is retained:

We know that Bob  loves Alice
                       It is presumed that Alice loathes Bob  
                               Later that afternoon Clarisse

Modified template 2:

{{ item0 }}
We know that {{ item2 }} loves {{ item1 }}.
{{ item4 }}
It is presumed that {{ item1 }} loathes {{ item2 }} (after that business in Helsinki).
{{ item5 }}
Later that afternoon {{ item3 }} declares her affections for them both.

Lines that end with text after their last tag, as above, disrupt the whitespace eating:

We know that Bob  loves Alice.

It is presumed that Alice loathes Bob  (after that business in Helsinki).

Later that afternoon Clarisse declares her affections for them both.

Will this whitespace behaviour become the default for GRMustache for both HTML and TEXT content types (it is a change from current behaviour)? Do the changes made to implement this behaviour have any bearing on #45?

So, looking positive.

groue commented 11 years ago

All right. Thank you for doing the sanity tests I was too tired to do. OK, we need some tuning and adjustment.

> Will this whitespace behaviour become the default for GRMustache for both HTML and TEXT content types (it is a change from current behaviour)?

I'd like to ship this as v6.5, not v7: it'll be a new configuration option, defaulting to the current behavior (no white-space processing). And this new property will be independent of the content type.

> Do the changes made to implement this behaviour have any bearing on #45?

Sure they do :-) Removing blank lines requires isolating blank prefixes and suffixes. These blank prefixes will happily turn into indentation levels for #45.
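One way to picture that reuse: once a single-tag line is split into its blank prefix, tag and blank suffix, the prefix can be repeated in front of every line of a multi-line rendering. This Python sketch is a hypothetical illustration of the idea (the `render_indented` helper and the single-tag restriction are assumptions), not a description of how #45 would actually be designed.

```python
import re

# Blank prefix, one simple tag, blank suffix: the whole line.
STANDALONE = re.compile(r"([ \t]*)\{\{[ \t]*([\w.-]+)[ \t]*\}\}([ \t]*)")


def render_indented(line: str, data: dict):
    """For a single-tag line, repeat the tag's blank prefix in front of every
    line of its rendering (the blank suffix is dropped). Returns None for any
    other kind of line."""
    m = STANDALONE.fullmatch(line)
    if m is None:
        return None
    prefix, key = m.group(1), m.group(2)
    value = str(data.get(key, ""))
    return "\n".join(prefix + part for part in value.split("\n"))


print(render_indented("    {{ body }}", {"body": "first line\nsecond line"}))
```

Here both rendered lines come out indented by the four spaces that preceded the tag.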

KosmicTask commented 11 years ago

> All right. Thank you for doing the sanity tests I was too tired to do. OK, we need some tuning and adjustment.

A pleasure. I don't think there is anything there that you wouldn't have soon found yourself. And I know that burnt-out, too-tired-to-think-any-more state well!

> Sure they do :-) Removing blank lines requires isolating blank prefixes and suffixes. These blank prefixes will happily turn into indentation levels for #45.

That's what I was hoping, that generic whitespace knowledge would assist with #45.

Great!

groue commented 11 years ago

I've just pushed tuning, based on your input, to the white_space branch. Let me know how it feels.

If it's OK, I'd like to continue cooperating with you and start working on #45: the release candidate would get closer and closer to your needs, until all your acceptance tests pass.

I would then proceed to the "big picture" tests, and eventually ship 6.5, without fear that your code would break.

Would you be OK with this?

KosmicTask commented 11 years ago

> I've just pushed tuning, based on your input, to the white_space branch. Let me know how it feels.

I updated the GRMustachio toy, and the previous naive Alice, Bob, and Clarisse examples render as desired. However, consider:

data:

{ "item1" : "Alice", "item2" : "Bob", "item3" : "Clarisse"}

template 1:

{{ item1 }}
{{ item2 }} {{ item3 }}
    {{ item4 }}{{ item5 }}
{{ item4 }}{{ item3 }}
{{ item1 }}

Renders correctly, I would say, as:

Alice
Bob Clarisse
Clarisse
Alice

However template 2:

{{ item1 }}
{{ item2 }} {{ item3 }}
    {{ item4 }}    {{ item5 }}
{{ item4 }}{{ item3 }}
{{ item1 }}

Renders as:

Alice
Bob Clarisse

Clarisse
Alice

Is this the desired outcome? In this implementation, a single key on a line renders if its value is a single-space string but not if it is the empty string, which makes sense to me. In template 2, I would say that the {{ item4 }}    {{ item5 }} line should be eaten. This would enable us to make the statement:

Any line containing only keys and whitespace will be consumed unless one of the key values is non-NULL.

However, it may be possible to make arguments in favour of the currently exhibited rendering.

> If it's OK, I'd like to continue cooperating with you and start working on #45: the release candidate would get closer and closer to your needs, until all your acceptance tests pass.

Fine with me. I deeply appreciate your interest in what is a tricky issue. Not glamorous, but it should be of real value if the ultimate goal is to position GRMustache as a truly generic document templating solution.

> I would then proceed to the "big picture" tests, and eventually ship 6.5, without fear that your code would break. Would you be OK with this?

Absolutely. I will keep assisting with this issue and #45 as long as you have the will to pursue it.

Moving forwards!

groue commented 11 years ago

Thanks for your support, Jonathan.

Regarding the line containing several tags separated by white spaces:

Initially my intent was to strip lines such as \s*{{tag}}\s*, but to keep \s*{{tag}}.*{{tag}}\s*. My goal was to take some concepts from the Mustache spec, which strips lines only when they contain a single Mustache tag.

However, my implementation has a bug: it does not notice tags that are immediately adjacent. And it cannot, because white-space processing is done incrementally by the "mustache buffer", into which DOM elements dump their content one after the other. The buffer receives "blank white space", then "blank end of line", so it strips the line.

All right. I really need to process the DOM as a whole, and look for special patterns inside.

Next iteration will strip blank lines containing a single tag that does not render, and only them.
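The two line shapes can be told apart with regular expressions. In this Python sketch, `is_single_tag_line` and `is_multi_tag_line` are hypothetical helpers that only recognize simple `{{ name }}` tags, and `\s` is narrowed to spaces and tabs so that a pattern never spans lines. Because the check looks at a whole template line at once, it sees adjacent tags such as `{{ item4 }}{{ item5 }}`, which the incremental buffer could not.

```python
import re

# A simple tag; [ \t] stands in for \s so that matches stay on one line.
TAG = r"\{\{[ \t]*[\w.-]+[ \t]*\}\}"

SINGLE_TAG_LINE = re.compile(rf"[ \t]*{TAG}[ \t]*")        # \s*{{tag}}\s*
MULTI_TAG_LINE = re.compile(rf"[ \t]*{TAG}.*{TAG}[ \t]*")  # \s*{{tag}}.*{{tag}}\s*


def is_single_tag_line(line: str) -> bool:
    return SINGLE_TAG_LINE.fullmatch(line) is not None


def is_multi_tag_line(line: str) -> bool:
    return MULTI_TAG_LINE.fullmatch(line) is not None


for line in ["    {{ item4 }}", "{{ item4 }}{{ item5 }}", "{{ item4 }}    {{ item5 }}"]:
    print(repr(line), is_single_tag_line(line), is_multi_tag_line(line))
```

Only the first line qualifies for stripping under the single-tag rule; the two others are multi-tag lines and would be kept.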

KosmicTask commented 11 years ago

> Thanks for your support, Jonathan.

It's a pleasure Gwendal.

> Initially my intent was to strip lines such as \s*{{tag}}\s*, but to keep \s*{{tag}}.*{{tag}}\s*. My goal was to take some concepts from the Mustache spec, which strips lines only when they contain a single Mustache tag.

It makes sense to follow the spec if it provides some guidance. However, stripping only \s*{{tag}}\s* appears somewhat limiting, as it prevents building more complex expressions that can be stripped (it means that refactoring a single-tag expression in an existing template into a multiple-tag expression will change the white space behaviour). That said, my own use case is currently covered by \s*{{tag}}\s*, so my point is a general one.

> All right. I really need to process the DOM as a whole, and look for special patterns inside.

GRMustache has APIs for template configuration, tag rendering and filters (and likely more). Any move, even a tentative one, towards a queryable (and perhaps ultimately mutable and public) DOM API can only be beneficial, in my opinion (it could lead to solutions for issues like #47).

> Next iteration will strip blank lines containing a single tag that does not render, and only them.

For my use case, this works fine.

Thanks for your perseverance.

groue commented 11 years ago

> However, stripping only \s*{{tag}}\s* appears somewhat limiting, as it prevents building more complex expressions that can be stripped (it means that refactoring a single-tag expression in an existing template into a multiple-tag expression will change the white space behaviour).

Sensible. Plus the rule is easier to remember.

Something like:

@interface GRMustacheConfiguration
/**
 * When this option is set, GRMustache strips lines that are made of white space
 * and tags that render empty strings. This does not apply to blank lines that
 * do not contain any tag.
 *
 * With this option, the following lines would not render, assuming `empty`
 * resolves to the empty string:
 *
 *     {{empty}}
 *     {{empty}} {{empty}}
 *               {{empty}}
 *
 * However, the following lines would render, including the blank one in
 * the middle:
 *
 *     Some content and {{empty}}.
 *     
 *     {{empty}} {{nonEmpty}}
 *
 * The default value of this property is NO.
 */
@property BOOL stripsBlankLines;
@end
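The documented rule can be exercised against the examples in the comment. This is a Python sketch of the stripsBlankLines behavior as worded above (in Python rather than Objective-C, and not GRMustache's implementation): it supports only `{{name}}` variable tags, and a missing key renders as the empty string. A tag that renders a single space is not "empty", so its line survives, which matches the earlier observation about single-space values.

```python
import re

TAG = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


def strips_blank_lines(template: str, data: dict) -> str:
    """Render `template`, dropping lines made only of white space and tags
    that render empty strings. Blank lines without any tag are kept."""
    kept = []
    for line in template.split("\n"):
        values = [str(data.get(key, "")) for key in TAG.findall(line)]
        literal = TAG.sub("", line)  # the line's text outside of tags
        if values and all(v == "" for v in values) and literal.strip(" \t") == "":
            continue
        kept.append(TAG.sub(lambda m: str(data.get(m.group(1), "")), line))
    return "\n".join(kept)


EXAMPLE = "\n".join([
    "{{empty}}",
    "{{empty}} {{empty}}",
    "          {{empty}}",
    "Some content and {{empty}}.",
    "",
    "{{empty}} {{nonEmpty}}",
])
print(repr(strips_blank_lines(EXAMPLE, {"empty": "", "nonEmpty": "X"})))
```

The first three lines disappear, while the content line, the tag-free blank line and the line with a non-empty tag are all rendered.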

groue commented 11 years ago

All right. The current white_space branch enables the future stripsBlankLines configuration by default.

KosmicTask commented 11 years ago

The logic driving the proposed stripsBlankLines looks good to me. Seems simple and consistent.

The current white_space branch seems to behave as well as I currently would expect it to. More complex expressions such as {{# empty }}{{.}}{{^}}{{/}} (is stripped) and {{# empty }} {{.}} {{^}} {{/}} (is not stripped) seem to be handled in the same manner as single value keys.

So no complaints from me, for once. I would go with this functionality.

groue commented 11 years ago

:-) On the way to #45, then!

KosmicTask commented 11 years ago

Great.

samdeane commented 11 years ago

What's the state of play with this branch? It seems to be somewhat behind the master, and I'm wondering if it would be safe to pull the latest master changes into it, as I could do with a solution to the blank lines problem.

For what it's worth, the solution I was considering before I came across this branch was effectively to pre-process the templates to strip out extraneous whitespace, and the rule I was considering was to just remove any line that only contained whitespace plus a single template tag.

The pre-processing seemed like a way of keeping this conceptually clean and out of the main engine.

The just-strip-blank-lines idea seemed to be enough to me, but possibly I haven't thought it through enough :)

groue commented 11 years ago

Hi Sam, you are everywhere :-)

Merging master into this branch is likely to conflict a lot. And actually, I'm not happy at all with the job done here. I was nearly programming at random, without any clear direction in mind. I just wanted to help @mugginsoft, but I never managed to do it in a way that would deserve merging into master. The only benefit of this branch is the refactoring of the parser, which is much cleaner now and has been back-ported to the master branch (it is now a simple state machine).

> For what it's worth, the solution I was considering before I came across this branch was effectively to pre-process the templates to strip out extraneous whitespace, and the rule I was considering was to just remove any line that only contained whitespace plus a single template tag.

It looks like this technique could be applied outside the rendering engine.

> The pre-processing seemed like a way of keeping this conceptually clean and out of the main engine.

"Conceptually clean" and Mustache are two words that do not fit well together, especially on the whitespace topic. The Mustache spec has something to say about whitespace: a bunch of ad-hoc rules and behaviors expressed as unit tests. Looking at the code of other implementations, I could never figure out what they are actually doing, except making the spec tests pass at any cost.

> The just-strip-blank-lines idea seemed to be enough to me, but possibly I haven't thought it through enough :)

Application-side stripping of whitespace is the sanest idea I've heard on the topic so far.

samdeane commented 11 years ago

For what it's worth, this is the current solution I've ended up with.

After generation of the output, I strip out all blank lines. As you can imagine, this leaves things a bit compact.

I then search for a special token (which could be anything that the template engine itself won't touch - currently I'm using ¶).

If this token appears on an otherwise blank line, I just remove the token, resulting in a blank line. If the token appears anywhere else, I replace it with a newline.

This is probably an imperfect solution for some people, but it's pragmatic, and simple to implement.

It lets me insert as many newlines as I want in the template (which makes them easier to read), knowing that they will all be stripped out. It also lets me mark up places in the template where I really do want a blank line to be emitted.

As long as the special token is reasonably unobtrusive, and won't naturally appear in your template text, this seems to work quite well.

groue commented 11 years ago

Yes, it looks very convenient. This is a nice solution until the white spaces are managed by the library in a less ad-hoc way.

KosmicTask commented 11 years ago

I agree that this solution is practicable. The down side is that you have to have knowledge of the token; this makes the solution slightly less viable for user generated templates.

samdeane commented 11 years ago

Yeah, I think if I was implementing this as an engine feature I'd have an NSString engine property for the special token. If it's not set to anything, the stripping is disabled. If it's set to a string, that gets used as described above. You could presumably also implement a pragma to do the same thing, but you'd have to pick up the pragma in some sort of pre-parsing phase since you need to process the template before using it.

Currently I've just implemented it outside the engine. It's basically a whole 3 lines of code:

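// 1. Collapse every run of blank (or whitespace-only) lines into a single newline.
output = [output stringByReplacingOccurrencesOfString:@"\n[ \t\n]*\n" withString:@"\n" options:NSRegularExpressionSearch range:NSMakeRange(0, [output length])];
// 2. A ¶ alone on its own line becomes a genuine blank line.
output = [output stringByReplacingOccurrencesOfString:@"\n¶\n" withString:@"\n\n"];
// 3. Any other ¶ is turned into a newline.
output = [output stringByReplacingOccurrencesOfString:@"¶" withString:@"\n"];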

netbe commented 10 years ago

Any update on this? Is it planned for release soon? Or is there a way to get this behaviour in the current release (7.0.2)?

groue commented 10 years ago

Hi @netbe. I'm not currently working on this.

groue/GRMustache issue #46: Blank line rendering in text templates