Whitespace issue - Githubissues

ivan-kleshnin commented 8 years ago

I'm not an Elm user, but I'm going to build a similar service that converts HTML to HyperScript, and I'd like to discuss one specific aspect with anyone interested.

I see that html-to-elm currently just trims all whitespace around text. However, this is not correct in the general case: leading and trailing whitespace around and inside inline tags is significant. For example, this:

<div>
  <span>1</span>
  <span>2</span>
  <span>3</span>
</div>

currently translates to this:

div [] [ 
  span [] [ text "1" ],
  span [] [ text "2" ],
  span [] [ text "3" ]
]

but should be converted to this:

div [] [ 
  span [] [ text "1" ], text "\n",
  span [] [ text "2" ], text "\n",
  span [] [ text "3" ], text "\n"
]

because inline tags are part of one big imaginary text block inside the div. This whitespace acts not as leading or trailing space but as the space between words! Sorry if I messed up the syntax; as I said, I'm not an Elm user.
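
To illustrate the point, here is a minimal Python sketch that uses the standard library's `html.parser` (not anything taken from html-to-elm) to list the text nodes a parser reports for the HTML above. The class name `TextNodeLogger` is made up for this example.

```python
from html.parser import HTMLParser

HTML = """<div>
  <span>1</span>
  <span>2</span>
  <span>3</span>
</div>"""


class TextNodeLogger(HTMLParser):
    """Records every text node, including whitespace-only ones."""

    def __init__(self):
        super().__init__()
        self.text_nodes = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.text_nodes.append(data)


logger = TextNodeLogger()
logger.feed(HTML)
logger.close()

print(logger.text_nodes)
# ['\n  ', '1', '\n  ', '2', '\n  ', '3', '\n']
```

The whitespace-only entries between the spans are the nodes that a trimming converter discards.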

There are three possible approaches here:

1. Strip all trailing whitespace (current one)

   + simple to implement
   - leads to "glued" words in some cases. Must be fixed manually.

2. Keep all trailing whitespace

   + simple to implement
   - leads to excessive whitespace in some cases. Must be fixed manually.

3. Different whitespace handling, depending on tag types (heuristic)

   + correct output in most cases
   - correctness is not guaranteed, as CSS may redefine which tags are inline and which are block
   - complex to implement
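
The three approaches above can be compared in a rough Python sketch. It is not how html-to-elm works; it is only an illustration under several assumptions: the input is well formed with a single root element and no void tags such as `<br>`, attributes are ignored, `INLINE_TAGS` is a small hand-picked set, and `json.dumps` is used as an approximation of Elm string escaping. All names (`convert`, `TreeBuilder`, `INLINE_TAGS`, the mode strings) are hypothetical.

```python
import json
import re
from html.parser import HTMLParser

# Assumption: a small, incomplete set of tags that are inline by default.
INLINE_TAGS = {"a", "b", "code", "em", "i", "span", "strong"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # Node instances or plain strings (text)


class TreeBuilder(HTMLParser):
    """Builds a tiny tree; assumes well-formed input without void tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.children.append(data)


def is_inline(sibling):
    return isinstance(sibling, Node) and sibling.tag in INLINE_TAGS


def convert_text(kids, i, mode):
    text = kids[i]
    if mode == "strip":      # approach 1: trim everything
        return text.strip()
    if mode == "keep":       # approach 2: keep everything
        return text
    if mode == "heuristic":  # approach 3: depends on neighbouring tags
        text = re.sub(r"\s+", " ", text)
        prev = kids[i - 1] if i > 0 else None
        nxt = kids[i + 1] if i + 1 < len(kids) else None
        # A boundary space survives only next to an inline sibling.
        if not is_inline(prev):
            text = text.lstrip()
        if not is_inline(nxt):
            text = text.rstrip()
        return text
    raise ValueError(f"unknown mode: {mode}")


def render(node, mode):
    parts = []
    for i, child in enumerate(node.children):
        if isinstance(child, Node):
            parts.append(render(child, mode))
        else:
            text = convert_text(node.children, i, mode)
            if text:  # drop text nodes that became empty
                parts.append("text " + json.dumps(text))
    inner = ", ".join(parts)
    return f"{node.tag} [] [ {inner} ]" if inner else f"{node.tag} [] []"


def convert(html, mode):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    first = next(c for c in builder.root.children if isinstance(c, Node))
    return render(first, mode)


HTML = "<div>\n  <span>1</span>\n  <span>2</span>\n</div>"
for mode in ("strip", "keep", "heuristic"):
    print(mode, "->", convert(HTML, mode))
```

With this input, the `strip` mode produces the two spans with nothing between them, while the `heuristic` mode keeps a single `text " "` between them and drops the whitespace next to the `div` boundaries. As noted in the list, the heuristic only guesses from default tag types, so CSS that changes `display` can still make it wrong.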

What do you think about it? Which approach should we prefer, and why? Have you run into this whitespace issue in practice?

mbylstra commented 8 years ago

Hi Ivan,

It sounds like you've put a lot of thought into this problem - this is very useful information. I chose to strip all whitespace around tags, as the generated code becomes very verbose (and quite ugly) without it. One of the main purposes of the tool was to serve as a learning aid for newcomers to Elm - I'd rather not turn them off with a clutter of whitespace-only text calls.

As the tool is really just a helper, I think it's acceptable to require the user to do some manual clean up where there are whitespace issues - overall it's still a lot easier than converting html by hand. This would be a much bigger problem if the code was somehow being used as a template language (like JSX), in which case the second option would be better.

The third option would be quite difficult to implement, and as you point out, would not guarantee correctness, so it may not be worth the effort.

Thanks for pointing this out - I'll try to fit in some documentation somewhere, so that users don't get surprised when some words become glued together.

ivan-kleshnin commented 8 years ago

@mbylstra thank you for the reply! I'm leaning toward the same solution and agree 100% with your reasoning. The usage pattern of such services (copy-then-modify) allows us to do so.

mbylstra / html-to-elm

Whitespace issue #7
