However, it is rendered by the parser as:
test1test2test3test4
The deeper issue is that whitespace between nodes is not being recorded or indicated in any way.
Solutions
Experimenting with this on astexplorer.net shows that most parsers (e.g. htmlparser2, parse5) create a TextNode for the whitespace.
What's interesting, however, is that Angular takes a more intelligent route, which is likely faster. Like node-html-parser, it does not create a TextNode for whitespace between nodes. Instead, it lets users detect that whitespace themselves from the range information attached to each node.
The range information, offered by most parsers, is simply the pair of source indices where a node begins and ends. The start is the index of the first char of the opening tag, and the end is the index just past the last char of the closing tag, so two adjacent nodes share a boundary index.
Proposed solution
I propose simply adding a range array to each node, per convention. In so doing, we are able to determine whether a node has trailing whitespace.
For example:
<!-- The following nodes have contiguous ranges. The ranges are [ 0, 18 ] and [ 18, 36 ], respectively (each node is 18 chars long; the end index is exclusive). -->
<!-- When we compare the end of the first node (18) with the start of the next (18), we can see there is no space -->
<span>text1</span><span>text2</span>
<!-- These nodes, however, are non-contiguous. The ranges are [ 0, 18 ] and [ 19, 37 ], respectively (the end index is exclusive). -->
<!-- By comparing the end (18) and start (19) locations, we know that there is whitespace between them -->
<span>text1</span> <span>text2</span>
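The comparison described in those comments could be sketched roughly as follows. This is only an illustration, not the code from the PR: the `RangedNode` shape and the helper names `hasTrailingGap`, `gapBetween` and `joinText` are hypothetical, and it assumes the end index of a range is exclusive.

```typescript
// Hypothetical node shape. Only the `range` field comes from the proposal;
// `text` stands in for whatever text content the node exposes.
interface RangedNode {
  text: string;
  range: [number, number]; // [start, end), end index assumed exclusive
}

// True when the next sibling starts after this node ends,
// i.e. some source characters sit between the two nodes.
function hasTrailingGap(node: RangedNode, next: RangedNode): boolean {
  return next.range[0] > node.range[1];
}

// Returns the raw source text between two sibling nodes.
function gapBetween(source: string, node: RangedNode, next: RangedNode): string {
  return source.slice(node.range[1], next.range[0]);
}

// Joins sibling text, inserting a single space wherever the source
// had only whitespace between two nodes.
function joinText(source: string, nodes: RangedNode[]): string {
  let out = "";
  nodes.forEach((node, i) => {
    out += node.text;
    const next = nodes[i + 1];
    if (next && /^\s+$/.test(gapBetween(source, node, next))) {
      out += " ";
    }
  });
  return out;
}

// The non-contiguous example: one space between the two spans.
const source = "<span>text1</span> <span>text2</span>";
const nodes: RangedNode[] = [
  { text: "text1", range: [0, 18] },
  { text: "text2", range: [19, 37] },
];

console.log(joinText(source, nodes));
```

Because no TextNode is created for the whitespace, a consumer that does not care about spacing pays nothing for it, while one that does (such as a markdown converter) only needs the two ranges and the original source string.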
Issue
Assuming:
In browsers, this is rendered as:
test1 test2 test3 test4
I am submitting a PR shortly.
Related Issue
https://github.com/crosstype/node-html-markdown/issues/16