Closed: z64 closed this issue 2 years ago
tag_text is more like a private method; it should be called only on nodes where node.textable? is true. I think inner_text is the preferable method to use.
@kostya I found undesirable behavior with inner_text: the text nodes themselves don't yield anything; instead, their text gets pushed up to the parent node:
require "lexbor"
require "colorize" # needed for .colorize.yellow

example = <<-HTML
  <p><b>Some <a>text 1</a> after 1</b></p>
  <p><b>Some <a>text 2</a> after 2</b></p>
  HTML
html = Lexbor::Parser.new(example)
html.css("p").each do |node|
  # Visit every descendant, indenting by depth
  node.walk_tree do |inner_node, level|
    print(" " * level * 2)
    print("#{inner_node.tag_name} -> ".colorize.yellow)
    puts(inner_node.inner_text(deep: false))
  end
end
p ->
  b -> Some after 1 <--- wrong
    _text ->
    a -> text 1
      _text ->
    _text ->
p ->
  b -> Some after 2 <--- wrong
    _text ->
    a -> text 2
      _text ->
    _text ->
Compared to:
html = Lexbor::Parser.new(example)
html.css("p").each do |node|
  node.walk_tree do |inner_node, level|
    print(" " * level * 2)
    print("#{inner_node.tag_name} -> ".colorize.yellow)
    if inner_node.textable?
      puts(inner_node.tag_text)
    else
      puts
    end
  end
end
p ->
  b ->
    _text -> Some
    a ->
      _text -> text 1
    _text -> after 1
p ->
  b ->
    _text -> Some
    a ->
      _text -> text 2
    _text -> after 2
which allows me to correctly perform the reconstruction I'm doing in the right order.
Is there another way?
Both examples are OK for me; tag_text is just more low-level. If you need it, use it.
When moving some code from myhtml to lexbor, I came across this change in behavior:
Output:
It would appear that tag_text no longer returns only the current node's text, but includes all children as well, similar to the deep options in other methods.
At a glance, this seems like it could be a missing feature or behavior on lexbor, but I'm not certain. In any case, I figured I would start by opening an issue here for other Crystal users.
The workaround is to explicitly check the node type:
and that will mimic the behavior of myhtml.
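For other readers, here is a minimal sketch of what such a check might look like, based on the textable? guard used in the comments above. The helper name own_text and the sample markup are made up for illustration and are not part of lexbor; only Lexbor::Parser, css, walk_tree, tag_name, textable? and tag_text are taken from this thread.

```crystal
require "lexbor"

# Hypothetical helper (not part of lexbor): return the text of a node only
# when it is a text-bearing node, and an empty string otherwise.
def own_text(node)
  node.textable? ? node.tag_text : ""
end

html = Lexbor::Parser.new("<p><b>Some <a>text</a> after</b></p>")
html.css("p").each do |node|
  node.walk_tree do |inner_node, level|
    print(" " * level * 2)
    puts("#{inner_node.tag_name} -> #{own_text(inner_node)}")
  end
end
```

With this guard, element nodes such as b or a should print nothing of their own, and each piece of text should appear only at its _text node, in document order.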