Extracting the word "outside" in `<html><b>inside</b> outside</html>`

First off, this library is amazing, and so is whoever made it.

Now, I've run into a small problem. I apologize in advance: I'm not very knowledgeable about web technologies and my vocabulary is probably off, so I hope you'll understand what I mean.

If you try to parse this:

```html
<div class='post entry-content '>
<!-- google_ad_section_start -->
 <span style='font-size: 48px;'><span style='font-family: courier new,courier,monospace'>Krist</span></span><br/>
<br/>
Krist is a currency that operates across servers (and in singleplayer). The installer is on the bottom of this post.<br/>
<br/>
Users can send KST to eachother via Krist Addresses, a ten character string that is led by a lowercase <em class='bbc'>k</em>. This is an example of a Krist Address: kg5dc1lzo0<br/>
<br/>
To put KST into circulation, it has to be mined. This involves lots of work done by computers, and means that I can&#39;t just &quot;spawn in&quot; as much KST as I want. I have to mine it like everyone else.<br/>
Initially, KST was mined by in-game computers, but now requires external software.<br/>
<br/>
<strong class='bbc'>Wallet installer:</strong> <pre class='prettyprint'>pastebin run Yv0fChz5</pre>
<br/>
Please post your questions, feedback, insight and Krist Addresses&#33; <strong class='bbc'>There is documentation for every node API call in my profile.</strong>
<!-- google_ad_section_end -->
<br/>
<p class='edit'>
<strong>Edited by 3d6, 14 February 2016 - 04:46 PM.</strong>
</p>
</div>
```

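You won't be able to extract most of the content, because most of the text sits directly inside the `div` (between the `<br/>` tags) rather than inside a tag of its own.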

If you take a line such as `<b>banana</b>`, you can extract its content (`banana`) with `node:gettext()`, but you can't extract anything that isn't inside a tag of its own (I believe this is called a text node?).

For example, in this HTML code: `<html><b>inside</b> outside</html>`

You can extract the word "inside" because it sits inside its own tag and therefore ends up in a node, but not the word "outside", even though modern browsers display both words, so both matter.

I think "outside" should also be put in a node: one with an empty `name` field, or perhaps a dedicated text node.
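To illustrate the kind of tree this proposal would produce, here is a minimal sketch. It uses Python's standard-library `html.parser.HTMLParser`, not lua-htmlparser, and assumes nothing about lua-htmlparser's internals; the `Node` class, its empty-`name` convention for text and its `gettext()` method are hypothetical, modeled on the suggestion above.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical tree node: name == "" marks a text node, as proposed."""

    def __init__(self, name, text="", parent=None):
        self.name = name
        self.text = text
        self.parent = parent
        self.children = []

    def gettext(self):
        # Concatenate the text of every descendant, in document order.
        if self.name == "":
            return self.text
        return "".join(child.gettext() for child in self.children)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk back up to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        # Text between tags becomes its own node instead of being dropped.
        self.current.children.append(Node("", text=data, parent=self.current))


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


root = parse("<html><b>inside</b> outside</html>")
html_node = root.children[0]
print([child.name for child in html_node.children])  # ['b', '']
print(repr(html_node.children[1].text))               # ' outside'
print(html_node.gettext())                            # inside outside
```

With a text node in the tree, "outside" is just another child of `<html>`, and calling `gettext()` on the parent returns both words in order.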

Maybe I missed something, but I couldn't find a way to extract most of the content of the HTML sample above.

This makes it difficult to convert an HTML document to plain text, since web browsers DO display this text.
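For the plain-text use case, here is a rough sketch, again in Python with the standard library's `html.parser.HTMLParser` rather than lua-htmlparser. It keeps every chunk of text the parser reports (including text outside inline tags), turns `<br/>` into a line break, drops comments and skips `<script>`/`<style>`. The names `TextExtractor` and `html_to_text` are hypothetical, and the whitespace collapsing is a simplifying assumption, not an exact model of browser rendering.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects displayed text, including text that is not inside an inline tag."""

    SKIP = {"script", "style"}  # elements whose content is never shown as text

    def __init__(self):
        # convert_charrefs=True turns &#39; &quot; &#33; into ' " !
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Every run of text between tags lands here, e.g. " outside".
        if not self.skip_depth:
            # Rough approximation of browser whitespace collapsing.
            self.parts.append(re.sub(r"\s+", " ", data))

    # Comments such as <!-- google_ad_section_start --> go to handle_comment,
    # which HTMLParser ignores by default, so they never reach self.parts.


def html_to_text(markup):
    extractor = TextExtractor()
    extractor.feed(markup)
    extractor.close()
    text = "".join(extractor.parts)
    # Trim the spaces left around each <br/>-induced line break.
    return "\n".join(line.strip() for line in text.split("\n")).strip()
```

For example, `html_to_text("<html><b>inside</b> outside</html>")` returns `"inside outside"`, and on a fragment of the sample above, `"<div><!-- google_ad_section_start -->Krist<br/>is a currency</div>"` becomes `"Krist\nis a currency"`.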

Issue #44 in msva/lua-htmlparser