You can do that with something like:
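local htmlparser = require("htmlparser")
local root = htmlparser.parse("<p>This is <strong>a typical</strong> line of <em>text</em></p>")
-- gettext() returns the raw markup, tags included; the pattern then removes every "<...>" tag.
local textonly = root:gettext():gsub("<[^>]*>", "")
print(textonly)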
That said, there have been requests to implement this as a library function, and I'm still not sure whether we should.
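If pulling in htmlparser isn't wanted, the same pattern can be applied to the raw string directly. The strip_tags helper below is hypothetical (it is not part of htmlparser); it also decodes a handful of common entities and collapses whitespace. It is a sketch that assumes simple markup: no '>' inside attribute values and no comments, script or style blocks, none of which a single pattern can handle correctly.

```lua
-- Hypothetical helper (not part of htmlparser): strips tags from a string.
-- Assumes simple markup: no '>' inside attribute values, no comments,
-- <script> or <style> blocks.
local function strip_tags(html)
  -- Remove every "<...>" run; the parentheses keep only the string, not the count.
  local text = (html:gsub("<[^>]*>", ""))

  -- Decode a few common entities only. "&nbsp;" becomes a plain space here,
  -- and "&amp;" is decoded last so that "&amp;lt;" ends up as the text "&lt;".
  local entities = {
    { "&lt;", "<" }, { "&gt;", ">" }, { "&quot;", '"' },
    { "&nbsp;", " " }, { "&amp;", "&" },
  }
  for _, pair in ipairs(entities) do
    text = text:gsub(pair[1], pair[2])
  end

  -- Collapse whitespace runs to a single space and trim both ends.
  text = text:gsub("%s+", " ")
  return (text:match("^%s*(.-)%s*$"))
end

print(strip_tags("<p>This is <strong>a typical</strong> line of <em>text</em></p>"))
--> This is a typical line of text
```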
Thanks for your answer.
It would be great if we could loop through each XML element and filter by node type, as in jQuery. Your solution does work for me, though. But if you get this question often, it might be useful to provide a simple function in your library. Thanks for the great work.
Example in jQuery: var textList = root.contents().filter(function() { return this.nodeType == 3; });
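A rough Lua analogue of that jQuery filter could look like the sketch below. It assumes a hypothetical tree of plain tables (element nodes carry a children list, text nodes a text field), not htmlparser's own node objects, and the names filter_nodes and is_text are made up for illustration. Note one difference: jQuery's .contents() only looks at direct children, while this walks all descendants.

```lua
-- Hypothetical tree shape (NOT the htmlparser API):
--   element: { type = "element", tag = "p", children = { ... } }
--   text:    { type = "text", text = "..." }

-- Depth-first walk that collects every node for which keep(node) is true,
-- in document order.
local function filter_nodes(node, keep, out)
  out = out or {}
  if keep(node) then
    out[#out + 1] = node
  end
  for _, child in ipairs(node.children or {}) do
    filter_nodes(child, keep, out)
  end
  return out
end

-- Plays the role of "this.nodeType == 3" in the jQuery example.
local function is_text(node) return node.type == "text" end

-- The example paragraph, built by hand.
local root = {
  type = "element", tag = "p", children = {
    { type = "text", text = "This is " },
    { type = "element", tag = "strong", children = { { type = "text", text = "a typical" } } },
    { type = "text", text = " line of " },
    { type = "element", tag = "em", children = { { type = "text", text = "text" } } },
  },
}

local pieces = {}
for i, node in ipairs(filter_nodes(root, is_text)) do
  pieces[i] = node.text
end
print(table.concat(pieces))  --> This is a typical line of text
```

Passing the predicate in, rather than hard-coding the text check, keeps the walk reusable for other node types, much like jQuery's .filter().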
Regards.
Using your library, is it possible to extract only the text elements from an HTML document?
For example: <p>This is <strong>a typical</strong> line of <em>text</em></p>
The result would be: This is a typical line of text
I was able to write a recursive function that loops through each element, but I'm not sure where to go from there to extract only the text elements.
Thank you.
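Another option that needs no tree at all is a single left-to-right scan that records the text between tags. The text_runs helper below is hypothetical and only a sketch; like the pattern approach, it assumes simple markup with no '>' inside attribute values and no comments, script or style content.

```lua
-- Hypothetical helper: returns the text between tags as a list of strings.
-- Assumes simple markup: no '>' inside attribute values, no comments,
-- <script> or <style> content.
local function text_runs(html)
  local runs = {}
  local pos = 1
  while pos <= #html do
    -- Plain (non-pattern) search for the next tag opener.
    local tag_start = html:find("<", pos, true)
    if not tag_start then
      -- No more tags: the rest of the string is text.
      runs[#runs + 1] = html:sub(pos)
      break
    end
    if tag_start > pos then
      runs[#runs + 1] = html:sub(pos, tag_start - 1)
    end
    local tag_end = html:find(">", tag_start, true)
    if not tag_end then break end  -- unterminated tag: stop scanning
    pos = tag_end + 1
  end
  return runs
end

local runs = text_runs("<p>This is <strong>a typical</strong> line of <em>text</em></p>")
print(table.concat(runs))  --> This is a typical line of text
```

Keeping the runs as separate strings, instead of joining them immediately, leaves room to inspect or filter individual pieces before concatenating.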