paweljanicki / trumbowyg-counter

MIT License

Word count can be inaccurate #3

Open timler opened 2 years ago

timler commented 2 years ago

I've spotted an issue with the word counter, and I'm not sure how to solve it (easily). Any code that strips HTML ends up concatenating words that are separated only by HTML tags.

For example:

<p>One</p><p>Two</p><p>Three</p>

This should result in a word count of 3, but because the text returned from the element is OneTwoThree, the word count is calculated as 1.

Any code that simply strips the tags will return the same result, because nothing is left between the words to separate them.
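To make the failure concrete, here is a minimal reproduction in plain JavaScript (it runs in Node or a browser console). stripTags and countWords are illustrative helper names, not code from the plugin; the /\S+/g word regex is the same one used later in this thread.

```javascript
// Naive approach: delete every tag, then count runs of non-whitespace.
function stripTags(html) {
  // Removes anything that looks like a tag; text on either side is glued together.
  return html.replace(/<[^>]*>/g, '');
}

function countWords(text) {
  // match() returns null when there is no non-whitespace text, so fall back to [].
  return (text.match(/\S+/g) || []).length;
}

const sample = '<p>One</p><p>Two</p><p>Three</p>';
const stripped = stripTags(sample);
console.log(stripped);             // OneTwoThree
console.log(countWords(stripped)); // 1
```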

This is clearly a bug, but how does one fix it when it's so prevalent? Putting a space before each closing paragraph tag seems like a smelly hack.
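A less ad hoc variant of the "add a space" idea is to turn block-level tag boundaries into spaces while removing inline tags outright, so that <p>One</p><p>Two</p> counts as two words but Un<b>break</b>able stays one. This is a sketch under assumptions, not the plugin's code: BLOCK_TAGS, htmlToText and countWordsInHtml are hypothetical names, the tag list is illustrative and incomplete, and regex-based tag handling will misbehave on HTML comments or on attribute values that contain '>'.

```javascript
// Assumption: an illustrative, incomplete list of block-level tags (not from trumbowyg-counter).
const BLOCK_TAGS = ['p', 'div', 'br', 'li', 'ul', 'ol', 'h[1-6]', 'blockquote', 'pre', 'tr', 'td', 'th'];

// Matches opening, closing and self-closing block tags such as <p>, </p>, <br/> or <td class="x">.
// The \b stops 'p' from matching <pre> or <param>, and 'br' from matching <b>.
const BLOCK_TAG_RE = new RegExp('<\\/?(?:' + BLOCK_TAGS.join('|') + ')\\b[^>]*>', 'gi');

function htmlToText(html) {
  return html
    .replace(BLOCK_TAG_RE, ' ') // block boundaries become word boundaries
    .replace(/<[^>]*>/g, '');   // inline tags (<b>, <em>, <a>, ...) are dropped without a space
}

function countWordsInHtml(html) {
  return (htmlToText(html).match(/\S+/g) || []).length;
}

console.log(countWordsInHtml('<p>One</p><p>Two</p><p>Three</p>')); // 3
console.log(countWordsInHtml('<p>Un<b>break</b>able</p>'));         // 1
```

Entities are not decoded here, so <p>One&nbsp;Two</p> would still count as one word.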

https://stackoverflow.com/questions/73225259/calculating-word-count-after-stripping-html

timler commented 2 years ago

Thanks to the helpful answer to my question on Stack Overflow, and to this thread, I have managed to solve the problem by adjusting the plugin's code slightly:

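var text = $(trumbowyg.$ed).prop("innerText"),
    // innerText separates block elements such as <p> with line breaks, so words no longer run together
    // match() returns null for whitespace-only text, hence the || [] guard
    words = (text !== '' ? (text.match(/\S+/g) || []).length : 0),
    characters = (text !== '' ? $(trumbowyg.$ed).text().length : 0),
    output = '';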

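Apparently jQuery's text() is often assumed to return innerText, but it does not: it concatenates the element's text nodes (much like textContent), so nothing separates adjacent paragraphs.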

timler commented 2 years ago

Aaargh. There's never a free lunch. innerText depends on how the content is rendered, so reading it forces the browser to compute style and layout, which causes a performance issue with larger amounts of text. I have reverted my solution and am going with an inaccurate word count for now.
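One possible middle ground is to walk the editor's DOM and read text nodes directly, as textContent does, while emitting a space at each block-level element. Reading nodeValue does not require the browser to compute layout. This is a sketch, not the plugin's code: blockAwareText, BLOCK_ELEMENTS and the mockText/mockEl stand-ins are hypothetical names, the element list is illustrative, and elements made block-level only through CSS (e.g. a span with display: block) are not detected because styles are ignored. In the plugin it would presumably be called as blockAwareText(trumbowyg.$ed[0]).

```javascript
// Assumption: illustrative set of block-level element names (nodeName is upper-case for HTML elements).
const BLOCK_ELEMENTS = new Set(['P', 'DIV', 'BR', 'LI', 'UL', 'OL', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6',
  'BLOCKQUOTE', 'PRE', 'TR', 'TD', 'TH']);

const TEXT_NODE = 3;    // same value as Node.TEXT_NODE
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE

// Collects text like textContent (no layout needed), but emits a space around
// block-level elements so that <p>One</p><p>Two</p> yields " One  Two ".
function blockAwareText(root) {
  const parts = [];
  (function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      parts.push(node.nodeValue);
      return;
    }
    if (node.nodeType !== ELEMENT_NODE) return; // skip comments and other node types
    const isBlock = BLOCK_ELEMENTS.has(node.nodeName.toUpperCase());
    if (isBlock) parts.push(' ');
    for (const child of Array.from(node.childNodes)) walk(child);
    if (isBlock) parts.push(' ');
  })(root);
  return parts.join('');
}

// Minimal stand-ins for DOM nodes so the sketch can be exercised outside a browser.
const mockText = (value) => ({ nodeType: TEXT_NODE, nodeName: '#text', nodeValue: value, childNodes: [] });
const mockEl = (name, ...children) => ({ nodeType: ELEMENT_NODE, nodeName: name, childNodes: children });

const editor = mockEl('DIV',
  mockEl('P', mockText('One')), mockEl('P', mockText('Two')), mockEl('P', mockText('Three')));
const wordCount = (blockAwareText(editor).match(/\S+/g) || []).length;
console.log(wordCount); // 3
```

The trade-off is that accuracy now depends on the element list rather than on the browser's rendering, which is what keeps the walk cheap on large documents.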