isaaclyman / novel-word-count-obsidian

Obsidian plugin. Displays a word count or other statistic for each file, folder and vault in the File Explorer pane.
https://obsidian.md/plugins?id=novel-word-count
MIT License
77 stars, 7 forks

Option to only count words for rendered/reading view #91

Closed. Zeoic closed this 2 weeks ago

Zeoic commented 2 weeks ago

Problem

My current problem is that while writing a novel, I use some rather large stylized HTML tables. While the actual visible text in reading view of the table is roughly 50 words, all word count plugins seem to count the HTML itself, which makes my word count over 500.

Idea

Add an option to only count the visible words in the document while in reading view.

Example

[image] vs. [image]

Shows as 32 words instead of 2: [image]

isaaclyman commented 2 weeks ago

Much easier said than done. HTML famously can't be parsed out with regex, and implementing a full HTML parser in a word count plugin doesn't seem reasonable. Markdown tables should be handled just fine; there's probably a plugin that can handle the styling for you in a separate stylesheet if you need an accurate word count.
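
To illustrate why a regex falls short here, below is a minimal sketch (not code from the plugin) of the naive "drop anything between `<` and `>`" approach. The helper names `naiveStripTags` and `countWords` are hypothetical, chosen only for this example. It works on simple markup but miscounts as soon as an attribute contains `>` or the document includes a `<style>` block.

```typescript
// Naive approach: treat anything between "<" and ">" as a tag and replace it with a space.
function naiveStripTags(source: string): string {
  return source.replace(/<[^>]*>/g, " ");
}

// Rough word count: split on runs of whitespace and ignore empty pieces.
function countWords(text: string): number {
  return text.split(/\s+/).filter((word) => word.length > 0).length;
}

// Simple markup: the regex does what we hoped.
const simple = "<td><b>Two</b> words</td>";

// A ">" inside an attribute value ends the "tag" early, so attribute text leaks into the count.
const tricky = '<td title="1 > 0 is true">Two words</td>';

// Text inside <style> is never rendered, but the regex only removes the tags around it.
const styled = "<style>td { color: red; }</style><td>Two words</td>";

console.log(countWords(naiveStripTags(simple))); // 2
console.log(countWords(naiveStripTags(tricky))); // 4
console.log(countWords(naiveStripTags(styled))); // 7
```

All three inputs render as the same two visible words, yet only the first is counted correctly.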

Zeoic commented 2 weeks ago

Ah, darn. Too bad there is no way to just get the reading view from Obsidian instead of the writing view.

I unfortunately need the actual HTML tables for my use case, so I can't just use Markdown tables with external styling. I'll have to see about some external tool to count words instead, I guess. Thanks for the response!

isaaclyman commented 2 weeks ago

Yeah, it's not the first time someone's wanted to exclude content that doesn't appear in reading view (like comments and URLs), and unfortunately I've had to parse them on a case-by-case basis. Maybe someday Obsidian will expose a "rendered plaintext" API; that would make things easier.

Zeoic commented 1 week ago

Sorry to reply to this closed issue again, just wanted to report that I managed to figure out some regex that strips out the tags I use and then does a rough word count with Templater. It's a little janky, but it works lol.

This gave me a thought, however. I wonder if it would be possible to have an off-by-default custom regex first-pass feature. I imagine that would basically double the lag, at least, that the plugin causes, however. Would be neat from a tinkerer's point of view, but understandable for not wanting to put effort into something so niche like that, which is why I didn't make a new feature request. Just an idea!
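
A rough sketch of what such a first pass could look like, in TypeScript. This is an illustration under stated assumptions, not the plugin's actual code and not the Templater script mentioned above: the `FirstPassSettings` shape, `applyFirstPass`, `roughWordCount`, and the sample pattern are all hypothetical.

```typescript
// Hypothetical settings shape; the plugin's real settings may look nothing like this.
interface FirstPassSettings {
  enabled: boolean; // off by default
  pattern: string;  // user-supplied regex source
}

const DEFAULT_SETTINGS: FirstPassSettings = { enabled: false, pattern: "" };

// Replace every match of the user's pattern with a space before counting.
function applyFirstPass(text: string, settings: FirstPassSettings): string {
  if (!settings.enabled || settings.pattern === "") {
    return text;
  }
  let regex: RegExp;
  try {
    regex = new RegExp(settings.pattern, "gi");
  } catch {
    // Invalid pattern: fall back to counting the raw text.
    return text;
  }
  return text.replace(regex, " ");
}

// Rough word count: whitespace-separated tokens.
function roughWordCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

const note =
  '<table class="fancy" style="border: 1px solid red"><tr><td>Hello</td><td>world</td></tr></table>';

// Assumed example pattern: strips only <table>, <tr>, and <td> tags.
const userSettings: FirstPassSettings = {
  enabled: true,
  pattern: "</?(table|tr|td)\\b[^>]*>",
};

console.log(roughWordCount(applyFirstPass(note, DEFAULT_SETTINGS))); // 6
console.log(roughWordCount(applyFirstPass(note, userSettings)));     // 2
```

The extra `replace` runs over the whole note on every count, which is where the additional lag would come from.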

isaaclyman commented 1 week ago

No apology necessary. I'll have to give this some thought. Regex is an insufficient tool for stripping out HTML, but if there's a regex input, that would mean it's on the user to determine what is sufficient and what's not...but it also potentially creates patterns of use where people are exchanging regex patterns to enter for various purposes, and I don't want to create a workaround culture that displaces meaningful, performant, and convenient features.

There's also the case where a user accidentally enters, say, a space or period in the regex input and then can't figure out why nothing is being counted. It's a lot of power to give users with a wide range of technical ability levels.
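
A small sketch of that failure mode, assuming a hypothetical implementation that deletes every match of the user's pattern before counting. The function names and the `looksTooGreedy` guard are invented for illustration; nothing like them is confirmed to exist in the plugin.

```typescript
// Assumption: matches are deleted outright (replaced with an empty string).
function stripWith(pattern: string, text: string): string {
  return text.replace(new RegExp(pattern, "g"), "");
}

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical guard: run the pattern against plain prose that contains no markup.
// If more than half of the words disappear, the pattern is probably a mistake.
function looksTooGreedy(pattern: string, sample: string): boolean {
  const before = countWords(sample);
  const after = countWords(stripWith(pattern, sample));
  return before > 0 && after < before / 2;
}

const sample = "It was a dark and stormy night";

console.log(countWords(stripWith(".", sample))); // 0: "." matches every character
console.log(countWords(stripWith(" ", sample))); // 1: all words run together
console.log(looksTooGreedy(".", sample));            // true
console.log(looksTooGreedy(" ", sample));            // true
console.log(looksTooGreedy("</?td[^>]*>", sample));  // false
```

A check like this could drive a warning next to the input, though it would not catch every bad pattern.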

Like I said though, I'll think it over.

Zeoic commented 1 week ago

I agree that regex isn't suitable for most HTML. In my case I was able to filter only the specific tags I use, and I just need to keep in mind not to use <table> or what have you in my story. Definitely not a one-size-fits-all regex string.

Not wanting to foster workaround culture is a very good point, never thought of that aspect. The option being hidden at the bottom of advanced with red text saying it is not recommended still won't stop some people from breaking things and complaining lol. I guess it could also help point to ideas for baked-in improvements when some regex strings get popular enough.