UXPublishing closed this issue 10 months ago.
Metadata is already excluded from the counts.
There's no such thing as a "markdown comment." What are you referring to specifically?
As for links to other notes, the square brackets around a [[link]] don't add any extra words.
Ok. Thank you for clarifying.
By comments, I mean Obsidian's %%Comment%% syntax, which includes text that only appears in Editor mode. I would put writing guidelines inside comments so they'd be available during editing but hidden when the final writing is viewed in Preview mode.
Ah, okay. I wasn't familiar with that feature. I'll look into the Obsidian API and see if there's a way to parse that out.
As of v2.20.0, there's a setting in the Advanced section that allows you to exclude comments. Let me know if you have any trouble with it.
Note that while metadata and comments are excluded from the character count, links are still included. I think most users would want that: when you paste Markdown into another program (or even just paste links into, e.g., Twitter), the link counts against your character limit. Unfortunately, there's no way to ask Obsidian for "only the text that's visible in Reading mode"; you have to parse each thing out individually.
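A minimal sketch of that counting rule, for illustration only. It assumes metadata is a `---`-delimited YAML frontmatter block at the very start of the note and finds it with a regex (our assumption; the plugin may locate metadata differently). `visibleCharCount` is a hypothetical name, not the plugin's code.

```typescript
// Hypothetical helper, not the plugin's implementation.
// Assumes frontmatter is a YAML block fenced by "---" lines at the very start.
const FRONTMATTER = /^---\r?\n[\s\S]*?\r?\n---(?:\r?\n|$)/;
// Obsidian-style %%...%% comments, which may span several lines.
const OBSIDIAN_COMMENT = /%%[\s\S]+?%%/g;

function visibleCharCount(note: string): number {
  const body = note
    .replace(FRONTMATTER, "")        // drop the frontmatter block
    .replace(OBSIDIAN_COMMENT, "");  // drop %%...%% comments
  // Links such as [[Other note]] are kept, brackets and all,
  // because they still take up characters when pasted elsewhere.
  return body.length;
}

const note = "---\ntags: draft\n---\nSee [[Other note]] %%todo%%";
console.log(visibleCharCount(note)); // 19
```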
If there are other things that you think shouldn't be included, feel free to create another issue.
Thank you! I just tested the update and it works as you said it would. I have another feature request so I'll create a new issue.
Would it be possible to exclude HTML comments (both inline and block comments) as well?
The Obsidian Writing Goals plugin implemented excluding comments via regex:
https://github.com/lynchjames/obsidian-writing-goals/blob/main/src/IO/obsidian-file.ts#L21:~:text=import%20type%20%7B%20CachedMetadata,47
The problem with Obsidian comments is that other common Markdown editors don't recognize them, so when I want to use comments in a document that will be compiled via pandoc or shared with others, I'm forced to use HTML-style comments: <!-- commented text -->
That's a fair ask. Thanks for the code reference, I'll see if I can work that in.
I submitted a PR to the aforementioned plugin. I think an even simpler regex, (%%.*?%%|<!--.*?-->), might work with the gmis flags for both inline and block comments.
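For reference, a sketch of how that suggestion might be applied before counting. The pattern and flags are the ones quoted above; building the regex with `new RegExp(...)` and the helper name `stripCommentsSimple` are illustrative choices. Of the four flags, only `s` (dotAll) matters here: it lets `.` match line breaks. The pattern contains no letters and no `^`/`$` anchors, so `i` and `m` have no effect on it.

```typescript
// Suggested pattern and flags: "s" lets "." cross line breaks,
// while "i" and "m" have no effect on this particular pattern.
const SIMPLE_COMMENTS = new RegExp("(%%.*?%%|<!--.*?-->)", "gmis");

// Hypothetical helper name, for illustration only.
function stripCommentsSimple(text: string): string {
  return text.replace(SIMPLE_COMMENTS, "");
}

const sample = "Keep %%inline%% this <!-- a\nblock\ncomment --> text";
console.log(stripCommentsSimple(sample)); // "Keep  this  text"
```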
@danieltomasz Thinking through a couple things here.
RegExes can be slower than simpler string operations. I don't have a several-thousand-note vault to test on, but I want to be thoughtful about people who do, and about the added startup cost while the plugin reanalyzes every note.
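For comparison, a sketch of what a regex-free approach could look like for `%%` comments alone, scanning with `indexOf`. It should remove the same spans as replacing `/%%[\s\S]*?%%/g`, and it leaves an unclosed `%%` untouched. `stripObsidianComments` is a hypothetical name; it does not handle HTML comments, and whether it actually beats a single compiled RegEx would need benchmarking on a large vault.

```typescript
// Hypothetical regex-free stripper for %%...%% comments only.
function stripObsidianComments(text: string): string {
  let out = "";
  let pos = 0;
  while (pos < text.length) {
    const open = text.indexOf("%%", pos);
    if (open === -1) break;                      // no more comments
    const close = text.indexOf("%%", open + 2);  // nearest closing marker
    if (close === -1) break;                     // unclosed: keep the rest as-is
    out += text.slice(pos, open);                // keep text before the comment
    pos = close + 2;                             // skip past the closing "%%"
  }
  return out + text.slice(pos);
}

console.log(stripObsidianComments("a %%b%% c %%d")); // "a  c %%d"
```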
What happens if a user uses both types of comments and they overlap? Obviously a terrible idea to you and me, but I do want to think about it so it doesn't get filed as a bug later.
Example:
Day 2 of learning HTML. To make a comment in HTML, you use <!-- these marks -->.
%% This is the only HTML tag that can use an exclamation point as the first character: <!--. That way the browser won't confuse it with other tags. %%
I wonder if --> could show up with Web Components? Is -- even a valid name for a Web Component?
Also, your RegEx (%%.*?%%|<!--.*?-->) doesn't work for multi-line comments. Example:
<!-- an
HTML block comment
that should not count -->
%% A multiline
comment that should not count %%
A working RegEx would need to include line breaks, i.e. (%%[\s\S]*?%%|<!--[\s\S]*?-->).
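A quick way to see the difference, sketched with the multi-line HTML example above: without the `s` flag, `.` stops at line breaks, while `[\s\S]` matches any character. Adding the `s` (dotAll) flag to the original pattern has the same effect as switching to `[\s\S]`. The variable names are illustrative.

```typescript
const block = "<!-- an\nHTML block comment\nthat should not count -->";

// Without the s flag, "." does not match "\n", so the block is not matched.
const dotOnly = new RegExp("(%%.*?%%|<!--.*?-->)", "g");
// [\s\S] matches any character, including "\n", regardless of flags.
const anyChar = new RegExp("(%%[\\s\\S]*?%%|<!--[\\s\\S]*?-->)", "g");
// With the s (dotAll) flag, "." behaves like [\s\S].
const dotAll = new RegExp("(%%.*?%%|<!--.*?-->)", "gs");

console.log(block.replace(dotOnly, "") === block); // true: nothing removed
console.log(block.replace(anyChar, ""));           // "" (whole block removed)
console.log(block.replace(dotAll, ""));            // "" (whole block removed)
```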
Okay, with a non-capturing group and the lazy quantifier, the performance should be okay. I've done a bunch of testing on (?:%%[\s\S]+?%%|<!--[\s\S]+?-->) and it works the way you'd expect, even with overlapping comment types. Will go live in the next release.
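A sketch of the final pattern applied to the overlapping example from earlier in the thread. The pattern is quoted from the comment above; wrapping it in a `stripComments` helper is our illustration, not the plugin's code.

```typescript
// Pattern as quoted above: non-capturing group, lazy "+?" quantifiers,
// [\s\S] so comments may span lines.
const COMMENT_PATTERN = new RegExp("(?:%%[\\s\\S]+?%%|<!--[\\s\\S]+?-->)", "g");

// Hypothetical helper name, for illustration only.
function stripComments(text: string): string {
  return text.replace(COMMENT_PATTERN, "");
}

const overlapping = [
  "Day 2 of learning HTML. To make a comment in HTML, you use <!-- these marks -->.",
  "%% This is the only HTML tag that can use an exclamation point as the first character: <!--. That way the browser won't confuse it with other tags. %%",
  "I wonder if --> could show up with Web Components? Is -- even a valid name for a Web Component?",
].join("\n");

console.log(stripComments(overlapping));
```

Because the regex scans left to right, the HTML comment on the first line is removed, then the whole `%%` comment on the second line is consumed (including the stray `<!--` inside it), and the bare `-->` on the third line is left alone.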
If you use it specifically with the gmis flags, the simple regex I posted works with multi-line and overlapping comments, as I tested here: https://regex101.com/r/rNlSZq/1 and with my fork of the Writing Goals plugin. But I'm no expert with regex, and I didn't reproducibly test performance across many files and different possible regexes, so I'd appreciate it if you made it more efficient.
The author of another plugin, Better Word Count, uses a similar approach to yours: https://github.com/lukeleppan/better-word-count/blob/d1f84150df12cef8857218022242f70423d8c1a8/src/constants.ts#L12:~:text=export%20const%20VIEW_TYPE_STATS,13
But big thanks for the update; it works no matter which regex is used :)
If you are interested in optionally excluding Markdown headers and other Markdown syntax, the author of obsidian-writing-goals uses the remove-markdown library (https://github.com/lynchjames/obsidian-writing-goals/issues/8), but I don't know how this would affect performance for many files in the vault. Currently, the header below is counted as 2 words:
## Header
With the addition of the option to exclude HTML comments, your plugin and @lynchjames's plugin give almost the same estimates now (the only difference is in how Markdown characters are handled), so I'm happy :)
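To illustrate why `## Header` counts as two words, a sketch assuming the count is a plain whitespace split (our assumption; the plugin's tokenizer may differ). `wordCountIgnoringHeadingMarks` is a hypothetical, deliberately narrow fix that strips only ATX heading markers (`#` to `######` followed by whitespace at the start of a line). It is not a general Markdown stripper, and it is one more RegEx pass of the kind whose cost is weighed in this thread.

```typescript
// Naive whitespace word count: "##" is counted as a word.
function naiveWordCount(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Hypothetical: strip ATX heading markers only. Obsidian tags like "#tag"
// are untouched because the marker must be followed by a space or tab.
const ATX_HEADING_MARKER = /^#{1,6}[ \t]+/gm;

function wordCountIgnoringHeadingMarks(text: string): number {
  return naiveWordCount(text.replace(ATX_HEADING_MARKER, ""));
}

console.log(naiveWordCount("## Header"));                // 2
console.log(wordCountIgnoringHeadingMarks("## Header")); // 1
```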
Ah, I missed the s flag on first pass... didn't even know that flag existed. TIL!
remove-markdown is abandonware at the moment and consists of a long list of RegEx .replace() calls. With my reluctance to use one well-optimized RegEx, you can imagine how I feel about tacking on 20 more. Performance is always top-of-mind for me since this plugin scans the entire vault on startup.
Besides, I don't think it's theoretically possible to write a RegEx, or even a series of RegExes, that fully parses out Markdown. Regular expressions describe regular languages, whereas Markdown is not regular. To fully meet the spec you'd probably need a proper parser. Maybe something could be done with Marked (which would likely outperform RegEx, too) or strip-markdown, which looks like it doesn't use RegEx at all.
In any case, I'm glad to know it's satisfactory for now. If more people start complaining about formatting marks being counted as words, I can look into something more intricate.
Problem
What problem would your idea solve? How do you currently manage this problem?
I'd like to create writing templates that include metadata, guidelines inside markdown comments, and links to other notes. Word count and character count would only be meaningful if I can exclude metadata, comments, and links.
Idea
Describe the feature you want to suggest.
Would it be possible to calculate word count and character count for all text EXCEPT metadata, comments, links, or any other non-visible text?