olivierkes / manuskript

An open-source tool for writers
http://www.theologeek.ch/manuskript
GNU General Public License v3.0
1.71k stars 226 forks

Skip comments in word count #1205

Closed sagev9000 closed 10 months ago

sagev9000 commented 10 months ago

Hi! I'm not sure if others have run into this, but there have been a few times where I've left notes lying around and thought I was closer to my word count goal than I actually was. Wanted to see if others might appreciate dropping comments from the word count.

Notable caveat: I'm not sure of a good way to handle the possibility of comment tokens inside a code block without re-implementing a lot more parsing logic. I also understand that the current solution is much, much more elegant than this proposal.

So, I'd happily take suggestions and fully support shooting this down, but I figured it was worth a shot. Maybe someone sharper than me would have a better idea for it :sweat_smile:

Thanks for taking a look!

TheShadowOfHassen commented 10 months ago

I'm not familiar with comments in Markdown, but this is an idea that I think is worth it. I sometimes write notes to myself, and while they won't throw off my word count much, it's a nice thing to have.

What is a comment in markdown exactly?

TheJackiMonster commented 10 months ago

> I'm not familiar with comments in Markdown, but this is an idea that I think is worth it. I sometimes write notes to myself, and while they won't throw off my word count much, it's a nice thing to have.
>
> What is a comment in markdown exactly?

Markdown does not support comments. So in Manuskript you use HTML comments, which will not be included in the exported document since Pandoc translates Markdown into HTML as an intermediate step, I assume. That's also the reason you can use HTML tags inside the Markdown files and they will likely apply properly to the final document. I've used this for alignment in the past, for example.

Anyway, an issue I have with this MR is that it will cause a significant hit to the performance/latency of the editor, because we calculate the word count quite a lot (every time the text changes) and it has been a bottleneck in the past already. Also, there are still cases where the suggested code would not work properly (HTML comments do not require any surrounding whitespace, for example, so opening and closing markers might sit in the middle of a token that the word regex matches).
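To make that edge case concrete, here is a rough sketch. The MR's code is not shown in this thread, so `naive_word_count` below is a hypothetical stand-in that assumes a token-based approach: split on whitespace and skip everything between a standalone `<!--` token and a standalone `-->` token.

```python
def naive_word_count(text):
    """Hypothetical token-based counter (an assumption, not the MR's actual code)."""
    count = 0
    in_comment = False
    for token in text.split():
        if token == "<!--":
            in_comment = True      # standalone opening marker
        elif token == "-->":
            in_comment = False     # standalone closing marker
        elif not in_comment:
            count += 1             # ordinary word outside a comment
    return count


spaced = "one <!-- a private note here --> two"
unspaced = "one <!--a private note here--> two"

print(naive_word_count(spaced))    # markers stand alone, comment is skipped
print(naive_word_count(unspaced))  # markers are glued to words, comment is counted
```

With the markers glued to the neighbouring words, the tokens `<!--a` and `here-->` never match the standalone markers, so the whole comment is counted as text.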

Instead, I suggest a different approach: we could use a regular expression to remove all comments from the text and then search for words using the existing regular expression on the result. Using regex will be much faster because it's implemented in C inside the Python runtime instead of using Python loops. We can avoid the edge cases I mentioned as well.

TheJackiMonster commented 10 months ago

This should do it:

import re
def wordCount(text):
    # Drop HTML comments first, then count runs of non-whitespace characters.
    return len(re.findall(r"\S+", re.sub(r"(<!--).+?(-->)", "", text)))
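
One caveat worth checking before relying on this: in Python's `re`, `.` does not match a newline unless `re.DOTALL` is set, so a comment that spans several lines would not be removed by the pattern as written. The sketch below compares the pattern above with a possible variant. The names `word_count_single_line` and `word_count_multiline` are made up for illustration, and this is not necessarily what ended up in Manuskript.

```python
import re


def word_count_single_line(text):
    # Same pattern as above: "." stops at line breaks by default.
    return len(re.findall(r"\S+", re.sub(r"(<!--).+?(-->)", "", text)))


def word_count_multiline(text):
    # Hypothetical variant: re.DOTALL lets "." cross line breaks,
    # and ".*?" also accepts an empty comment body.
    stripped = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    return len(re.findall(r"\S+", stripped))


one_line = "one <!-- note --> two"
two_lines = "one <!-- a note\nspanning two lines --> two"

print(word_count_single_line(one_line), word_count_multiline(one_line))
print(word_count_single_line(two_lines), word_count_multiline(two_lines))
```

Both versions agree on the single-line comment. On the two-line comment, only the `re.DOTALL` variant strips the note before counting.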

sagev9000 commented 10 months ago

@TheJackiMonster Brilliant. Much wiser approach.