vkbo / novelWriter

novelWriter is an open source plain text editor designed for writing novels. It supports a minimal markdown-like syntax for formatting text. It is written with Python 3 (3.9+) and Qt 5 (5.15) for cross-platform support.
https://novelwriter.io
GNU General Public License v3.0
2.03k stars, 102 forks

Optional word count for text only (not headings) #1109

Open nandoflorestan opened 2 years ago

nandoflorestan commented 2 years ago

The word count correctly disregards comments. But I consider it a problem that it counts headings.

Headings include Part Name > Chapter Name > Scene description > Section description.

As a writer, I develop the entire outline first, resulting in long scene descriptions and section descriptions which I wish to keep in the novel source. Words that aren't going to be output in the book certainly should not be counted.

Therefore, I would like a configuration setting allowing me to disable counting words in headings.

Not sure if this should be a knob on the project or on the application. Maybe the project, because when the text isn't a novel, often you do want to output the subheadings.

vkbo commented 2 years ago

I can see the value in this. A couple of points to note though:

Firstly, the word count in the project tree is really intended to track work progress. It excludes comments mainly because they could easily inflate the numbers too much. Personally, I use the count to see how much work I have produced in a writing session. In that respect, the titles are also work. Technically, comments can be work too, so I just had to draw the line somewhere sensible. Incidentally, the word counter also counts documents that have been excluded from the final manuscript, so it was never intended as a way to track the count of the final result.

Secondly, the only reliable place to get the word counts for your manuscript is from the document produced by the Build Novel Project tool. An estimate is also produced by the Project Details dialog.

From a technical point of view, ignoring headers in word counts is fairly easy. But doing this in the project tree will invalidate the statistics already collected for a project since the current implementation doesn't keep track of where the words were counted. Having such an option could be confusing and disruptive to those who use the word count and writing statistics to track progress.
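To illustrate why ignoring headers is fairly easy, here is a minimal Python sketch of a line-based counter with an optional switch for headings. It assumes the markup conventions discussed in this thread: heading lines start with "#" and comment lines (including "% Synopsis:") start with "%". The function name count_words and the count_headings flag are hypothetical, not novelWriter's actual code, and other syntax (keywords, formatting, special titles) is not modelled.

```python
def count_words(text: str, count_headings: bool = True) -> int:
    """Count whitespace-separated words in novelWriter-style text.

    Hypothetical helper: comment lines are always skipped, and heading
    lines are skipped only when count_headings is False.
    """
    total = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("%"):
            # Comments, including "% Synopsis:" lines, are never counted
            continue
        if stripped.startswith("#"):
            if not count_headings:
                # Optional: leave headings out of the count entirely
                continue
            # Drop the "#" markers so they are not counted as a word
            stripped = stripped.lstrip("#")
        total += len(stripped.split())
    return total


sample = """# Part One
## Chapter 1 - Howdy
### Roy tries doing stuff but it doesn't work.
% Synopsis: Roy tries stuff.
Roy tried doing stuff.
"""

print(count_words(sample))                        # prints 18 (headings included)
print(count_words(sample, count_headings=False))  # prints 4 (body text only)
```

Note that the switch changes what the number means, which is exactly why flipping it in the project tree would make previously collected statistics inconsistent.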

I still like this suggestion though, so I have two ideas for where it can be used:

Firstly, filtering out headers in the counts presented in the Novel and Outline views is doable. They are not used for statistics at all. Flipping a switch would require a rebuild of the Index, but that's easy enough.

Secondly, in release 1.8 I will do a full rewrite of the Build Novel Project tool. Presenting statistics based on the selected options for a manuscript build would be a nice addition, and adding the ability to filter out headers would fit well here. I also want to remove the Project Details dialog, so I need a replacement for it anyway. (The Project Details tool will be partially broken by the upcoming 1.7 release anyway.)

nandoflorestan commented 2 years ago

Thank you for being so thoughtful. novelWriter is a joy to work with and I dream of making a video to show how I use it.

The most important places where I really want word counts for the text only are:

  1. The "Words:" count just below the Document Editor.
  2. The Project Tree, which I use to navigate all the markdown files. The "Novel" item at the top of the tree is where I'd like to see a word count very close to the one in the final product. It already excludes the other sections such as Characters and Plot.

I understand other people will prefer to track total work, not the final product. That's just not me. A global setting on the project would suffice to make everyone happy, wouldn't it? It's fine if the default behavior remains as is.

Again, thank you for this amazing piece of software!

nandoflorestan commented 2 years ago

By the way, the em-dash is currently counted as a word. It shouldn't be, since the en-dash and the narrow dash are not counted. No punctuation should be counted as a word.

vkbo commented 2 years ago

The word count algorithm counts anything separated by one or more whitespace characters (as Python defines whitespace) as a word. To accommodate styles where dashes are not surrounded by spaces, the Unicode en-dash and em-dash, which are produced by typing two or three hyphens, are replaced with spaces before the counting algorithm runs. They are therefore treated as whitespace and not counted.
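A minimal Python sketch of the algorithm as described in this comment: replace the en-dash (U+2013) and em-dash (U+2014) with spaces, then split on whitespace. The function name is hypothetical and this is not novelWriter's actual source; it only mirrors the described behaviour, including the fact that any other dash-like character between spaces still counts as a word.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def count_tokens(text: str) -> int:
    """Count words as described above (hypothetical helper)."""
    # Treat the two supported dashes as whitespace: two words joined by a
    # dash count as two words, and a dash between spaces is not a word
    for dash in (EN_DASH, EM_DASH):
        text = text.replace(dash, " ")
    # str.split() with no argument splits on runs of any Python whitespace
    return len(text.split())


print(count_tokens("She paused \u2014 then left"))  # prints 4
print(count_tokens("She paused\u2014then left"))    # prints 4
print(count_tokens("She paused \u2015 then left"))  # prints 5 (horizontal bar counted)
```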

If you use other Unicode characters surrounded by whitespace, they will be counted as words.

Edit: The dashes supported are U+2013 and U+2014, in case you want to check.

nandoflorestan commented 2 years ago

Sorry for the noise. I have been using U+2015 "Horizontal bar". Its description is: used to introduce quoted text in some typographic styles; “quotation dash”. On my screen it looks identical to the em-dash; this is why I got confused.

vkbo commented 2 years ago

No problem. Technically this should be a separate feature request issue as it is a different request than the topic of this issue ticket.

In any case, the horizontal bar is more or less equivalent to the em-dash. Sometimes a bit shorter in some fonts, but style-wise, the em-dash can be used for the same purpose. Just not an en-dash.

Since the word counter runs quite often, I don't really want to bloat it with many exceptions. As it is now, it takes advantage of Python's optimised split routine, which is quite fast. However, each exception requires a separate pass through the text, plus an additional pass if the text contains the exception character. Since the en- and em-dash are frequently used by some writers, I made an exception for those. I am reluctant to add more, because Python is quite slow at processing data. Keep in mind that the algorithm also needs to be general enough to work for multiple languages with different typesetting rules.
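To make the cost argument concrete, here is a hypothetical counter in which the special characters are a parameter. Each entry costs one scan of the text for the membership test and, only when the character is present, a second scan for the replacement, which matches the per-exception overhead described above. Adding U+2015 in the usage line only illustrates that extra cost; it is not something novelWriter does, and the names are made up for this sketch.

```python
DEFAULT_EXCEPTIONS = ("\u2013", "\u2014")  # en-dash, em-dash


def count_with_exceptions(text: str, exceptions: tuple = DEFAULT_EXCEPTIONS) -> int:
    """Hypothetical counter: every exception character adds work per call."""
    for char in exceptions:
        # One scan of the text to test whether the character occurs at all...
        if char in text:
            # ...and a second scan only if it does, to replace it with a space
            text = text.replace(char, " ")
    # The final split is a single pass, implemented in C in CPython
    return len(text.split())


line = "Roy \u2015 tired \u2014 gave up"
print(count_with_exceptions(line))                                    # prints 5
print(count_with_exceptions(line, DEFAULT_EXCEPTIONS + ("\u2015",)))  # prints 4
```

Since the counter runs on every edit, each added character means another scan over the whole document, which is the trade-off against accuracy being weighed here.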

Generally, I've used LibreOffice Writer as a reference on how to both process auto-replace cases and how to count words. It too ignores en- and em-dash, but not the h-bar.

As I've already explained, the word counts provided by the editor and project tree are not meant to be pinpoint accurate. They are meant to be indicative. You don't know the actual word count until you've built the manuscript anyway, as there are several features that interfere with it, so a decent estimate is as good as it gets.

I'm happy to implement a decent statistics module for the new build tool that will be the focus of release 1.8. I'd also welcome an issue ticket for drafting a good spec for such a statistics feature, one that takes into account special cases like the ones discussed here. Processing time is much less of an issue in that tool, as the user expects it to take time anyway.

vkbo commented 2 years ago

As for my reference to how LibreOffice handles it, see the screenshot below. The first line is a horizontal bar, the second is an em-dash. 3 words, 12 characters.

[Screenshot from 2022-09-16 12-48-25: LibreOffice Writer word count]

nandoflorestan commented 2 years ago

Please don't worry about dashes anymore! For anyone else reading, the following command replaces U+2015 horizontal bars with U+2014 em-dashes in two directories:

sed -i 's/―/—/g' content/* planning/*

I shouldn't have said anything about dashes in this ticket. Back to the main issue:

By now I have implemented word count in my Python build script, which takes novelWriter's HTML as input.
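For readers who want to do something similar, here is a minimal sketch using only Python's standard-library html.parser module. It assumes, without having checked novelWriter's exporter, that headings in the HTML output are h1 to h6 elements and that body text sits in ordinary elements such as p. Words are counted per text chunk, so a tag in the middle of a word (for example un<em>usual</em>) would be counted as two words. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class BodyWordCounter(HTMLParser):
    """Count words in HTML text, skipping headings and non-body elements."""

    SKIP = {"h1", "h2", "h3", "h4", "h5", "h6", "title", "style", "script"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than 0 while inside a skipped element
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only text outside headings and other skipped elements is counted
        if self.depth == 0:
            self.words += len(data.split())


def count_body_words(html: str) -> int:
    parser = BodyWordCounter()
    parser.feed(html)
    parser.close()
    return parser.words


page = (
    "<html><head><title>My Book</title></head><body>"
    "<h1>Chapter 1</h1>"
    "<h3>Roy tries doing stuff but it doesn't work.</h3>"
    "<p>Roy tried <em>hard</em> today.</p><p>It did not work.</p>"
    "</body></html>"
)
print(count_body_words(page))  # prints 8
```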

Seeing an end product word count in a build screen is not that much better for me than seeing it in my build script. If anyone else would like to see an end product word count rather than work word count in novelWriter's main interface, this is a good time to chime in and say why.

The main problem is the scene and section headings; they are long enough to obscure the count for my purposes. If I could just have the option of not counting those, it would be close enough for me.

vkbo commented 2 years ago

> By now I have implemented word count in my Python build script, which takes novelWriter's HTML as input.
>
> Seeing an end product word count in a build screen is not that much better for me than seeing it in my build script. If anyone else would like to see an end product word count rather than work word count in novelWriter's main interface, this is a good time to chime in and say why.

Various stats for the end product are already a requested feature, and the Project Details panel will be going away, with its features merged into the Build Tool. Adding this one value is not an issue.

> The main problem is the scene and section headings; they are long enough to obscure the count for my purposes. If I could just have the option of not counting those, it would be close enough for me.

It seems to me you may be using headings for what the synopsis feature was designed for. The synopsis is handled already as you are requesting, for exactly the reasons you are mentioning.

I'm gonna list your request as a potential feature. I may limit it to the Novel Tree and Outline View only, because those views simply display indexer data, and the information is not used for anything internally that is project related. As mentioned, this is not the case for the Project Tree, so such changes will be disruptive and need to be handled somehow.

I'm curious to know why these word counts matter so much. I've written semi-professionally for magazines, which do care a lot about word counts, but for fiction writing, all that usually matters is ballpark estimates with fairly large margins of error.

vkbo commented 2 years ago

Note:

The idea to count only body text for the manuscript has been split out into #1114. The remaining request needs to be looked into a bit further to see how it can potentially be implemented.

nandoflorestan commented 2 years ago

This post only describes how I am using novelWriter.

Just as you thought, the word count thing is self-imposed. I am writing a very easy-to-read book. Since the word counts were easy and immediate to see, I decided each chapter must be 750 to 950 words long. This has helped me write them, and I suppose the brevity and constancy will help some children read them. I do realize such a constraint is not common in any literature. I also plan to have exactly one illustration per chapter, so if chapters fit nicely into 2 or 3 pages, the illustration can take an entire page. I know it all sounds arbitrary, but I guess it's a matter of visualizing the end product.

Now when I start a chapter I paste from my plot outline. I like to use scene and section headings for the outline. I set up the export so novelWriter separates scenes with an asterism ⁂. Scene and section headings are kept in the book source so I can see the outline and easily change the order of events in a future line edit.

I see your point that I could use % Synopsis: comments to circumvent the word count issue. When I started writing I used Synopsis. But then my headings were empty and taking a lot of space in the editor, and I was writing the word "Synopsis" more often than I cared for. (I even set up a keyboard macro in espanso to type the word Synopsis for me.) Also, the Outline view in novelWriter is more pleasant with fewer columns, and I can remove the Synopsis column. So I turned all my Synopsis comments into headings to save me some space and some typing — and then I noticed the word counts had increased...

I am not going back to using Synopsis because 1) It makes me invent a title for each section just to have something in there and 2) It makes the Outline harder to read. Example:

Chapter 1 - Howdy
    Scene 1.1 - Tries stuff (and then, far right on the Synopsis column: Roy tries doing stuff but it doesn't work because life sucks.)

Better than the above is if I only have to look at the tree on the left:

Chapter 1 - Howdy
    Roy tries doing stuff but it doesn't work because life sucks.

vkbo commented 2 years ago

You are of course free to use headings and comments as you wish. Part of the idea of novelWriter is to impose few restrictions. The only real restriction is header level. The app needs some way to be able to infer the structure. However, it's hard to account for all the various ways people use novelWriter, and naturally I originally built it the way I tend to lay out my writing. A lot of features do come from people making requests to fit their style, so I'm generally happy to look at solutions as long as they don't interfere with other work flows.

I will look at what can be done with respect to word counting methods. I think it will have to be a multi-step process. Adding the info in the Novel Tree and Outline View is relatively easy, and at least you can flip to that tab every now and then to check. In the editor window you can also highlight the body text, and the editor footer will show the word count of the selected text instead. It's a bit more manual, but I guess you don't need to check that often.

I can also add a format shortcut that creates the synopsis comment. I can see that it could be a bit repetitive to type when building an outline. I don't really work that way, so I haven't given it much thought. I use my own version of the snowflake method, and I generally build my outlines with scene headers with a single paragraph first. I add the synopsis when I start expanding the scene, so it's not repetitive in my work flow.

vkbo commented 6 months ago

In 2.4 the manuscript tool will have detailed statistics in a panel below the preview. Maybe this is sufficient?
