vkbo / novelWriter

novelWriter is an open source plain text editor designed for writing novels. It supports a minimal markdown-like syntax for formatting text. It is written with Python 3 (3.9+) and Qt 5 (5.15) for cross-platform support.
https://novelwriter.io
GNU General Public License v3.0

Epic: Editor Shortcodes #1541

Open vkbo opened 8 months ago

vkbo commented 8 months ago

There are a number of feature requests that would require some form of short code format to implement. This is an Epic tracking the implementation of shortcodes for extended feature support in the novelWriter editor.

The shortcode syntax has the following form.

Inline commands (self-closing) will have limited use, but for footnotes defined elsewhere, they are useful. I don't propose to add a special self-closing syntax. So, it will behave like HTML not XML/XHTML.

The syntax is compatible with the syntax already added for page breaks and vertical space. These formats can be modified to conform to the rules without breaking backwards compatibility.

Rationale

These are easy to implement in the syntax highlighter, and they are also very easy to parse. The syntax is also not too obscure, since short codes have been used for plain text formatting online for quite some time. Since novelWriter is a fiction writing app, they are also not frequently needed. However, a context or special menu for inserting short codes is probably a good idea, because most people don't want to memorise them.

Shortcode Features

peter88213 commented 5 months ago

This promises several new possibilities, such as controlling the spell checker via language and country codes, e.g. [lang:en-US]. In the context of my yWriter file converters, such shortcodes have actually turned out to be quite easy to handle, even if you use regular expressions or simple string replacement.

However, I would like to mention that angle brackets, i.e. a pseudo-XML syntax, offer the advantage that you can use the sax.ContentHandler for parsing, or possibly other XML tools out of the box, to check the well-formedness with minimal effort. I'm currently working on this myself.

vkbo commented 5 months ago

Angle brackets are already used for the auto-complete feature, so they cannot be used for anything else since they are free form.

The shortcodes are parsed by RegEx in novelWriter too, so if you want to ensure compatibility, you can find them in the nwRegEx class in: https://github.com/vkbo/novelWriter/blob/main/novelwriter/constants.py

The current ones are:

FMT_EI = r"(?<![\w\\])(_)(?![\s_])(.+?)(?<![\s\\])(\1)(?!\w)"
FMT_EB = r"(?<![\w\\])([\*]{2})(?![\s\*])(.+?)(?<![\s\\])(\1)(?!\w)"
FMT_ST = r"(?<![\w\\])([~]{2})(?![\s~])(.+?)(?<![\s\\])(\1)(?!\w)"
FMT_SC = r"(?i)(?<!\\)(\[[\/\!]?(?:i|b|s|u|sup|sub)\])"
FMT_SV = r"(?<!\\)(\[(?i)(?:fn|footnote):)(.+?)(?<!\\)(\])"
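For anyone checking compatibility against these patterns, here is a minimal sketch of how the two shortcode expressions could be applied with Python's `re` module. The helper names `find_shortcodes` and `find_values` are hypothetical and not part of novelWriter. One adaptation is an assumption on my side: the inline `(?i)` flag in FMT_SV is moved to the front of the pattern, because Python 3.11 and later reject global flags that are not at the start of the expression.

```python
import re

# Patterns copied from the list above. FMT_SV is adapted: the inline (?i)
# flag is moved to the front, since Python 3.11+ rejects it mid-pattern.
FMT_SC = r"(?i)(?<!\\)(\[[\/\!]?(?:i|b|s|u|sup|sub)\])"
FMT_SV = r"(?i)(?<!\\)(\[(?:fn|footnote):)(.+?)(?<!\\)(\])"

RX_SC = re.compile(FMT_SC)
RX_SV = re.compile(FMT_SV)


def find_shortcodes(text):
    """Return (position, shortcode) pairs for the plain format shortcodes."""
    return [(m.start(1), m.group(1)) for m in RX_SC.finditer(text)]


def find_values(text):
    """Return the value part of each value shortcode, e.g. a footnote key."""
    return [m.group(2) for m in RX_SV.finditer(text)]


sample = r"A [b]bold[/b] word, an escaped \[i] and a note[footnote:fn1]."
print(find_shortcodes(sample))
print(find_values(sample))
```

The escaped `\[i]` is skipped by the negative lookbehind, so only the two bold shortcodes and the footnote key are reported.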

peter88213 commented 5 months ago

I see. In most cases, I translated shortcodes with replacement lists (*). The only pitfall when converting, e.g., into ODT format was the case where shortcode-tagged passages spanned several paragraphs (yWriter allows this).

(*) Just for the record, a code example:

odtReplacements = [
    ('[i]', '<text:span text:style-name="Emphasis">'),
    ('[/i]', '</text:span>'),
    ('[b]', '<text:span text:style-name="Strong_20_Emphasis">'),
    ('[/b]', '</text:span>'),
]
for yw, od in odtReplacements:
    text = text.replace(yw, od)

This is preceded by a routine that closes shortcode tags before line breaks and reopens them afterwards, like so:

# --- Process markup reaching across line breaks.
# Assumes `text` holds the shortcode-formatted scene content.
tags = ['i', 'b']
newlines = []
lines = text.split('\n')
isOpen = {}
opening = {}
closing = {}
for tag in tags:
    isOpen[tag] = False
    opening[tag] = f'[{tag}]'
    closing[tag] = f'[/{tag}]'
for line in lines:
    for tag in tags:
        if isOpen[tag]:
            # The tag was still open at the end of the previous line: reopen it.
            line = f'{opening[tag]}{line}'
            isOpen[tag] = False
        while line.count(opening[tag]) > line.count(closing[tag]):
            # Close any tag left open at the end of this line.
            line = f'{line}{closing[tag]}'
            isOpen[tag] = True
        while line.count(closing[tag]) > line.count(opening[tag]):
            # Balance a stray closing tag by opening it at the line start.
            line = f'{opening[tag]}{line}'
        # Drop empty tag pairs, e.g. those produced on blank lines.
        line = line.replace(f'{opening[tag]}{closing[tag]}', '')
    newlines.append(line)
text = '\n'.join(newlines)

Perhaps not the most efficient solution, but it ensures that the generated XML code will be well-formed as far as the formatting is concerned.

vkbo commented 3 months ago

> I see. In most cases, I translated shortcodes with replacement lists (*). The only pitfall when converting, e.g., into ODT format was the case where shortcode-tagged passages spanned several paragraphs (yWriter allows this).

Just a thought when re-reading this.

How do you handle nested formats? That was my own first implementation, and if I recall correctly, the Open Document Standard allows it, but LibreOffice, which I use as a reference implementation, doesn't seem to. The HTML converter in novelWriter uses a simple lookup table, since HTML handles nested formatting.

Because of this issue with LibreOffice, I wrote a rather complex algorithm (which I've later rewritten to a much simpler form) that detects text fragments where all characters have the same format, and uses an "edge detection" approach to check where the aggregated format changes (I use binary masks to track this) and generates a new T1 format key for ODT on each unique format. I'm pretty sure this is how LibreOffice does it internally too.
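A minimal sketch of the "edge detection" idea, not novelWriter's actual code: each character gets a bitmask of the formats active at that position, and a new fragment starts wherever the mask changes. The bit values, the restriction to `[b]` and `[i]`, and the names `char_masks` and `format_runs` are assumptions made for illustration.

```python
import re
from itertools import groupby

# Hypothetical bit assignments, one bit per format.
FMT_B = 0x01  # bold
FMT_I = 0x02  # italic
BITS = {"b": FMT_B, "i": FMT_I}
RX_TAG = re.compile(r"\[(/?)(b|i)\]")


def char_masks(text):
    """Strip the shortcodes and return (plain_text, per-character masks)."""
    chars, masks = [], []
    mask, pos = 0, 0
    for m in RX_TAG.finditer(text):
        segment = text[pos:m.start()]
        chars.append(segment)
        masks.extend([mask] * len(segment))
        bit = BITS[m.group(2)]
        # A closing tag clears the bit, an opening tag sets it.
        mask = (mask & ~bit) if m.group(1) else (mask | bit)
        pos = m.end()
    tail = text[pos:]
    chars.append(tail)
    masks.extend([mask] * len(tail))
    return "".join(chars), masks


def format_runs(text):
    """Return (fragment, mask) pairs, split wherever the mask changes."""
    plain, masks = char_masks(text)
    runs, start = [], 0
    for mask, group in groupby(masks):
        length = len(list(group))
        runs.append((plain[start:start + length], mask))
        start += length
    return runs


for fragment, mask in format_runs("the [i]Black [b]Gull's[/i] flight[/b] console"):
    print(f"{mask:02b} {fragment!r}")
```

Because the mask is tracked per character, overlapping shortcodes need no special handling: each fragment simply carries whatever combination of bits is active for it.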

This is incidentally also how the text layout in Qt is implemented, so it's actually quite easy to serialise text formats to and from a rich text implementation using Qt. I wrote one in C++ a couple of years ago that serialised a rich text document into JSON.

peter88213 commented 3 months ago

Yes, when combining italics and bold, LibreOffice (and OpenOffice) create new character styles in the office:automatic-styles section. Combined with other formatting or language settings, this can result in quite a number of styles. I decided not to use the nesting in my self-defined file format (which replaced the yw7 format some time ago). When writing with OpenOffice, I use the Emphasis and Strong Emphasis character styles instead of hard italic and bold formatting, so there is no nesting possible at all.

When parsing the ODT format, I first create lookup tables from all automatic style formats, to see which ones contain bold or italics. Then I translate them into Strong or Emphasis, where ~Strong wins over Emphasis~ nesting actually works.
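As an illustration of such lookup tables, here is a small sketch that collects the automatic text styles carrying bold or italic properties. It uses ElementTree for brevity rather than a SAX parser, the sample document is made up, and the function names `style_lookup` and `semantic_tags` are hypothetical. The namespace URIs and the `fo:font-weight` / `fo:font-style` attributes are those of the OpenDocument format.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
STYLE_NAME = f"{{{NS['style']}}}name"
FONT_WEIGHT = f"{{{NS['fo']}}}font-weight"
FONT_STYLE = f"{{{NS['fo']}}}font-style"

# Made-up excerpt of a content.xml file with two automatic styles.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties fo:font-style="italic"/>
    </style:style>
    <style:style style:name="T2" style:family="text">
      <style:text-properties fo:font-weight="bold" fo:font-style="italic"/>
    </style:style>
  </office:automatic-styles>
</office:document-content>"""


def style_lookup(content_xml):
    """Return (bold_names, italic_names) for the automatic styles."""
    root = ET.fromstring(content_xml)
    bold, italic = set(), set()
    for style in root.iterfind("office:automatic-styles/style:style", NS):
        props = style.find("style:text-properties", NS)
        if props is None:
            continue
        if props.get(FONT_WEIGHT) == "bold":
            bold.add(style.get(STYLE_NAME))
        if props.get(FONT_STYLE) == "italic":
            italic.add(style.get(STYLE_NAME))
    return bold, italic


def semantic_tags(style_name, bold, italic):
    """Translate an automatic style name into nested semantic tags."""
    tags = []
    if style_name in italic:
        tags.append("em")
    if style_name in bold:
        tags.append("strong")
    return tags


bold, italic = style_lookup(SAMPLE)
print(semantic_tags("T2", bold, italic))
```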

This is a shortcode-formatted text created with yWriter:

Hal Spacejock was sitting at the [i]Black [b]Gull[/b]'s[/i] flight console

The nested shortcode seemed to be no problem for my ywriter-ODT conversion, since OpenOffice accepts code like this:

Hal Spacejock was sitting at the <text:span text:style-name="Emphasis">Black 
<text:span text:style-name="Strong_20_Emphasis">Gull</text:span>&apos;s
</text:span> flight console

Update: Nesting Strong and Emphasis works with OpenOffice and LibreOffice, so the parsed result of the example shown above looks like this:

Hal Spacejock was sitting at the <em>Black <strong>Gull</strong>'s</em> flight console

As a side note: With my project, I use my own xml dialect with formatting tags similar to xhtml.

2nd Edit: Added ywriter shortcode example.

Another update:

Here is an example where bold and italic shortcoded passages overlap:

Hal Spacejock was sitting at the [i]Black [b]Gull's[/i] flight[/b] console

After conversion to ODT, the first shortcode tag "wins" due to the unspecific xml closing tags:

Hal Spacejock was sitting at the <text:span text:style-name="Emphasis">Black 
<text:span text:style-name="Strong_20_Emphasis">Gull&apos;s
</text:span> flight</text:span> console

Thus, the result looks like in the "nesting" example shown above.

Conclusion: Nesting ODT xml spans works well. Converting "overlapping" shortcode formatting with my "find and replace" algorithm produces a result that is wrong, but that OpenOffice accepts as if it were nested.
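The effect can be reproduced with a short sketch that applies the replacement list from earlier in the thread and then walks the resulting XML to see which styles apply to each fragment. The wrapping `text:p` element and the helper names `to_odt_xml` and `styled_runs` are assumptions added so that the fragment can be parsed on its own.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_ATTR = f"{{{TEXT_NS}}}style-name"

REPLACEMENTS = [
    ("[i]", '<text:span text:style-name="Emphasis">'),
    ("[/i]", "</text:span>"),
    ("[b]", '<text:span text:style-name="Strong_20_Emphasis">'),
    ("[/b]", "</text:span>"),
]


def to_odt_xml(text):
    """Apply the find-and-replace conversion and wrap it in a paragraph."""
    for code, markup in REPLACEMENTS:
        text = text.replace(code, markup)
    return f'<text:p xmlns:text="{TEXT_NS}">{text}</text:p>'


def styled_runs(elem, styles=()):
    """Yield (text, active_styles) pairs in document order."""
    if elem.text:
        yield elem.text, styles
    for child in elem:
        yield from styled_runs(child, styles + (child.get(STYLE_ATTR),))
        if child.tail:
            # Text after a closing span belongs to the parent's styles.
            yield child.tail, styles


overlapping = "the [i]Black [b]Gull's[/i] flight[/b] console"
for text, styles in styled_runs(ET.fromstring(to_odt_xml(overlapping))):
    print(repr(text), styles)
```

The output is well-formed XML, but " flight" ends up with Emphasis only, whereas the shortcodes asked for Strong only on that word.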

vkbo commented 3 months ago

Interesting. I'm pretty sure I tried and failed to do something similar. Could be I made another mistake, or they've updated since then. In either case, novelWriter generates styles in the same way LibreOffice does, so it works. It can potentially generate a large number, that's true.

I use the Python xml module to build the element tree, which requires some quirky code when using .text and .tail for building formatted text. I've since looked in the Qt source code, and they do pretty much the same. However, Qt's ODT support is very limited, so I never used their ODT feature.
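To illustrate why .text and .tail make this quirky, here is a minimal sketch (not the novelWriter code) that builds one paragraph from formatted runs. Plain text before the first span goes into the paragraph's .text, while plain text after a span has to be attached to that span's .tail. The unprefixed tag names and the `build_paragraph` helper are simplifications made up for the example.

```python
import xml.etree.ElementTree as ET


def build_paragraph(runs):
    """Build a <p> element from (text, style) runs; style None means plain."""
    par = ET.Element("p")
    last = None  # Most recent child span; its .tail takes following plain text.
    for text, style in runs:
        if style is None:
            if last is None:
                par.text = (par.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            last = ET.SubElement(par, "span", {"style-name": style})
            last.text = text
    return par


par = build_paragraph([
    ("at the ", None),
    ("Black Gull", "T1"),
    (" flight console", None),
])
print(ET.tostring(par, encoding="unicode"))
```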

peter88213 commented 3 months ago

Sorry, I was just updating my former comment again, adding an example for overlapping shortcode.

> I use the Python xml module to build the element tree, which requires some quirky code when using .text and .tail for building formatted text.

Do you mean the elementtree module? For parsing the ODT content.xml file, I prefer an event-based sax parser. Here you can take a look: https://github.com/peter88213/novxlib/blob/main/src/novxlib/odt/odt_parser.py

As a side note: I have OpenOffice 3.4 from 2012, and LibreOffice 7.5 installed. The examples shown above do work with both programs.

vkbo commented 3 months ago

I don't parse ODT, only write it.

peter88213 commented 3 months ago

Ah, I see. So you build the ODT DOM tree with elementtree? I use a template engine for the document header and footer, and the parts/chapters. This works for all types of output documents. For yw7 shortcode-formatted scene content, the find/replace algorithm did the job quite well. Now that I use an internal xml format, the transformation to ODT xml is easily done with a sax parser that replaces the inline xml tags one by one.
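A rough sketch of what such a tag-by-tag SAX transformation might look like. This is not the novxlib implementation; the input tags `p`, `em` and `strong`, the style names, and the names `OdtInlineHandler` and `to_odt` are assumptions for illustration. As a side effect, the parser rejects input that is not well-formed.

```python
import xml.sax
from xml.sax.saxutils import escape

# Assumed mapping from inline tags to ODT character style names.
SPAN_STYLES = {"em": "Emphasis", "strong": "Strong_20_Emphasis"}


class OdtInlineHandler(xml.sax.ContentHandler):
    """Replace inline tags one by one while the parser walks the input."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def startElement(self, name, attrs):
        if name in SPAN_STYLES:
            self.parts.append(
                f'<text:span text:style-name="{SPAN_STYLES[name]}">'
            )
        elif name == "p":
            self.parts.append("<text:p>")

    def endElement(self, name):
        if name in SPAN_STYLES:
            self.parts.append("</text:span>")
        elif name == "p":
            self.parts.append("</text:p>")

    def characters(self, content):
        # Re-escape &, < and > so the output stays valid XML.
        self.parts.append(escape(content))


def to_odt(xml_text):
    """Convert one paragraph; raises SAXParseException if not well-formed."""
    handler = OdtInlineHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.parts)


print(to_odt("<p>the <em>Black <strong>Gull</strong>'s</em> flight console</p>"))
```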

vkbo commented 3 months ago

> Ah, I see. So you build the ODT DOM tree with elementtree?

Yes, exactly. I don't use any third party tools or libraries. Everything is built from scratch as either a flat fodt XML or the various files needed for a zipped ODT file. I only support the formatting tags actually needed by novelWriter, so it's a subset of the Open Document standard.

It's still a fair bit of code: https://github.com/vkbo/novelWriter/blob/dev/novelwriter/core/toodt.py

Since each formatting tag is assigned a binary bit, I can just join them with an or operation, producing an integer number for each combination of formats. They are created on first use with an incremental T<N> name like LibreOffice does, and then looked up based on the integer value. It's quite efficient.

https://github.com/vkbo/novelWriter/blob/2480c7ae9f02b453599aa047a3aed445bfa24f6e/novelwriter/core/toodt.py#L716-L743
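The idea in miniature, as a sketch rather than the code behind the link: each format is one bit, combinations are OR-ed into an integer, and that integer is the lookup key for a style name created on first use. The constants and the `AutoStyles` class are hypothetical names.

```python
# Hypothetical bit assignments, one bit per format.
FMT_B = 0x01  # bold
FMT_I = 0x02  # italic
FMT_U = 0x04  # underline


class AutoStyles:
    """Hand out incremental T<N> names, one per unique format combination."""

    def __init__(self):
        self._names = {}  # format mask -> style name

    def name_for(self, mask):
        if mask not in self._names:
            # First use of this combination: create the next T<N> name.
            self._names[mask] = f"T{len(self._names) + 1}"
        return self._names[mask]


styles = AutoStyles()
print(styles.name_for(FMT_B | FMT_I))  # first combination seen
print(styles.name_for(FMT_I))          # a new combination
print(styles.name_for(FMT_I | FMT_B))  # same mask as the first call
```

Since the OR of the same bits always gives the same integer, the order in which formats were applied does not matter for the lookup.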

peter88213 commented 3 months ago

Very nice, all these hex constants and bitmasks. When I was a student, it was said: "A real programmer writes Fortran code in any language". What is it today? ;-)

vkbo commented 3 months ago

Not sure, but I have written a lot of Fortran. Especially when I worked on SixTrack. I also wrote a SHA256 hash implementation in Fortran just to see how difficult it would be. The code is here. 😃

peter88213 commented 3 months ago

When I first looked at the novelWriter code, I said to myself: "Java programmer". This time, I'd rather guess "C". No offense, I myself started my engineering career with Assembler. By the way, does the C++ variant of novelWriter still have a future?

vkbo commented 3 months ago

> When I first looked at the novelWriter code, I said to myself: "Java programmer".

Never touched the stuff!

> This time, I'd rather guess "C".

Sure, a bit. I started on AMOS (basic for Amiga) and Visual Basic, then worked with PHP a lot, and then a fair bit of C++ and Fortran. Not a lot of C. I also did a lot of Matlab for a while. Been working with Python for years now.

> No offense, I myself started my engineering career with Assembler. By the way, does the C++ variant of novelWriter still have a future?

C++ is so much more verbose than Python, so it takes a lot longer to write the code. I sometimes look in on the code, but while there is so much work to be done on novelWriter, I doubt I can handle another project. Maybe when I retire in 20 years!

peter88213 commented 3 months ago

Looking at the issues here, with all the feature requests, I can well believe that novelWriter will become your life's work. I've become so comfortable with Python that I don't really want anything else. I would have a good reason to get into JavaScript or TypeScript for Obsidian plugins. But no real enthusiasm.

vkbo commented 3 months ago

I like Python. It's what I do for my day job as well. I prefer writing object oriented code, which I guess is why you suspected Java. I have given golang a go as well (no pun intended) and various other languages, but Python really is good at a lot of things. Since I have a computational physics background, it was always too slow for work I've done in the past. Only Fortran, C and C++ would do. So I'm happy that I can now work with something more straightforward. It's probably why I put in so much time on this project. A lot of the things I want to do I can achieve (at a first draft level) in a couple of hours.

peter88213 commented 3 months ago

> I prefer writing object oriented code, which I guess is why you suspected Java.

If I'm not mistaken, I saw somewhere getter and setter methods. The "Pythonic" approach would be properties with decorators. But so what?

To come back to the topic: I see you have a clever implementation to create automatic character styles from combinations of direct formatting. The result is an ODT document with "hard" formatting. My concept is the other way round: I use OpenOffice as a text editor in order to transform everything into "semantic markup" during import at the latest. My aim is to lay out the document completely using styles that can be controlled via document templates. I assume that "emphasized" and "strongly emphasized" are generally sufficient for a fictional text. For visually set off paragraphs there is also "Quote". I only generate automatic character styles for foreign-language passages or passages that are excluded from the spell check. In the exported document, I also have a user style for invisible subheadings that only appears in documents that are intended for reimport into my program. This allows me to make the section titles visible in the OO navigator.

vkbo commented 3 months ago

> > I prefer writing object oriented code, which I guess is why you suspected Java.

> If I'm not mistaken, I saw somewhere getter and setter methods. The "Pythonic" approach would be properties with decorators. But so what?

A lot of my code subclasses Qt objects, which are written in C++, so I tend to follow the Qt code style for consistency. That's why the code is also camelCased. Qt uses a pattern of access methods for properties and set methods for setters. So someValue() to access data and setSomeValue(...) to set it. I sometimes also implement variations of Qt methods, and I try to use the same style for those to make it easier to remember the syntax.

I do use Python properties a lot for internal variables though, especially in the core data classes that are not inherited from Qt C++ objects. I never particularly liked the Python setter decorator though, as I sometimes have setters that take multiple related values, and I don't want to mix the two styles.
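The two styles side by side, as a sketch with made-up class and attribute names (no Qt import, so it runs anywhere): a Qt-style class with `value()` / `setValue(...)` accessors, and a data class that exposes read-only properties but keeps explicit set methods, including one that takes two related values.

```python
class SceneItem:
    """Qt-style accessors: title() to read, setTitle(...) to write."""

    def __init__(self):
        self._title = ""

    def title(self):
        return self._title

    def setTitle(self, title):
        self._title = title.strip()


class SceneData:
    """Read-only properties combined with explicit set methods."""

    def __init__(self):
        self._title = ""
        self._size = (0, 0)

    @property
    def title(self):
        return self._title

    @property
    def size(self):
        return self._size

    def setTitle(self, title):
        self._title = title.strip()

    def setSize(self, width, height):
        # Two related values in one call; a property setter receives
        # a single value only.
        self._size = (max(0, width), max(0, height))


item = SceneItem()
item.setTitle("  Chapter One ")
data = SceneData()
data.setTitle("Chapter Two")
data.setSize(80, -5)
print(item.title(), data.title, data.size)
```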

> To come back to the topic: I see you have a clever implementation to create automatic character styles from combinations of direct formatting. The result is an ODT document with "hard" formatting.

Yes, this is the part that mimics LibreOffice. The paragraph styles work a little differently since I don't have bitmasks for those, so they do lookups based on sha256 hashes of string representation of the data dicts. It's a quick and dirty method, but it works. I occasionally think about improving that part.
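A sketch of the hash-based lookup described here, under a few assumptions: the `ParagraphStyles` class, the `P<N>` naming and the property keys are made up, and the items are sorted before hashing so that the key does not depend on dict insertion order, which may or may not match what novelWriter does.

```python
import hashlib


class ParagraphStyles:
    """Look up paragraph style names by a hash of their properties."""

    def __init__(self):
        self._names = {}  # sha256 hex digest -> style name

    @staticmethod
    def _key(props):
        # Sorting the items makes the key independent of insertion order.
        text = repr(sorted(props.items()))
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def name_for(self, props):
        key = self._key(props)
        if key not in self._names:
            # First use of this property set: create the next P<N> name.
            self._names[key] = f"P{len(self._names) + 1}"
        return self._names[key]


par_styles = ParagraphStyles()
first = par_styles.name_for({"text-align": "center", "margin-top": "0.5cm"})
again = par_styles.name_for({"margin-top": "0.5cm", "text-align": "center"})
other = par_styles.name_for({"text-align": "left"})
print(first, again, other)
```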

> My concept is the other way round: I use OpenOffice as a text editor in order to transform everything into "semantic markup" during import at the latest. My aim is to lay out the document completely using styles that can be controlled via document templates. I assume that "emphasized" and "strongly emphasized" are generally sufficient for a fictional text. For visually set off paragraphs there is also "Quote".

I've considered adding support for ODT imports. I see the task as a little daunting, because I have a lot less control over what subset of the standard I support than when I write the doc. It would still be a nice feature even if I only parse a subset of formatting.

> I only generate automatic character styles for foreign-language passages or passages that are excluded from the spell check.

Is that something to perhaps implement in novelWriter? A way to support adding text in a different language? I don't necessarily intend to support spell checking in multiple languages, but it may be useful to allow excluding regions from spell checking.

> In the exported document, I also have a user style for invisible subheadings that only appears in documents that are intended for reimport into my program. This allows me to make the section titles visible in the OO navigator.

peter88213 commented 3 months ago

> > I only generate automatic character styles for foreign-language passages or passages that are excluded from the spell check.

> Is that something to perhaps implement in novelWriter? A way to support adding text in a different language? I don't necessarily intend to support spell checking in multiple languages, but it may be useful to allow excluding regions from spell checking.

Well, it's something I often need in my writing because I don't want to populate my user dictionaries with foreign expressions or dialect. But that's why I am developing my own writing program. It has a user interface inspired by Scrivener, a data model that has its origins in yWriter, and as an editor it uses OO/LO Writer which is way better than all I can ever create myself. A plot grid is realized via ODS export and reimport. I also support a few plotting concepts that I like at DramaQueen, and there's a connection to Zim for world building. Plus synchronization with two different timeline programs. As you can see, a fundamentally different approach to that of novelWriter.

I saw novelWriter starting out as a lean application with a small learning curve, and am amazed at how you are gradually building it into a full-blown word processing program. However, the concept of plain text with markup/markdown also sets certain limits. Only you can know where they lie, but if you exceed them, the original advantages turn into disadvantages.

vkbo commented 3 months ago

> However, the concept of plain text with markup/markdown also sets certain limits. Only you can know where they lie, but if you exceed them, the original advantages turn into disadvantages.

It certainly is a limitation if you expect full rich text capabilities. What annoys me with most rich text editors is that there are so many editing options that it becomes incredibly hard to wrangle it into what you want. Most office apps have the same problem. I found Scrivener annoying on this point too. It's why I've used LaTeX for my academic work, and really liked Wordpad back when I used Windows. I found FocusWriter for Linux that fit that niche, but it lacks the project capabilities I wanted for writing fiction.

The reason novelWriter ended up as Markdown is partially that I actually wanted to just type my meta data directly into the text. For me that is a lot easier than dealing with forms and tables. That can of course be done in rich text too, but Python + Qt becomes a little sluggish for full on rich text, so I decided not to.

My initial approach was, as you also seem to suggest, that (strong) emphasis is all that's strictly needed. The challenge now is that people are requesting rich text features, so I pulled in the concept of shortcodes that I know from back when I posted on discussion forums. They are a decent extension, and easy to parse. The bonus here is that these features are not at all in the way in the editor when you don't use them. It is why I am willing to add a number of rich text features for those that do need them.

You still have to prefer the typed formatting approach over the point and click approach to use novelWriter. That will not work for everyone, but based on feedback, a lot of people do prefer this. So I guess it fits a certain niche of users. Which is perfectly fine.

I am considering a hybrid solution where the editor is limited rich text (like Wordpad was), but where you can populate meta data by just typing as well as apply simple formatting. This really needs to be done in C++ to be snappy and responsive. So my idea was to basically recreate novelWriter's project approach around a rich text editor.

peter88213 commented 3 months ago

> So I guess it fits a certain niche of users. Which is perfectly fine.

> I am considering a hybrid solution where the editor is limited rich text (like Wordpad was), but where you can populate meta data by just typing as well as apply simple formatting. This really needs to be done in C++ to be snappy and responsive. So my idea was to basically recreate novelWriter's project approach around a rich text editor.

Yes, I understand all that. My first word processor was WordPerfect, and I loved the parallel window with the markup. Of course I wanted to have something like that too, so I made a plugin with a plain text editor that allows me to edit the xml code directly. Similar to the early HTML editors, I have menu entries and keyboard shortcuts to insert or toggle the most common format tags. Hitting Enter inserts </p>\n<p>. But it is only an optional accessory to quickly change something in the text without starting OO Writer.

Here is a screenshot for inspiration. I am using the tkinter text box; probably you can do better with Qt:

[Image: screen01]

vkbo commented 3 months ago

I like the multi-window approach. A limitation of novelWriter is the small project view, a single editor and a single viewer. I've considered doing something differently in my other project with a project view and independent document editors. I don't like tabs, but dockable independent windows are an option.

peter88213 commented 3 months ago

Inspired by yWriter, I equipped my editor plugin with its own window manager so that I can open any number of sections from the main program. Instead of opening a section twice, an already open window is brought to the foreground and focused if required. However, I have never needed this feature in practice. In fact, I mainly use this editor to split up sections, which is another feature.
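The "bring to front instead of opening twice" behaviour can be sketched independently of any GUI toolkit. Everything here is hypothetical: `SectionWindow` is only a stand-in for a real tkinter or Qt window, and `WindowManager` is not taken from the plugin.

```python
class SectionWindow:
    """Stand-in for a real editor window (hypothetical, no GUI)."""

    def __init__(self, section_id):
        self.section_id = section_id
        self.focus_count = 0

    def raise_and_focus(self):
        # A real window would be lifted and given keyboard focus here.
        self.focus_count += 1


class WindowManager:
    """Keep at most one open window per section."""

    def __init__(self):
        self._windows = {}  # section id -> open window

    def open_section(self, section_id):
        window = self._windows.get(section_id)
        if window is None:
            window = SectionWindow(section_id)
            self._windows[section_id] = window
        window.raise_and_focus()
        return window

    def close_section(self, section_id):
        self._windows.pop(section_id, None)


manager = WindowManager()
first = manager.open_section("sc1")
second = manager.open_section("sc1")  # reuses the already open window
print(first is second, first.focus_count)
```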

[Image: screen02]