vkbo / novelWriter

novelWriter is an open source plain text editor designed for writing novels. It supports a minimal markdown-like syntax for formatting text. It is written with Python 3 (3.9+) and Qt 5 (5.15) for cross-platform support.
https://novelwriter.io
GNU General Public License v3.0
2.01k stars, 102 forks

Export to DOCX #1537

Open EugeneUvin opened 11 months ago

EugeneUvin commented 11 months ago

At least in the Linux version, there is no export to DOCX. In Ukraine, many novel competitions accept novels in this format. Can it be added?

vkbo commented 11 months ago

There are no plans to add it at this time. The Open Document (odt) format is a well-supported open format that most office applications can open, including MS Office. Any major office application can also easily convert the file to docx if needed.

These are two competing open formats. I have no idea why Microsoft had to make their own open format when one existed already, but that's Microsoft in a nutshell I guess. It's a lot of work adding writers for these very complex formats, but I can keep the feature request in the backlog for now. Maybe someone wants to contribute it at some point, or I find the time to write it.

EugeneUvin commented 11 months ago

Yes, both formats exist - the use case is that one exports directly to the format that a competition accepts, makes some formatting adjustments and sends it out. My intention is to share this app with Ukrainian amateur writers who are often not familiar with format alternatives and will be confused by the absent option.

Ideally, having a shortcut to format the output file according to common submission requirements would be helpful for authors.

vkbo commented 11 months ago

I get that, but this is not a shortcut, it requires a full implementation of the document standard to support a new file format. These XML-based document formats are complex and require a lot of research and trial and error to get to work. They aren't trivial, like HTML and the other formats supported. The Open Document writer is over 1500 lines of fairly complex code that took me weeks of my spare time to make, and then many fixes to get to work right and conform to the standard.

This is a fairly big feature request, and the result is no increase in applications that can open the file, since odt and docx have overlapping support.
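To give a sense of what "XML-based document format" means here, the following is a minimal sketch of an ODT package built with only the Python standard library. It is not novelWriter's Open Document writer; the function name `build_minimal_odt` and the template constants are hypothetical, and the sketch leaves out styles, metadata, and page layout, which is where most of the real work lies.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# The media type an ODT package declares in its "mimetype" entry.
MIMETYPE = "application/vnd.oasis.opendocument.text"

# Manifest listing the files in the package (hypothetical minimal version).
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
    manifest:version="1.2">
 <manifest:file-entry manifest:full-path="/" manifest:version="1.2"
     manifest:media-type="application/vnd.oasis.opendocument.text"/>
 <manifest:file-entry manifest:full-path="content.xml"
     manifest:media-type="text/xml"/>
</manifest:manifest>
"""

# Document body template; paragraphs are substituted in as <text:p> elements.
CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
 <office:body>
  <office:text>
{paragraphs}
  </office:text>
 </office:body>
</office:document-content>
"""


def build_minimal_odt(paragraphs):
    """Return the bytes of a bare-bones ODT package with unstyled paragraphs."""
    body = "\n".join(f"   <text:p>{escape(p)}</text:p>" for p in paragraphs)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as odt:
        # The mimetype entry goes first and is stored uncompressed.
        odt.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        odt.writestr("META-INF/manifest.xml", MANIFEST)
        odt.writestr("content.xml", CONTENT.format(paragraphs=body))
    return buffer.getvalue()


if __name__ == "__main__":
    with open("sketch.odt", "wb") as handle:
        handle.write(build_minimal_odt(["Chapter One", "It was a dark night."]))
```

Even this bare skeleton needs a specific entry order, a manifest, and namespaced XML. A manuscript-quality writer additionally has to define paragraph and character styles, page geometry, headers, and metadata, and DOCX has its own, different package layout and schemas for all of these.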

vkbo commented 11 months ago

I'm not saying it won't be added. It's been in my long term plan for a while. I just don't have the capacity to do it in the foreseeable future, and there are a lot of feature requests higher up on the list.

johnblommers commented 11 months ago

Let me argue that even beginning writers need to understand file formats. Offering a docx export option is not helpful. Instead, inform them that MS Word opens ODF files. Tell them about Pandoc. Let them know that better, open source writing tools are freely available. Save them money and teach them Linux. Adding docx export is a poor use of resources.

EugeneUvin commented 11 months ago

> I'm not saying it won't be added. It's been in my long term plan for a while. I just don't have the capacity to do it in the foreseeable future, and there are a lot of feature requests higher up on the list.

I thought you used format-specific libraries? Pandoc is able to convert from md to docx, so most probably it provides a library for this conversion.

vkbo commented 11 months ago

No, there are no libraries in use by novelWriter aside from the Qt framework itself, and the optional spell checker library.

I used to have pandoc integration, but removed it because the quality of the result when creating ODT files was poor and not up to manuscript standards. Writing the ODT file directly is the only way to produce a good result. Converting ODT to DOCX seems to preserve the formatting, and there are numerous tools to do it, including pandoc. There are also plenty of online converters, which I do not recommend as they are likely data mining. Every major office application supports both formats, as I've already mentioned.

In the vast majority of cases a manuscript document needs to be opened in a word processor to add cover page and other formatting required by the various submission standards, so saving the result again as DOCX really shouldn't be an issue. You can also save as other formats if needed.

ODT was chosen because it is an open and well-defined standard with very wide support, and one I can actually test against, as it is the native format of LibreOffice, which is also cross-platform. DOCX is only the native format of MS Office, which I don't even own, so I can't properly test the results. novelWriter is also written for Linux first, and developed on Linux. So ODT was the logical choice.
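For anyone who wants to script the "open in an office application and save as DOCX" step, here is a minimal sketch that drives LibreOffice in headless mode from Python. It assumes the LibreOffice launcher is on the PATH under the name `soffice` (on some systems it is `libreoffice` instead); the function names are hypothetical and this is not part of novelWriter.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_convert_command(source, out_dir, target="docx"):
    """Build the LibreOffice command line for a headless conversion."""
    return [
        "soffice",  # assumed executable name; may be "libreoffice" on some systems
        "--headless",
        "--convert-to", target,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_with_soffice(source, out_dir, target="docx"):
    """Run the conversion and return the path where the result is expected."""
    source, out_dir = Path(source), Path(out_dir)
    if shutil.which("soffice") is None:
        raise FileNotFoundError("LibreOffice ('soffice') was not found on PATH")
    subprocess.run(soffice_convert_command(source, out_dir, target), check=True)
    # LibreOffice keeps the base name and swaps the extension.
    return out_dir / f"{source.stem}.{target}"


if __name__ == "__main__":
    print(convert_with_soffice("mywork.odt", "."))
```

The command construction is kept separate from the subprocess call so the exact arguments can be inspected or adjusted before anything is run.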

vkbo commented 11 months ago

> Pandoc is able to convert from md to docx

Also, novelWriter is Markdown-like, not Markdown, so it has its own parser. Using pandoc directly on the source does not produce an acceptable result. That's why it was abandoned very early on in the dev history of novelWriter. @johnblommers has been around long enough to remember it I suspect.

Here are the components of the parser/writers:

johnblommers commented 10 months ago

> Pandoc is able to convert from md to docx
>
> Also, novelWriter is Markdown-like, not Markdown, so it has its own parser. Using pandoc directly on the source does not produce an acceptable result. That's why it was abandoned very early on in the dev history of novelWriter. @johnblommers has been around long enough to remember it I suspect.

Yes indeed I remember it this way too.

BTW my comment about Pandoc was meant in this context:

  1. Write in novelWriter mywork document
  2. Export to mywork.odt
  3. Exit novelWriter
  4. pandoc mywork.odt -o mywork.docx

Admittedly I blinked, as I had not grasped that Veronica had a custom-written ODT exporter. It's definitely better to use Pandoc's power to convert ODT to DOCX. There is even a Pandoc feature to leverage a DOCX template. One has merely to review the excellent Pandoc documentation to learn more about this amazing tool.
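The workflow above, including the DOCX template feature, can be sketched as a small Python wrapper around the pandoc command line. This assumes `pandoc` is installed and on the PATH; the function names and the `template.docx` file name are hypothetical. The template is passed through pandoc's `--reference-doc` option.

```python
import subprocess
from pathlib import Path


def pandoc_command(source, target, reference_doc=None):
    """Build the pandoc command line for an ODT to DOCX conversion."""
    cmd = ["pandoc", str(source), "-o", str(target)]
    if reference_doc is not None:
        # Optional DOCX file whose styles pandoc applies to the output.
        cmd.append(f"--reference-doc={reference_doc}")
    return cmd


def odt_to_docx(source, reference_doc=None):
    """Convert e.g. mywork.odt to mywork.docx next to the source file."""
    source = Path(source)
    target = source.with_suffix(".docx")
    subprocess.run(pandoc_command(source, target, reference_doc), check=True)
    return target


if __name__ == "__main__":
    # Equivalent to step 4 above: pandoc mywork.odt -o mywork.docx
    print(odt_to_docx("mywork.odt"))
    # With a hypothetical style template:
    # odt_to_docx("mywork.odt", reference_doc="template.docx")
```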

vkbo commented 10 months ago

Yeah, the original issue with pandoc was that it couldn't convert Markdown to ODT and produce a good enough result for this use case, and converting from HTML couldn't either. Those are limitations of the source formats. Markdown has far too few formatting options, and HTML isn't designed for pages, but for scrollable text.

With the ODT writer, I have full control. Converting the ODT document to other office document formats preserves all that.