jgm / pandoc

Universal markup converter
https://pandoc.org

docx writer: Option to customize (or remove) title/author block #3109

Open HeirOfNorton opened 8 years ago

HeirOfNorton commented 8 years ago

Hello, would it be possible to add a way to customize, or at least remove, the Title/Author/Date block in Docx output?

Nearly any journal or publisher will have their own specific format for the title page/title block of a document submitted to them, but since Templates are not feasible in Docx, the Title/Author/Date block is hardcoded. This means a Docx file created with Pandoc will need to be edited manually to fix this.

If there was a command-line option to suppress the title-block entirely, then users could make their own title block in the source file or using a filter, using the new Custom Classes in master to control the style formatting.

Thank you.

jgm commented 8 years ago

If you're willing to use a filter, you could simply have the filter remove title/author/date metadata.

HeirOfNorton commented 8 years ago

True, and probably what I will do in the meantime if necessary. That is not ideal, though, since that would also remove the Title and Author metadata from the resulting Docx file properties.

jkr commented 8 years ago

This would be a problem that would be solved by being able to use proper .dotx templates, right? It might be time for us to start trying to think through that.

HeirOfNorton commented 8 years ago

Kinda? .dotx templates don't really do what Pandoc templates do. They are just normal Word documents that, when opened, create a new copy of the file rather than saving over the original.

Word does have a way to mark a spot for inserting other text, using the REF field code. Eg:

<w:fldSimple w:instr="REF Title">
    <w:r>
        <w:t>This will insert the Bookmark named Title</w:t>
    </w:r>
</w:fldSimple>

But it can be difficult to insert these for users. I think they can only be inserted for bookmarks that already exist in the document. Maybe, in that case, include basic bookmarks (Title, Author, etc.) in the default template.docx? I dunno, I don't know how difficult it is to parse such things in the source.
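For a rough sense of what parsing such fields might involve, here is a minimal Python sketch that lists the field instructions found in a `word/document.xml` string. The function name `field_instructions` is hypothetical, and the sketch assumes each complex field keeps its instruction in a single `w:instrText` run, which Word does not guarantee.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def field_instructions(document_xml):
    """Return field instruction strings in document order.

    Covers simple fields (w:fldSimple with a w:instr attribute) and the
    w:instrText runs of complex fields. Instructions that Word has split
    across several runs are returned as separate fragments (a limitation
    of this sketch).
    """
    root = ET.fromstring(document_xml)
    found = []
    for el in root.iter():
        if el.tag == f"{{{W}}}fldSimple":
            instr = el.get(f"{{{W}}}instr")
            if instr:
                found.append(instr.strip())
        elif el.tag == f"{{{W}}}instrText" and el.text:
            found.append(el.text.strip())
    return found
```

Reading the fields is the easy half; deciding what the writer should substitute for each one is the part that would still need design work.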

jkr commented 8 years ago

Right -- I was thinking of parsing the fields and instrText. In particular, I was interested in the fact that you could make one with TITLE field and so on. A bit of experimentation suggests that they might actually be easier to parse than they are to make -- it really is a pain if you're not using one of the prefab ones.

The main plus here is that they do allow for some more fine-grained control of output. The problem with adding a command-line flag is that there might be someone who wants, say, Title, but not Author. It would be nice if there was a way to allow that. At the moment, while I think through this, it might remain filters, though.

Also -- not that this is a permanent solution -- it wouldn't be too hard to write a post-processing script in python that would inject the title and author into docProps/core.xml after you filter them out. (zipfile, etree, py-yaml).
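A minimal sketch of such a post-processing step, using only `zipfile` and `xml.etree.ElementTree` from the standard library. The function name `set_core_properties` is hypothetical, and the title and author are passed in as arguments rather than read from the YAML metadata, so the py-yaml part is left out. It assumes the docx already contains `docProps/core.xml` and raises `KeyError` otherwise.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces that commonly appear in docProps/core.xml. Registering them
# keeps the familiar prefixes when the XML is serialized again.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)

CORE = "docProps/core.xml"


def set_core_properties(docx_bytes, title=None, author=None):
    """Return a copy of the docx with dc:title / dc:creator set."""
    src = zipfile.ZipFile(io.BytesIO(docx_bytes))
    root = ET.fromstring(src.read(CORE))  # KeyError if the part is missing

    for tag, value in (("title", title), ("creator", author)):
        if value is None:
            continue
        qname = "{%s}%s" % (NS["dc"], tag)
        el = root.find(qname)
        if el is None:
            el = ET.SubElement(root, qname)
        el.text = value

    # A zip member cannot be edited in place, so copy every entry into a
    # new archive and swap in the modified core.xml.
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE:
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item, data)
    return out.getvalue()
```

It would run after pandoc has written the file: read `output.docx` as bytes, call `set_core_properties(data, title="...", author="...")`, and write the result back out.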

jgm commented 8 years ago

+++ Jesse Rosenthal [Sep 09 16 05:35 ]:

This would be a problem that would be solved by being able to use proper .dotx templates, right? It might be time for us to start trying to think through that.

Yes, probably a good idea.

tomk3003 commented 6 years ago

I still would like to see an option to not write the metadata to the head of the resulting docx. If I want to include them I can always do that via the reference-doc and I am able to include the information at the appropriate place.

W1M0R commented 3 years ago

If you're willing to use a filter, you could simply have the filter remove title/author/date metadata.

Here is a lua-filter for those who wish to follow that route. Put the following code in a file called stripmeta.lua:

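-- Remove the metadata fields that the docx writer renders as the title block.
-- Note: this also drops them from the resulting docx file properties.
function strip_meta(meta)
  meta.title = nil
  meta.subtitle = nil
  meta.author = nil
  meta.date = nil
  meta.abstract = nil
  return meta
end

return {{Meta = strip_meta}}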

Then use it like this:

pandoc --lua-filter=stripmeta.lua -o output.docx input.md

mjfs commented 3 years ago

I have also often needed to skip the default metadata block in DOCX output. Is there a consensus on the preferred way forward?

As I see it, a solution would require one of the following:

(a) Implementing a simple command line switch that activates skipping unwanted output (e.g. --skip-standard-meta-output) - this is probably the simplest and the quickest solution.

(b) Implementing DOCX templating as is the case with a few other Pandoc supported formats - when designing templates in Microsoft Word, content controls can be used for this purpose.

I could probably find some time to implement the functionality under point (a) above, if the proposed approach is considered acceptable.

@jgm: I would suggest avoiding simultaneous introduction of DOTX templates due to the fact that this:

jmclawson commented 2 years ago

Regarding option (a) of @mjfs :

(a) Implementing a simple command line switch that activates skipping unwanted output (e.g. --skip-standard-meta-output) - this is probably the simplest and the quickest solution.

Unless I'm misunderstanding the code, which is very possible, much of Pandoc's "templating" seems set in two lines, Docx.hs lines 785--786. Introducing a command line switch to activate skipping certain output would probably redefine meta in line 785. Could such a switch also allow arbitrary reordering of these elements, too? Something taking an argument like "title, subtitle, date, abstract" would know to skip the author, but the same interface could also allow for reordering fields as needed. For instance, some humanities journals prefer "author, date, title, subtitle, abstract".

If I'm being greedy, it would also be nice to support custom metadata properties in this kind of interface, too, along the lines of what's already supported by https://github.com/jgm/pandoc/issues/3034. But that's probably another issue.
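Until the writer supports something like this, the reordering can be approximated with a filter. Below is a minimal sketch of a pandoc JSON filter in Python that moves selected metadata fields into the body in a chosen order, each wrapped in a Div with a `custom-style` attribute. The function names, the `ORDER` list, and the file name `reorder_meta.py` are hypothetical, and the style names assume the reference document defines paragraph styles called Author, Date, Title, Subtitle, and Abstract. Because the fields are removed from the metadata, they are also lost from the docx file properties, as noted earlier in this thread.

```python
#!/usr/bin/env python3
import json
import sys


def meta_to_blocks(value):
    """Convert one pandoc metadata value into a list of block elements."""
    t, c = value["t"], value.get("c")
    if t == "MetaInlines":
        return [{"t": "Para", "c": c}]
    if t == "MetaString":
        return [{"t": "Para", "c": [{"t": "Str", "c": c}]}]
    if t == "MetaBlocks":
        return c
    if t == "MetaList":  # e.g. several authors: one paragraph per item
        return [b for item in c for b in meta_to_blocks(item)]
    return []  # MetaBool, MetaMap: ignored in this sketch


def styled_div(style, blocks):
    """Wrap blocks in a Div carrying a custom-style attribute."""
    return {"t": "Div", "c": [["", [], [["custom-style", style]]], blocks]}


def move_meta_to_body(doc, order):
    """Pop each (key, style) in order from the metadata and prepend it."""
    front = []
    for key, style in order:
        value = doc["meta"].pop(key, None)
        if value is not None:
            front.append(styled_div(style, meta_to_blocks(value)))
    doc["blocks"] = front + doc["blocks"]
    return doc


# Hypothetical ordering, following the "author, date, title, ..." example.
ORDER = [("author", "Author"), ("date", "Date"), ("title", "Title"),
         ("subtitle", "Subtitle"), ("abstract", "Abstract")]

# pandoc passes the output format as the first argument to a JSON filter,
# so the filter only reads stdin when it is invoked that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(move_meta_to_body(json.load(sys.stdin), ORDER), sys.stdout)
```

Assuming the file is saved as `reorder_meta.py` and made executable, it could be run with `pandoc --filter ./reorder_meta.py -o output.docx input.md`. Any other metadata key can be added to `ORDER` in the same way, which also covers custom fields.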

fcskit commented 9 months ago

I have the same issue. I would like to include the metadata fields, but in a different order. To me, the best solution would be a switch (YAML or command-line flag) that turns the default output of the metadata off. The fields could then be added back through metadata variables that can be inserted at any position in the document.

This could be done with a simple Lua filter as described here: https://stackoverflow.com/questions/53355628/declaring-arbitrary-variables-for-pandoc-conversion-using-yaml-metadata-block

Preferably, this would be implemented in the writer itself, so that the predefined styles for title, abstract, date, etc. would be applied automatically.

In addition, that would add much more flexibility by allowing custom metadata information to be placed in the document.