orchitech / redmine_reformat


Redmine Reformat - A Swiss-Army Knife for Converting Redmine Rich Text Data


Redmine Reformat is a Redmine plugin providing a rake task for flexible rich-text field format conversions and batch editing.

Prepare and Install

Database Backup

Either back up your database or clone your Redmine instance completely. A cloned Redmine instance allows you to compare conversion results with the original.

Install

cd $REDMINE_ROOT
git -C plugins clone https://github.com/orchitech/redmine_reformat.git
bundle install

Then restart Redmine.

Installing Converter Dependencies

If you use the TextileToMarkdown converter, install pandoc version 2.2 or newer.

The other provided converters have no direct dependencies except those installed with bundle install.

Basic Usage

If the current format is Textile, convert all rich text to Markdown using the default TextileToMarkdown converter setup:

rake reformat:convert to_formatting=markdown

Dry run:

rake reformat:convert to_formatting=markdown dryrun=1

Parallel processing (Unix/Linux only, tested with PostgreSQL):

rake reformat:convert to_formatting=markdown workers=10

If you already use the common_mark format patch (see #32424 and the Docker image orchitech/redmine-gfm):

# convert from textile:
rake reformat:convert to_formatting=common_mark
# convert from Redcarpet's markdown - same command:
rake reformat:convert to_formatting=common_mark

Renaming or merging Redmine projects can only be done directly in the database. redmine_reformat can prepare wiki links for such a change:

# 1. remove project prefix for wiki links within the renamed project
# 2. rename project prefix in wiki links outside of the renamed project
convcfg='[{
  "projects": ["oldname"],
  "converters": [["LinkRewriter", { "oldname": { "project": null } }]]
}, {
  "converters": [["LinkRewriter", { "oldname": { "project": "newname" } }]]
}]'
rake reformat:convert converters_json="$convcfg"
# now you can rename the 'oldname' project to 'newname'
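Hand-written JSON inside a shell variable is easy to get subtly wrong (a missing comma is enough). As a minimal sketch, the same two-entry configuration could be generated from Ruby instead. The helper name `link_rename_config` is hypothetical and not part of redmine_reformat; only the structure of the config is taken from the example above.

```ruby
require 'json'

# Hypothetical helper (not part of redmine_reformat): builds the two-entry
# LinkRewriter configuration from the example above.
def link_rename_config(old_name, new_name)
  [
    # 1. inside the renamed project: remove the project prefix from wiki links
    { 'projects' => [old_name],
      'converters' => [['LinkRewriter', { old_name => { 'project' => nil } }]] },
    # 2. everywhere else: rewrite the project prefix to the new name
    { 'converters' => [['LinkRewriter', { old_name => { 'project' => new_name } }]] }
  ]
end

# Print the configuration as JSON; nil becomes JSON null.
puts JSON.generate(link_rename_config('oldname', 'newname'))
```

If this were saved as, say, gen_convcfg.rb (an assumed file name), the output could be passed along with converters_json="$(ruby gen_convcfg.rb)".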

Convert to HTML (assuming a hypothetical html rich text format):

convcfg='[{
  "from_formatting": "textile",
  "to_formatting": "html",
  "converters": "RedmineFormatter"
}]'
rake reformat:convert to_formatting=html converters_json="$convcfg"

Convert using an external web service through intermediate HTML:

convcfg='[{
  "from_formatting": "textile",
  "to_formatting": "common_mark",
  "converters": [
    ["RedmineFormatter"],
    ["Ws", "http://localhost:4000/turndown-uservice"]
  ]
}]'
rake reformat:convert to_formatting=common_mark converters_json="$convcfg"

Other advanced scenarios are covered below.

Features

Conversion Success Rate and Integrity

TextileToMarkdown converter

MarkdownToCommonmark converter

Conversion integrity

Parallel processing

Advanced Scenarios

Use different converter configurations for certain projects and items:

[{
  "projects": ["syncedfromjira"],
  "items": ["Issue", "JournalDetail[Issue.description]", "Journal"],
  "converters": [
    ["Ws", "http://markup2html.uservice.local:4001"],
    ["Ws", "http://turndown.uservice.local:4000"]
  ]
}, {
  "from_formatting": "textile",
  "converters": "TextileToMarkdown"
}]

To convert only a part of the data, use null in place of the converter chain:

[{
  "projects": ["myproject"],
  "to_formatting": "common_mark",
  "converters": "TextileToMarkdown"
}, {
  "from_formatting": "textile",
  "to_formatting": "common_mark",
  "converters": null
}]
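The two examples above suggest that configuration entries are tried in order and the first matching entry decides which converter chain is used. This section does not state that rule explicitly, so the following is only a sketch under that assumption. `pick_converters` is a hypothetical name, and it matches only on `projects` and `from_formatting`; the plugin's real matching rules may differ.

```ruby
# Hypothetical first-match lookup, ASSUMING entries are evaluated in order.
# Not taken from redmine_reformat; the real matching logic may differ.
def pick_converters(config, project:, from_formatting:)
  entry = config.find do |e|
    # A missing key is treated as "matches anything" (an assumption).
    (e['projects'].nil? || e['projects'].include?(project)) &&
      (e['from_formatting'].nil? || e['from_formatting'] == from_formatting)
  end
  entry && entry['converters']
end

config = [
  { 'projects' => ['myproject'], 'to_formatting' => 'common_mark',
    'converters' => 'TextileToMarkdown' },
  { 'from_formatting' => 'textile', 'to_formatting' => 'common_mark',
    'converters' => nil }
]

# Text in "myproject" hits the first entry and gets a converter chain.
p pick_converters(config, project: 'myproject', from_formatting: 'textile')
# Text elsewhere falls through to the second entry, whose chain is null.
p pick_converters(config, project: 'other', from_formatting: 'textile')
```

Under this reading, the null entry acts as a catch-all that leaves the remaining Textile content without a converter chain.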

After text passes through a converter chain, newlines are normalized in two ways:

But some converter chains might not need this behavior, so it is configurable. For example, this is the default config for conversion of markdown to common_mark:

{
  "from_formatting": "markdown",
  "to_formatting": "common_mark",
  "converters": ["MarkdownToCommonmark"],
  "force_crlf": false,
  "match_trailing_nl": false
}

Provided Converters

For more information on markup converters, see Markup Conversion Analysis and Design.

Configuring Converters

Converters are specified as an array of converter instances. Each converter instance is specified as an array of converter class name and constructor arguments. If there is just one converter, the outer array can be omitted, e.g. [["TextileToMarkdown"]] can be specified as ["TextileToMarkdown"]. If such converter has no arguments, it can be specified as a string, e.g. "TextileToMarkdown".

Please note that removing the argument-encapsulating array leads to misinterpreting the configuration if there are more converters. E.g. ["RedmineFormatter", ["Ws", "http://localhost:4000"]] would be interpreted as a single converter with an array argument. A full specification is required in such cases, e.g. [["RedmineFormatter"], ["Ws", "http://localhost:4000"]].
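The shorthand rules above can be sketched as a small normalization function. This is an illustration of the rules as described, not the plugin's actual code; `normalize_converters` is a hypothetical name.

```ruby
# Hypothetical helper illustrating the shorthand rules described above.
# It is not taken from redmine_reformat itself.
def normalize_converters(spec)
  case spec
  when nil
    nil            # null: no converter chain
  when String
    [[spec]]       # "Name" -> [["Name"]]
  when Array
    if spec.first.is_a?(String)
      [spec]       # ["Name", arg, ...] -> a single converter instance
    else
      spec         # already a full chain of converter instances
    end
  else
    raise ArgumentError, "unsupported converter spec: #{spec.inspect}"
  end
end

p normalize_converters('TextileToMarkdown')
p normalize_converters(['Ws', 'http://localhost:4000'])
# The pitfall: read as ONE converter whose argument is an array.
p normalize_converters(['RedmineFormatter', ['Ws', 'http://localhost:4000']])
# The full specification is passed through unchanged.
p normalize_converters([['RedmineFormatter'], ['Ws', 'http://localhost:4000']])
```

The third call shows why the full form is needed for multi-converter chains: once the first element is a string, the whole array is taken as a single converter instance.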

TextileToMarkdown

Usage: 'TextileToMarkdown'
Arguments: (none)

TextileToMarkdown uses Pandoc for the actual conversion. Before pandoc is called, the input text is subject to extensive preprocessing, often derived from Redmine code. Placeholderized parts are later expanded after pandoc conversion.

TextileToMarkdown is used in the default converter config when the source markup is textile and the target is markdown.

Although some partial parsing is involved, the processing is mostly performed at the source level, and some user intentions are even recognized:

Generated Markdown is intended to be as compatible as possible, so that it works even with the Redcarpet Markdown renderer. E.g. Markdown tables are formatted in an ASCII-art-like layout, as there were cases where compacted tables were not recognized correctly by Redcarpet.

See the test fixtures for more details. We admit the conversion is opinionated; feel free to submit PRs to make it configurable.

Further development remarks: conversion utilizing pandoc became an enormous beast. The amount of code in the preprocessor is comparable to the Redmine/Redcloth3 renderer. It would have been better if pandoc hadn't been involved at all - in terms of code complexity, speed and external dependencies.

MarkdownToCommonmark

Usage: ['MarkdownToCommonmark', options]
Arguments:

MarkdownToCommonmark edits the source text to patch the differences between Redmine Redcarpet format (called markdown) and the new common_mark format.

It parses the document with commonmarker (the library behind the new common_mark format), assuming the basic overall structure is the same. In the end, a patched alternative, commonmarker_fixed_sourcepos, with a patched underlying cmark-gfm library had to be created and used, because the converter relies on correct source position information, which is broken or missing without the patches.

The converter walks through the document tree and locates source positions to be edited. It is important to point out that the output document is not the result of a parse-and-render process. Although the parser is involved, it only computes instructions like "insert two spaces at the end of line 5". The output is always the original document with some edits.
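The idea of editing the source rather than re-rendering it can be illustrated with a short sketch. The `apply_edits` function and the edit hash layout (`:line`, `:column`, `:text`) are hypothetical and chosen for illustration only; the actual converter derives its edits from commonmarker source positions.

```ruby
# Hypothetical illustration of "patch the source instead of re-rendering".
# Each edit inserts :text at a 1-based :line and a 0-based :column in that line.
def apply_edits(source, edits)
  lines = source.lines # each element keeps its line terminator
  # Apply edits from the end of the document backwards, so positions of
  # edits that come earlier in the text remain valid.
  edits.sort_by { |e| [e[:line], e[:column]] }.reverse_each do |e|
    idx = e[:line] - 1
    lines[idx] = lines[idx].dup.insert(e[:column], e[:text])
  end
  lines.join
end

doc = "first line\nsecond line\n"
# "Insert two spaces at the end of line 1", i.e. just before its newline.
edits = [{ line: 1, column: 'first line'.length, text: '  ' }]
p apply_edits(doc, edits)
# => "first line  \nsecond line\n"
```

Everything not covered by an edit is passed through byte for byte, which is the property the paragraph above emphasizes.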

The hard_wrap and underline replacements are quite simple, as they directly follow the document model provided by commonmarker.

The superscript processing is far more tricky, as it does not have any document-forming counterpart in CommonMark/GFM. commonmarker is used to locate carets in the right document contexts and the rest of the processing follows reverse-engineered Redcarpet code.

Macros are preserved by this converter. It also supports macros with text, which is preserved by default. The collapse macro has its text content converted.

For detailed behavior examples, see the MarkdownToCommonmark unit test.

RedmineFormatter

Usage: ['RedmineFormatter', options]
Arguments:

RedmineFormatter uses a monkey-patched version of the internal Redmine renderer, textilizable(). It converts any format supported by Redmine to HTML in the same way Redmine does. The monkey patch blocks macro expansion and keeps wiki links untouched.

LinkRewriter

Usage: ['LinkRewriter', wiki_link_rewrites]
Arguments:

LinkRewriter uses a monkey-patched version of the internal Redmine renderer, textilizable(), to analyze the individual wiki links. Only valid links leading to an existing page are considered at the moment. The actual rewriting is performed on the source text, so there should be no side effects. For the same reason, this converter can be used with "force_crlf": false, "match_trailing_nl": false.

Limitations:

Ws

Usage: ['Ws', '<url>', options]
Arguments:

Ws performs an HTTP PUT or POST request to the given URL and passes the text to convert in the request body. The result is expected in the response body. This allows fast and easy integration with converters written in different programming languages on various platforms.

Log

Usage: ['Log', options]
Arguments:

Log logs what is going through the converter chain. Useful for debugging or searching for specific syntax within rich text data. The converter hands over the input as is.

Reformat Microservice

For certain integration and testing use cases, it might be useful to expose the converter engine for use by external services. redmine_reformat provides a simple HTTP service for this purpose in the reformat:microservice rake task. The setup is very similar to the reformat:convert rake task.

rake reformat:microservice from_formatting=common_mark
Running with setup:
{:converters_json=>"(use default converters)",
 :to_formatting=>nil,
 :workers=>1,
 :port=>3030,
 :from_formatting=>"common_mark"}
[2020-03-27 22:53:16] INFO  WEBrick 1.4.2
[2020-03-27 22:53:16] INFO  ruby 2.6.5 (2019-10-01) [x86_64-linux]
[2020-03-27 22:53:16] INFO  WEBrick::HTTPServer#start: pid=5343 port=3030
(CTRL+C or TERM signal closes the server)

In the example above, visit http://localhost:3030 to get more info on usage.

The microservice works as follows:

Invocation example:

curl -XPUT -H 'Content-Type: text/plain' -d '# Foo' 'http://localhost:3030?to_formatting=html'
# produces '<h1>Foo</h1>'
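For callers that prefer Ruby over curl, the same request could be built with the standard Net::HTTP library. This is a minimal sketch mirroring the curl example above (PUT, text/plain body, to_formatting query parameter). The function names `build_reformat_request` and `reformat` are hypothetical, and the error handling is an assumption rather than documented service behavior.

```ruby
require 'net/http'
require 'uri'

# Builds the same request as the curl example above.
# Function names are illustrative, not part of redmine_reformat.
def build_reformat_request(base_url, text, params = {})
  uri = URI(base_url)
  uri.query = URI.encode_www_form(params) unless params.empty?
  req = Net::HTTP::Put.new(uri)
  req['Content-Type'] = 'text/plain'
  req.body = text
  [uri, req]
end

# Sends the request and returns the converted text from the response body.
# Treating any non-2xx response as an error is an assumption.
def reformat(base_url, text, params = {})
  uri, req = build_reformat_request(base_url, text, params)
  res = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }
  raise "conversion failed: HTTP #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  res.body
end

# Requires a running microservice, e.g. started as shown above:
# puts reformat('http://localhost:3030', '# Foo', to_formatting: 'html')
```

Separating request construction from sending keeps the request shape easy to inspect without a running server.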

History

The project has its origins in Textile to Markdown conversion scripts and plugins for Redmine. Although not much of the original code is left, we really value the community contributions of our predecessors.

  1. The convert_textile_to_markdown script was built upon @sigmike's answer on Stack Overflow.
  2. It was later slightly modified by Hugues C.
  3. It was completed by Ecodev and published on GitHub.
  4. It was significantly improved by Planio / Jens Krämer: GitHub fork.
  5. The conversion was rewritten by Orchitech, who also created the conversion framework redmine_reformat. Released under GPLv3.