zeldigas / text2confl

Publisher of documents to confluence
Apache License 2.0
13 stars, 2 forks

Text2confl fails on boldness like this `*` #182

Open feliksik opened 1 month ago

feliksik commented 1 month ago

The following asciidoc makes text2confl fail:

= test

This is a `*` weird `*` bold thing.

Output:

Some pages content is invalid:

  1. ./test.adoc: [6:30] The element type "strong" must be terminated by the matching end-tag "</strong>". Start tag location - [6:28]

Admittedly, this is odd input: I actually intended to show monospace asterisks. But the asciidoc HTML output is OK, so ideally text2confl would not choke.

zeldigas commented 1 week ago

The thing is that asciidoctor produces broken XML. It is just that HTML rendering engines forgive such issues.
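To illustrate the difference between a forgiving HTML renderer and a strict XML parser, here is a minimal sketch using only the JDK's built-in XML parser. The `misnested` string is an assumption: it is a guess at what asciidoctor emits for the sample, inferred from the error message and the normalized form quoted in this thread. The class and method names are hypothetical and not part of text2confl.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns null if the fragment parses as XML, otherwise the parser's complaint. */
    public static String firstXmlError(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(fragment)));
            return null;
        } catch (SAXParseException e) {
            // Line and column numbers are relative to the fragment passed in.
            return "[" + e.getLineNumber() + ":" + e.getColumnNumber() + "] " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Assumed asciidoctor output: <strong> opens inside one <code> and closes inside another.
        String misnested =
            "<p>This is a <code><strong></code> weird <code></strong></code> bold thing.</p>";
        // Properly nested variant of the same markup.
        String nested =
            "<p>This is a <code><strong></strong></code><strong> weird <code></code></strong> bold thing.</p>";

        System.out.println("misnested: " + firstXmlError(misnested));
        System.out.println("nested:    " + firstXmlError(nested));
    }
}
```

A browser would render both strings without complaint, while the XML parser rejects the first one and accepts the second.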

One option I can apply here is to use Jsoup (already in the dependencies) to auto-fix the HTML document; it has such a feature. Do you think that would make sense?

zeldigas commented 1 week ago

JSoup normalizes it like this: <p>This is a <code><strong></strong></code><strong> weird <code></code></strong> bold thing.</p>

feliksik commented 1 week ago

That's a great point, @zeldigas; I didn't realize this. I reported this as an asciidoctor bug upstream, but it will not be fixed.

I'm not sure what's wise here, to be honest. There are a few options:

  1. Leave things as they are.
  2. Make the asciidoctor output subject to validation before applying the text2confl tooling/templates for conversion to Confluence format. This avoids masking or silently fixing bugs that should instead be fixed in text2confl and/or its templates. If the output is not valid XML, one could emit a warning (blaming the asciidoctor output, which is the fault of the input or of an extension in use).
  3. As in 2, but also fix the problem with JSoup, to be more tolerant of the input.
  4. Make the Confluence XML (the output of the entire text2confl tooling) subject to validation and/or JSoup fixing. I don't think this would be great, though, as it would mask problems outside of asciidoc that are in our power to fix.

I think (if it's easy) that solution 3 may make sense.

I don't consider this high-prio, but it might prevent some frustrations for the user.

zeldigas commented 1 week ago

Such an issue can happen with markdown too. I think auto-fix can be handled for every page and controlled via a configuration option or CLI flag. It would be off by default, but you could opt in if you prefer getting some result over page correctness.
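The opt-in behaviour described here could look roughly like the sketch below. Everything in it is hypothetical: the `AutoFixPolicy` class, the `prepare` method and the `autoFix` parameter are illustrative names, not existing text2confl options. The validator and the fixer are passed in as functions so the policy stays independent of how pages are checked or repaired (in practice the fixer might delegate to Jsoup).

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class AutoFixPolicy {

    /**
     * Returns page content that is ready for upload.
     * Hypothetical sketch: none of these names exist in text2confl.
     *
     * @param autoFix  false by default in the proposal; true means "prefer some result"
     * @param isValid  check applied to the rendered page (for example, XML well-formedness)
     * @param fixer    repair step used only when autoFix is enabled
     */
    public static String prepare(String content,
                                 boolean autoFix,
                                 Predicate<String> isValid,
                                 UnaryOperator<String> fixer) {
        if (isValid.test(content)) {
            return content; // valid pages are never touched
        }
        if (!autoFix) {
            // Default: fail loudly so that the source document gets corrected.
            throw new IllegalStateException("Page content is invalid");
        }
        String fixed = fixer.apply(content);
        if (!isValid.test(fixed)) {
            throw new IllegalStateException("Page content is still invalid after auto-fix");
        }
        // Warn so that the repair is not silent.
        System.err.println("warning: page content was invalid and has been auto-fixed");
        return fixed;
    }
}
```

Keeping the strict path as the default preserves today's behaviour, and the warning keeps an auto-fixed page from passing unnoticed.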