mquinson / po4a

Maintain the translations of your documentation with ease (PO for anything)
http://po4a.org/
GNU General Public License v2.0

AsciiDoc: Drop last character of table entries #386

Open mquinson opened 1 year ago

mquinson commented 1 year ago

Hello,

a new bug was reported against the Debian package, here: https://bugs.debian.org/1019548

@jnavila, do you have some time to look at it? I'm not sure when I'll find some time to devote to po4a, but this will happen. It always happens after a while :)

Thanks for your help, Mt

jnavila commented 1 year ago

This is related to how table lines are split into cells. The regexp for the separator includes "e", so a trailing "e" directly in front of a "|" is consumed as part of the separator:

https://github.com/mquinson/po4a/blob/8279b57c1d56c4afdb04f551b7e0a55acddf2069/lib/Locale/Po4a/AsciiDoc.pm#L463

I'm not sure we can use the split trick if we want to fix this.

I would argue that the formatting in the bug report is not proper AsciiDoc: https://docs.asciidoctor.org/asciidoc/latest/tables/add-cells-and-rows/ stipulates that the cell separator is prefixed with a space character.
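To illustrate the mechanism, here is a minimal sketch in Python. The pattern below is a simplified, hypothetical stand-in (an optional style letter followed by `|`), not the regexp used in AsciiDoc.pm, and `split_cells` is a name invented for this example.

```python
import re

# Hypothetical, simplified separator: an optional style letter, then "|".
# This is NOT the regexp from AsciiDoc.pm, only an illustration.
SEPARATOR = re.compile(r"[adehlmsv]?\|")

def split_cells(line):
    """Split one table line into cell texts, using the separator as delimiter."""
    return [cell.strip() for cell in SEPARATOR.split(line)]

# With a space before the separator, the cell text is left intact.
well_formed = split_cells("|one |two")

# Without the space, the final "e" of "one" is read as a style letter
# and disappears together with the separator.
missing_space = split_cells("|one|two")

print(well_formed)    # ['', 'one', 'two']
print(missing_space)  # ['', 'on', 'two']
```

The leading empty string comes from the separator that opens the line. The second call shows the reported symptom: the last character of the entry is dropped.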

mquinson commented 1 year ago

I'd prefer to save the split trick if possible. That's really convenient.

Is it possible to at least detect that the input is malformed and report it as an error to the users?

smoe commented 1 year ago

I just learned about table style operators. Many thanks for that ;) On https://docs.asciidoctor.org/asciidoc/latest/tables/format-cell-content/ all these operators are at the beginning of a line. Would that help in some way to keep the split? I have not seen any formal grammar anywhere, but the least I would expect is some whitespace to precede the operator.

jnavila commented 1 year ago

We can require a beginning of line or a space as prefix of the separator, but this is a change of behavior.
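A rough sketch of what such a prefix requirement could look like, again in Python with a simplified, hypothetical pattern rather than the actual AsciiDoc.pm regexp. The negative lookbehind `(?<!\S)` accepts a separator only at the start of the line or after whitespace, and it keeps the split approach.

```python
import re

# Hypothetical, simplified separator that must follow the start of the line
# or a whitespace character. Not the regexp from AsciiDoc.pm.
STRICT_SEPARATOR = re.compile(r"(?<!\S)[adehlmsv]?\|")

def split_cells_strict(line):
    """Split one table line into cell texts with the stricter separator."""
    return [cell.strip() for cell in STRICT_SEPARATOR.split(line)]

# Plain separators after a space still split as before.
plain = split_cells_strict("|one |two")

# A style letter after a space is still treated as part of the separator.
styled = split_cells_strict("|one e|two")

# Without the space, nothing is dropped, but the line is no longer split:
# this is the change of behavior.
unspaced = split_cells_strict("|one|two")

print(plain)     # ['', 'one', 'two']
print(styled)    # ['', 'one', 'two']
print(unspaced)  # ['', 'one|two']
```

The last case shows the trade-off: the text survives unchanged, but input that used to be split into two cells now comes out as a single one.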

jnavila commented 1 year ago

@smoe The first paragraph of https://docs.asciidoctor.org/asciidoc/latest/tables/add-cells-and-rows/#table-cells is quite clear:

Each new cell in a table is declared with a cell separator. The default cell separator is a vertical bar (|). All of the content entered after a cell separator is included in that cell until the processor encounters a space followed by another vertical bar (|) or a new line that begins with a |.

The cell specifiers are placed directly in front of the cell separator, that is, after a space or a new line.

The po4a parser for cells works with well-formatted cells, but will not behave correctly if the whitespace is missing.

smoe commented 1 year ago

You are correct.

mquinson commented 1 year ago

I think we should at least produce a warning when there is no space involved in the separator, saying that something may go wrong. Don't you think so?
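One possible shape for such a check, sketched in Python under the same simplified assumptions as the rest of this thread. The pattern and the name `unspaced_separators` are hypothetical, and escaped bars (`\|`) are not handled. It tries the well-formed separator first and only flags a `|` that could not be matched that way.

```python
import re

# First alternative: a well-formed separator (start of line or whitespace,
# optional style letter, then "|"). Second alternative: any other "|".
# Hypothetical pattern for illustration, not taken from AsciiDoc.pm.
CHECK = re.compile(r"(?P<ok>(?<!\S)[adehlmsv]?\|)|(?P<bad>\|)")

def unspaced_separators(line):
    """Return the offsets of "|" characters that lack a leading whitespace."""
    return [m.start() for m in CHECK.finditer(line) if m.lastgroup == "bad"]

def warnings_for(line, lineno):
    """Build one warning message per suspicious separator on the line."""
    return [
        f"line {lineno}, column {col + 1}: cell separator not preceded by "
        "whitespace, the cell content may be split incorrectly"
        for col in unspaced_separators(line)
    ]

for message in warnings_for("|one|two", 1):
    print(message)
```

Well-formed lines such as `|one |two` or `|one e|two` produce no message, while `|one|two` is flagged at the second bar.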