Asciidoctor/Include issue, when using Hugo shortcode

dbaio commented 2 years ago

Source file. PO file. po4a version 0.66 and prior releases have the same issue.

po4a-gettextize \
        --format asciidoc \
        --option compat=asciidoctor \
        --option yfm_keys=title,part,description \
        --master "po4a-asciidoctor-includes.adoc" \
        --master-charset "UTF-8" \
        --copyright-holder "The FreeBSD Project" \
        --package-name "FreeBSD Documentation" \
        --po "po4a-asciidoctor-includes.po"

-->

#. type: Plain text
#: po4a-asciidoctor-includes.adoc:35
msgid ""
"include::shared/attributes/attributes-{{% lang %}}.adoc[] "
"include::shared/{{% lang %}}/teams.adoc[] include::shared/{{% lang "
"%}}/mailing-lists.adoc[] include::shared/{{% lang %}}/urls.adoc[]"
msgstr ""

Includes without {{% lang %}} work fine; they are skipped.

I'm reporting this so others can help, but the plan is to dig/debug po4a later.

https://docs.asciidoctor.org/asciidoc/latest/directives/include/

jnavila commented 2 years ago

Ah, yes... Variables in include statements are not supported. Macros are matched with this regex:

https://github.com/mquinson/po4a/blob/49eaae237a1207b2f137f5c7039289249c9afbb8/lib/Locale/Po4a/AsciiDoc.pm#L792-L793

jnavila commented 2 years ago

In fact, it should work if you remove the space characters around lang. Managing these ones may be quite tricky...
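A minimal sketch of why the spaces matter, assuming the macro pattern is the one quoted in this thread (`\S*` for the target). The original is Perl; Python's `re` treats this particular pattern the same way. The helper name `is_macro` and the sample lines are only for illustration.

```python
import re

# Macro pattern as quoted in this thread from AsciiDoc.pm (Perl),
# transcribed to Python. The target group (\S*) cannot contain whitespace.
MACRO_RE = re.compile(r"^([\w\d][\w\d-]*)(::)(\S*)\[(.*)\]$")


def is_macro(line: str) -> bool:
    """Return True when the line is recognised as a block macro."""
    return MACRO_RE.match(line) is not None


# Include with the Hugo shortcode written with spaces, as in the report.
with_spaces = "include::shared/{{% lang %}}/teams.adoc[]"
# Same include with the spaces around "lang" removed (illustrative only).
without_spaces = "include::shared/{{%lang%}}/teams.adoc[]"

print(is_macro(with_spaces))     # False: \S* stops at the first space
print(is_macro(without_spaces))  # True: the whole target is non-space
```

This only shows how the pattern behaves; it says nothing about whether Hugo itself accepts the shortcode without spaces.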

dbaio commented 2 years ago

> In fact, it should work if you remove the space characters around lang. Managing these ones may be quite tricky...

I'll try this, thanks

dbaio commented 2 years ago

This {{% lang %}} is a Hugo shortcode in the Asciidoctor include.

https://github.com/mquinson/po4a/blob/master/lib/Locale/Po4a/AsciiDoc.pm#L793

It seems that changing the macro regex to accept any character before [] also fixes this and does not break anything.

and ( $line =~ m/^([\w\d][\w\d-]*)(::)(\S*)\[(.*)\]$/ ) )

to

and ( $line =~ m/^([\w\d][\w\d-]*)(::)(.*)\[(.*)\]$/ ) )

I tested it with all includes from the Asciidoctor Documentation examples and others I could find.

What do you think about this change?

jnavila commented 2 years ago

I have two objections:

I do not like this catch-all regex. I'm not sure it would not start to wrongly match description lists. To rule out this mismatch, you should match a non-space character just after the colons.
More generally, what we are trying to do here is to match part of the format of an external templating engine (Hugo). There are plenty of templating engines out there, and we surely don't want to tweak our hand-made parser to accept all their formats. In the end, our parser should be used on the same content as asciidoc, that is, the output of the templating engine.
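The first objection can be illustrated with a small sketch. It compares the two patterns quoted in this thread against a hypothetical description-list entry that happens to end with a bracketed remark. `GUARDED_RE` is only one possible reading of "match a non-space character just after the colons", not code from po4a; the sample lines and the `classify` helper are assumptions too.

```python
import re

# Patterns quoted in this thread (Perl), transcribed to Python.
STRICT_RE = re.compile(r"^([\w\d][\w\d-]*)(::)(\S*)\[(.*)\]$")
CATCH_ALL_RE = re.compile(r"^([\w\d][\w\d-]*)(::)(.*)\[(.*)\]$")
# Hypothetical variant: the target is either empty or starts with a
# non-space character right after the colons.
GUARDED_RE = re.compile(r"^([\w\d][\w\d-]*)(::)(\S.*|)\[(.*)\]$")

# Illustrative inputs, not taken from the FreeBSD sources.
description_item = "CPU:: The brain of the computer [see note]"
hugo_include = "include::shared/{{% lang %}}/teams.adoc[]"


def classify(line: str) -> dict:
    """Report which of the three patterns treat the line as a macro."""
    return {
        "strict": bool(STRICT_RE.match(line)),
        "catch_all": bool(CATCH_ALL_RE.match(line)),
        "guarded": bool(GUARDED_RE.match(line)),
    }


print(classify(description_item))
print(classify(hugo_include))
```

With these inputs, the catch-all pattern accepts the description-list entry as a macro, while the guarded variant rejects it and still accepts the include containing the shortcode.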

Our parser must be facing plain asciidoc for translatable content. Part of the task of internationalizing the content is to use the features of asciidoc to split the content between template-specific code and plain asciidoc. You could use asciidoc document attributes to do so:

:hugolang: {{% lang %}}

include::shared/attributes/attributes-{hugolang}.adoc[]

In fact, the lang attribute is already in use in asciidoctor to select the language for naming figures, chapters, etc. Also note that document attributes can be passed to asciidoctor at invocation time, on the command line with the -a option or in code, thus eliminating the need to define them in the document and run the templating engine on them, which may make the whole processing lighter.

jnavila commented 2 years ago

@mquinson What do you think?

mquinson commented 2 years ago

I never wrote one line of asciidoc in my life and would prefer if I can blindly trust someone here...

If someone like @dbaio needs this asciidoc+hugo thing and is willing to contribute to it, why not? Hugo may be just one templating solution among many, but it's not the least used one either. But again, if there is no user to contribute to that code, it may not happen (no matter how desirable): that's free software and we mostly fix the code we use and are familiar with. Please @dbaio jump into the code :)

There is a slight challenge in ensuring that the new features do not clutter the code too much, but I'm not familiar with this code at all, so it's hard for me to comment.

If things go seriously wrong, you should consider writing a specific po4a formatter for asciidoc+hugo, alongside the original formatter (starting by copying the file). But that should only be done as a last resort, as the maintenance of the asciidoc parts will be doubled if you duplicate the code. This is really not a good design for the long term, but it may be something to consider if code reuse gets hard. We already have 2 YAML parsers (one for the standalone formatter and one for the front matter), and the main reason I think that's OK is that this code is so simple and small.

dbaio commented 2 years ago

The lang attribute (with -a option) is an issue with Hugo and Asciidoctor; we can't define it for each language when building the project. I tried to talk with Hugo developers about that in the past; anyway, I'll send more messages in their forum about it again.

The example with asciidoc document attributes can be a way out as well, thanks for the tip, although, in our project, we will need to change more than a thousand files to use it.

dbaio commented 2 years ago

About the catch-all regex, we can work around it with (\S*|\S*\{.*\}\S*).
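A quick sketch of how the proposed alternation behaves, assuming it simply replaces the target group in the pattern quoted earlier in this thread. The sample lines and the helper `is_macro` are illustrative, and this is not necessarily the exact code submitted in the PR.

```python
import re

# Target group replaced by the alternation proposed above: either a
# whitespace-free target, or one containing a {...} span that may hold spaces.
WORKAROUND_RE = re.compile(
    r"^([\w\d][\w\d-]*)(::)(\S*|\S*\{.*\}\S*)\[(.*)\]$"
)


def is_macro(line: str) -> bool:
    """Return True when the line is recognised as a block macro."""
    return WORKAROUND_RE.match(line) is not None


# Illustrative inputs.
plain_include = "include::shared/en/teams.adoc[]"
hugo_include = "include::shared/{{% lang %}}/teams.adoc[]"
description_item = "CPU:: The brain of the computer [see note]"

print(is_macro(plain_include))     # True: first alternative
print(is_macro(hugo_include))      # True: second alternative, via the braces
print(is_macro(description_item))  # False: a space follows the colons
```

One caveat under the same assumptions: a line whose text right after the colons starts with a brace span, with no space in between, would still be accepted as a macro.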

I've opened PR #355 for review.

jnavila commented 2 years ago

> The lang attribute (with -a option) is an issue with Hugo and Asciidoctor; we can't define it for each language when building the project. I tried to talk with Hugo developers about that in the past; anyway, I'll send more messages in their forum about it again.

I don't understand this remark. From a quick review of your project, you call ./tools/asciidoctor.sh books ${_lang} pdf for each language, as detected in the Makefile; the correct language is already passed to asciidoctor with -a lang="$doc_lang", making the attribute already available from asciidoc.

I am not minding my own business here, but it seems that Hugo is useless for your use case, as you already have all the needed transclusion facilities in asciidoctor.

jnavila commented 2 years ago

As for thousands of files, well it's just a simple sed command.
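As a rough illustration of that kind of bulk rewrite, here is a hedged sketch in Python rather than sed. It replaces the Hugo shortcode with the `{hugolang}` attribute reference suggested earlier in the thread, and only on include lines. The function name, the tolerance for optional spaces, and the restriction to lines starting with `include::` are assumptions of this sketch.

```python
import re

# Matches the Hugo shortcode with or without spaces around "lang".
SHORTCODE_RE = re.compile(r"\{\{%\s*lang\s*%\}\}")


def rewrite_includes(text: str) -> str:
    """Replace {{% lang %}} with {hugolang} on include:: lines only."""
    out = []
    for line in text.splitlines(keepends=True):
        if line.startswith("include::"):
            line = SHORTCODE_RE.sub("{hugolang}", line)
        out.append(line)
    return "".join(out)


sample = (
    "include::shared/{{% lang %}}/urls.adoc[]\n"
    "Some text mentioning {{% lang %}} outside an include.\n"
)
print(rewrite_includes(sample), end="")
```

The `:hugolang:` attribute itself would still have to be defined in each document or passed at invocation time, as described above; the sketch does not handle that part.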

dbaio commented 2 years ago

The FreeBSD docs contain two Hugo projects, website and documentation, and they use Asciidoctor, but it's all driven by Hugo. That script generates a PDF for the documentation articles/books; it's the only place we use asciidoctor standalone.

As long as that issue with Hugo and Asciidoctor exists, we will need an if statement somewhere to change something between them.

IMHO, this change won't harm po4a and the asciidoc format, but I will respect it if you don't want to mix Hugo format here.

mquinson commented 2 years ago

I tend to agree that we can support Hugo without cluttering the code too much (or at least I hope so), but I didn't read the code yet. I think that at the end of the day, that's @jnavila's decision. He's the one who did most of the work on the AsciiDoc formatter, so he decides.

If we need to split the implementations, I'm confident that the BSD community will manage to maintain a fork of that formatter.

Again, I'm not saying that we must fork the formatter, because I didn't read the patch. And I'd prefer not to have to, so that the bus factor of that project continues to grow :)

Thanks for your time, guys.

mquinson / po4a

Asciidoctor/Include issue, when using Hugo shortcode #352