metanorma / metanorma-plugin-glossarist

Glossarist plugin for Metanorma
BSD 2-Clause "Simplified" License

Concepts not linking to references #27

Open opoudjis opened 2 months ago

opoudjis commented 2 months ago

I am trying to generate the glossarist termbase for https://github.com/geolexica/isotc204-glossary, in pursuit of https://github.com/metanorma/metanorma/issues/75

= X
:doctype: standard
:glossarist-dataset: dataset:../isotc204-glossary/concepts

== Terms and definitions

glossarist::import[dataset]

Unlike ISO TC 211, this is working, but there are two issues:

  1. the clause depth is wrong: the screen fills with log messages reading: asciidoctor: WARNING: <stdin>: line 6184: section title out of sequence: expected level 2, got level 3
  2. the cross-references are not working: I'm getting a heap of AsciiDoc Input: (ID _72c95c29-2ee2-4073-96d0-7927d5d1e8c2): Error: Term reference to urn_iso_std_iso_14812_3.1.1.1 missing: "urn_iso_std_iso_14812_3.1.1.1" is not defined in document

The latter are turning up in the generated document, since they really are unresolved references:

[Screenshot 2024-06-29 at 00 20 31]

Metanorma sees that this cross-reference is meant to be to the term "ITS component", but it is not finding the term ID urn:iso:std:iso:14812:3.1.8.2, so it cannot confirm that cross-reference, let alone hyperlink to it.

I am reasonably concerned that there will be errors in the markup rendered by the metanorma plugin, that I will need to diagnose. In order to be able to do so, I am doing something I've already had to do for metanorma-plugin-lutaml: I am going to output a log of what Liquid has generated after the templates have been populated, so that I know what has actually happened.

opoudjis commented 2 months ago

... And here it is:

==== entity

concrete or abstract thing that exists, did exist, or can possibly exist, including associations among these things

[example]
{{urn:iso:std:iso:14812:3.1.1.6,person,Person}}, object, event, idea, process, etc.

[.source]
<<ISO_TS_14812_2022,3.1.1.1>>

See https://www.metanorma.org/author/topics/sections/concepts/#reference-by-anchor

The {{...}} takes either {{identifier}}, or {{identifier,display text}}, or {{identifier,canonical text,display text}}.

But this plugin is not inserting the identifier into the output at all! There IS no anchor corresponding to urn:iso:std:iso:14812:3.1.1.6 in the output.

Therefore you need both an anchor on the term:

[[urn:iso:std:iso:14812:3.1.1.1]]
==== entity

and, as required in the documentation,

{{<<urn:iso:std:iso:14812:3.1.1.6>>,person,Person}}

Of course, : is an illegal character in XML anchors, and I will be emending it to the legal _. I guess we'll find out if Metanorma will cope with this in both anchor and cross-reference.
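A minimal sketch of that emendation, assuming a plain character substitution is enough. The helper names `to_anchor` and `anchor_mentions` are hypothetical and belong to neither the plugin nor Metanorma. The only behaviour taken from this thread is that `:` becomes `_`, as in the `urn_iso_std_iso_14812_3.1.1.1` of the error message above.

```ruby
# Hypothetical helper: make a concept identifier usable as an XML anchor
# by replacing the illegal ":" with "_".
def to_anchor(identifier)
  identifier.tr(":", "_")
end

# Hypothetical helper: rewrite {{identifier,...}} into {{<<anchor>>,...}}
# for mentions whose identifier starts with the given prefix.
def anchor_mentions(text, prefix)
  text.gsub(/\{\{(#{Regexp.escape(prefix)}[^,}]+)/) do
    "{{<<#{to_anchor(Regexp.last_match(1))}>>"
  end
end

puts to_anchor("urn:iso:std:iso:14812:3.1.1.1")
puts anchor_mentions("{{urn:iso:std:iso:14812:3.1.1.6,person,Person}}, object",
                     "urn:iso:std:iso:14812:")
```

Applying the same function to both the anchor and the mention keeps the two sides in step, whatever Metanorma itself ends up doing with the `:`.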

This gem is going to unearth a lot of problems like this, and there will be a lot of RTFM invoked. This is testing that should have happened when this was implemented: checking that the Asciidoc you output actually makes sense is fundamental integration testing.

opoudjis commented 2 months ago

The bad clause depth is also clear:

== Terms and definitions // comes from me

==== entity // comes from metanorma-plugin-glossarist

The Asciidoctor plugin needs to know what the current clause depth is, and make the clauses it creates one deeper. Hard coding the depth of the clause is an unacceptable bug.

The preprocessor can work this out by tracking the clause titles it finds as it processes the entire document, and keeping track of the current clause depth. Asciidoctor being Asciidoctor, it also needs to be aware of preformatted blocks and the use of ==== in examples. So you're looking to do something like this, using pass_status() and related code from https://github.com/metanorma/metanorma-standoc/blob/main/lib/metanorma/standoc/macros.rb:

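p = init
title_depth = 2 # depth of the enclosing clause; updated as titles are seen
loop do
  current_line = input_lines.next
  break if end_mark && current_line == end_mark

  p = pass_status(p, current_line.rstrip)
  unless p[:pass] # ignore anything inside a passthrough or preformatted block
    # a clause title is two or more "=" followed by a space and text
    /^==+ \S/.match?(current_line) and title_depth = current_line[/^=+/].size
    process_line(document, input_lines, current_line, liquid_doc, title_depth)
  end
end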

where any clauses introduced by processing within process_line are going to be prefixed by "=" * (title_depth+1 ), not ====. In the default case, title_depth is going to be 2.
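To see the idea in isolation, here is a self-contained sketch. It does not use `pass_status` from metanorma-standoc: a deliberately simplified stand-in only recognises `----` and `....` delimiter lines, and `expand_terms`, `TITLE`, `FENCE` and the term list are hypothetical names for illustration.

```ruby
TITLE = /^(==+) \S/         # a section title: two or more "=" then text
FENCE = /^(-{4,}|\.{4,})$/   # listing/literal block delimiters (simplified)

# Replace each "glossarist::import[...]" line with one title per term,
# one level deeper than the most recent section title.
def expand_terms(lines, terms)
  title_depth = 1
  open_fence = nil
  lines.flat_map do |line|
    text = line.rstrip
    if open_fence
      # inside a delimited block: copy through until the same delimiter closes it
      open_fence = nil if text == open_fence
      next [line]
    end
    if (m = FENCE.match(text))
      open_fence = m[1]
      next [line]
    end
    title_depth = Regexp.last_match(1).size if TITLE =~ text
    next [line] unless text.start_with?("glossarist::import[")

    terms.map { |t| "#{'=' * (title_depth + 1)} #{t}" }
  end
end

doc = ["= X", "", "== Terms and definitions", "", "----", "==== not a title",
       "----", "", "glossarist::import[dataset]"]
puts expand_terms(doc, ["entity", "person"])
```

The `==== not a title` line inside the listing block is skipped, so the generated titles come out as `=== entity` and `=== person`. A bare `====` example delimiter never matches `TITLE`, because the pattern requires a space and text after the equals signs.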

opoudjis commented 2 months ago

@ronaldtse Do not even think of mentioning coradoc here. metanorma-plugin-* are doing line-based processing, as Asciidoc preprocessors.

HassanAkbar commented 4 weeks ago

@opoudjis

  1. the clause depth is wrong: the screen fills with log messages reading: asciidoctor: WARNING: <stdin>: line 6184: section title out of sequence: expected level 2, got level 3

This will be fixed by PR https://github.com/metanorma/metanorma-plugin-glossarist/pull/34

  2. the cross-references are not working: I'm getting a heap of AsciiDoc Input: (ID _72c95c29-2ee2-4073-96d0-7927d5d1e8c2): Error: Term reference to urn_iso_std_iso_14812_3.1.1.1 missing: "urn_iso_std_iso_14812_3.1.1.1" is not defined in document

This is happening because the references in the datasource are in the format {{urn:iso:std:iso:14812:concept.id,text}}; for example, see concept-3.1.1.1, line 14.

I think the way to handle this is to update the references in the datasource to {{<<urn:iso:std:iso:14812:concept.id>>,text}}. Then metanorma-plugin-glossarist should generate an anchor for every term, which will be the concept id by default, with an option to add a prefix to the anchor, i.e.

:glossarist-dataset: dataset1:./spec/fixtures/dataset-glossarist-v2

== Render Section
glossarist::import[dataset,anchor-prefix=urn:iso:std:iso:14812:]

This should render concepts with the anchor prefixed by the given anchor-prefix, i.e. [[anchor-prefix concept_id]]; by default the anchor would be just the id of the concept.
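As a rough illustration of that proposal, assuming the option reaches the renderer as a plain string: `term_heading` and its parameters are hypothetical, not the plugin's API, and the `:` to `_` replacement follows the emendation mentioned earlier in this thread.

```ruby
# Hypothetical sketch: build the anchor line and title line for one concept.
# anchor_prefix defaults to empty, so the anchor is the bare concept id.
def term_heading(concept_id, designation, depth:, anchor_prefix: "")
  anchor = "#{anchor_prefix}#{concept_id}".tr(":", "_")
  ["[[#{anchor}]]", "#{'=' * depth} #{designation}"]
end

puts term_heading("3.1.1.1", "entity", depth: 3,
                  anchor_prefix: "urn:iso:std:iso:14812:")
puts term_heading("3.1.1.1", "entity", depth: 3)
```

With the prefix, the anchor line is `[[urn_iso_std_iso_14812_3.1.1.1]]`. Without it, the anchor is just `3.1.1.1`, which starts with a digit; XML IDs may not begin with a digit, so the default may need a leading character in practice.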

@ronaldtse @opoudjis Am I going in the right direction or should we follow a different approach?

opoudjis commented 3 weeks ago

I think you are indeed doing the right thing there, @HassanAkbar . If cross-references presuppose urn:iso:std:iso:14812: in front of IDs, then you are going to have to put that urn:iso:std:iso:14812: in front of the IDs.

There is an alternative which is the same amount of work for you, more work for me, slightly more semantically correct, but ends up with the same result: treating these not as anchors but as termbase identifiers, per https://www.metanorma.org/author/topics/sections/concepts/#concepts-from-external-termbase . So, patterned after {{<<IEV:171-05-02>>,immature kernel}}, where IEV is the termbase and 171-05-02 is the term id, we could somehow do {{database:urn:iso:std:iso:14812,text}}.

But {{<<IEV:171-05-02>>,immature kernel}} fetches 171-05-02 from the Electropedia; {{database:urn:iso:std:iso:14812,text}} would have to do a whole lot of somersaults to do the same with glossarist input, which is already converted to Asciidoctor anyway, for purely theoretical benefit. I don't wish to do it, and we would end up having to add the identifier to Metanorma XML anyway, only to then map it back to an anchor to resolve the cross-references:

== term

termbase-id:[local,urn:iso:std:iso:14812:concept.id]

...

{{urn:iso:std:iso:14812:concept.id}}

{{term}}

And then implementing code to render termbase-id:[] as <term><identifier>...</identifier>, and to look up term/identifier as well as term/preferred when resolving {{...}}

It's pointless, and arguably an abuse of the notion of termbases to begin with: the point of the glossarist plugin is that this is no longer an external termbase, it is now part of the current document.

The only problem with anchors is that : is a forbidden character, so it is going to be converted to _, which makes such IDs somewhat more ambiguous. But that's an edge case.
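The ambiguity can be shown in a few lines. The two identifiers below are made up for illustration and do not come from the dataset; they differ only in where `:` and `_` fall.

```ruby
# Two hypothetical identifiers that are distinct before conversion
ids = ["urn:iso:a_b", "urn:iso_a:b"]

# Replacing ":" with "_" maps both onto the same anchor
anchors = ids.map { |id| id.tr(":", "_") }
puts anchors.uniq.inspect
```

Both collapse to `urn_iso_a_b`, so the original identifier cannot be recovered from the anchor alone. It only matters if identifiers already contain `_`.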

Keep doing what you've proposed.