tno-terminology-design / tev2-tools

The Terminology Engine (v2) is a set of specifications and tools that caters for the creation and maintenance (i.e. curation) of terminologies. This repository contains the sources for the tools.
Apache License 2.0

MRG Refs to include HRGT options (at least multiple converters) #35

Closed RieksJ closed 7 months ago

RieksJ commented 9 months ago

HRGs are typically specified by an MRGRef. Currently, its syntax accepts only a limited set of configuration options.

This issue requests

Ca5e commented 8 months ago

This issue indeed motivates thinking about using the {% hrgt="<tid>" <other args> %} syntax to call other tools as well. In my opinion, scanning documents for occurrences of these specific syntaxes should not be part of the to-be-called tool, but should instead be handled by a separate tool whose main task is calling other tools (I would say "calling other commands", but this seems very risky from a cross-site scripting perspective). If we let each to-be-called tool scan documents itself, I suspect we end up in a less ideal situation where every tool has to process a large number of files.
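To make the "separate calling tool" idea concrete, here is a minimal sketch in Python, not the tev2-tools implementation. It assumes a generic call syntax of the form {% <tool> key="value" ... %}; the names CALL_RE, PAIR_RE, HANDLERS, render_hrgt and dispatch are all hypothetical. The document is scanned once, and each call is handed to a registered handler.

```python
import re

# Hypothetical generic call syntax: {% <tool> key="value" ... %}
CALL_RE = re.compile(r'\{%\s*(?P<tool>\w+)\s+(?P<args>.*?)\s*%\}', re.DOTALL)
PAIR_RE = re.compile(r'(?P<key>[\w\[\]]+)="(?P<value>[^"]*)"')


def render_hrgt(args):
    # Stand-in for the real HRGT; it only echoes what it was asked to render.
    return f"[glossary for {args.get('tid', '?')}]"


# Only tools registered here can be called, so document text is never
# passed to a shell or to an arbitrary command.
HANDLERS = {"hrgt": render_hrgt}


def dispatch(text):
    """Scan the text once and replace every known tool call by its output."""
    def replace(match):
        handler = HANDLERS.get(match.group("tool"))
        if handler is None:
            return match.group(0)  # leave unknown calls untouched
        # findall yields (key, value) tuples because the pattern has two groups
        args = dict(PAIR_RE.findall(match.group("args")))
        return handler(args)

    return CALL_RE.sub(replace, text)
```

With this split, the called tools never read documents themselves; they only receive the arguments of the calls addressed to them.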

About the use of 'other args': this basically boils down to deciding at what level we want to strictly define how the syntax is interpreted. Right now, only the predefined named capturing groups of the HRGT interpreter are used. Within the {% %} syntax, I do like the HTML-esque way of recognizing the hrg, converter, and sorter properties. Sadly, named capturing groups stop working once we allow an unknown number n of converters to be specified.

I see two possible solutions. The first uses a syntax similar to {% hrgt="<tid>" cmd="--converter[2] <converter>" %}. The advantage is that we can keep the same approach as before (using the regex to find named capturing groups) and add one named capturing group whose content is forced to behave in the same way as the command line interface (which does add quite a bit of complexity). The second solution looks like {% hrgt tid="<tid>" converter[2]="<converter>" %}, which means forcing HTML-esque parameters throughout. Its main advantage is a clearer notation that does not mix different notation styles.

This discussion is caused by the fact that we don't have dynamic (named) capturing groups. A way to achieve them would be to artificially expand the group nodes of the regex in the AST. For instance, we could use an interpreter regex like {%\s*hrgt\s*((?<key>.+)="(?<value>.+)" )\s*%} to define only the format, and not necessarily the names of the groups. Within the AST, the unnamed capturing group that contains the two named capturing groups key and value would be repeated a large number of times, where (in this case) all of the repeated odd-numbered capturing groups are used as keys and all of the even-numbered ones as values. This seems cool and should work in a lot of situations, but is (maybe too) difficult to explain.
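For comparison, the same key/value result can be reached without expanding the regex AST, by matching in two passes: one regex finds the call, a second one is applied repeatedly to its argument string. The sketch below is in Python (which writes named groups as (?P<name>...) rather than (?<name>...)) and assumes the second, HTML-esque syntax; CALL_RE, PAIR_RE and parse_hrgt_calls are hypothetical names, not the actual interpreter regexes.

```python
import re

# Hypothetical patterns for the HTML-esque variant:
#   {% hrgt tid="<tid>" converter[2]="<converter>" %}
CALL_RE = re.compile(r'\{%\s*hrgt\s+(?P<args>.*?)\s*%\}', re.DOTALL)
PAIR_RE = re.compile(r'(?P<key>[\w\[\]]+)="(?P<value>[^"]*)"')


def parse_hrgt_calls(text):
    """Return one dict of key/value arguments per {% hrgt ... %} call."""
    calls = []
    for call in CALL_RE.finditer(text):
        args = {}
        # Second pass: the pair pattern is matched as often as needed,
        # so the number of converters does not have to be known up front.
        for pair in PAIR_RE.finditer(call.group("args")):
            args[pair.group("key")] = pair.group("value")
        calls.append(args)
    return calls


sample = ('Intro {% hrgt tid="glossary" converter[1]="markdown" '
          'converter[2]="html" %} outro')
print(parse_hrgt_calls(sample))
```

The trade-off is that the format is now described by two patterns instead of one, but each of them stays an ordinary regex with fixed group names.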

RieksJ commented 7 months ago

I am closing this issue because (a) there is no problem statement or bug report that warrants its existence, and (b) how to deal with MRGRefs is also being discussed in #25 (which for now is the appropriate place).