RieksJ closed this issue 8 months ago.
By refactoring the part of the MRGT that handles the terminology under construction, the formphrases are now also checked when `term` or `-term` is used. This isn't handled quite the same way as `showtext` (yet). The values are not converted to lowercase, `'()` aren't removed, and remaining special characters aren't replaced by dashes. This can, of course, still be done with something like the following:
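```javascript
// Regularize the values of a `term` instruction the way showtexts are handled
if (key === "term") {
  values = values.map((value) =>
    value
      .toLowerCase()                  // convert to lowercase
      .replace(/['()]+/g, "")         // remove apostrophes and parentheses
      .replace(/[^a-z0-9_-]+/g, "-")  // replace remaining special characters by dashes
  );
}
```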
Time is a miraculous thing - it has the ability to change one's mind.
When I read @Ca5e's comment above, it occurred to me that there is no real need for this. Curators who want to add whatever is necessary to accommodate the term `manage` might have considered using `formPhrases[manage]` rather than `term[manage]`. That would work if the MRG ensured two things. First, the `formPhrases` field of an MRG entry must not contain macros (as is currently specified). Second, the value of the `term` field must always be included in the `formPhrases` field, even if the `formPhrases` field wasn't specified in the curated text.
From a usability perspective, I can see that what this issue proposes might be preferable. On the other hand, there is also a case to be made that curators should know what they are doing. Changing some examples to point out this difference might be enough.
Currently, the contents of the `formPhrases` field (in the header of a curated text) are described as a comma-separated list of form phrases. From now on, the TRRT and MRGT will use the `formPhrases` field differently. It may therefore be wise to start using the YAML convention for a list. This ensures correct handling in the future, because the field can be interpreted as a list without first having to verify its type and possibly split a string on commas.
So

```yaml
formPhrases: actors, actor's, actor(s)
```

should become

```yaml
formPhrases: [actors, actor's, actor(s)]
```

with the following, of course, also being valid:

```yaml
formPhrases:
  - actors
  - actor's
  - actor(s)
```
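This sketch illustrates the type check mentioned above. A YAML parser returns a single string for the comma-separated form and an array for the list form. Every consumer of the comma form therefore needs something like the hypothetical `toList` helper below, written in plain JavaScript with no YAML library assumed.

```javascript
// Hypothetical normalizer for a formPhrases (or grouptags) value as delivered by a YAML parser.
function toList(value) {
  // YAML list convention: already an array, use it as-is.
  if (Array.isArray(value)) return value;
  // Legacy comma-separated convention: a single string that must be split and trimmed.
  if (typeof value === "string") {
    return value
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
  }
  // Field absent (or of an unexpected type).
  return [];
}

console.log(toList(["actors", "actor's", "actor(s)"])); // YAML list form
console.log(toList("actors, actor's, actor(s)"));        // comma-separated form
```

With the list convention, only the first branch is needed. The comma form also cannot express a form phrase that itself contains a comma. Without a split like this, a value such as `tools, quality-assurance` in a `grouptags` field stays a single item.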
I'd propose the same for the `grouptags` field, but according to the example in the specs here, this already seems to be the case. It just doesn't seem to be honored in (some of) the curated texts (for example `ict.md`). This also means that existing references to `grouptags`, if any, may not work as expected right now. For example, `tools, quality-assurance` would be interpreted as one list item instead of two.
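There is a problem here. I have tried to do that, but Docusaurus gives an error. The error occurs because `{` and `}` are used as the formphrase-macro delimiters. YAML also uses these characters as flow-collection indicators, so an unquoted `actor{ss}` inside `[...]` cannot be parsed.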
@Ca5e How do you suggest we solve this? Would it help to surround list elements with quotes, as in `formPhrases: { "element{ss}" }`?
You're right. However, the following syntaxes should all be correct:

```yaml
formPhrases:
  - actor{ss}
```

```yaml
formPhrases: ['actor{ss}']
```

```yaml
formPhrases: ["actor{ss}"]
```
I've changed the formPhrases in the essiflab/framework and tno-terminology-design/tev2-specifications repos to match the above formats. The MRGs will default to the first format.
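Here is a sketch of how a `{ss}` macro might be expanded into the form phrases that an MRG entry lists. The expansion table is an assumption inferred from the `author` example in the next comment (`author`, `authors`, `author's`, `author(s)`). The authoritative macro definitions are in the TEv2 specs, and `MACROS` and `expandMacros` are hypothetical names.

```javascript
// Hypothetical expansion table. Assumption: `{ss}` stands for the suffixes
// "", "s", "'s" and "(s)"; the real macro set is defined by the TEv2 specs.
const MACROS = { ss: ["", "s", "'s", "(s)"] };

// Expand the first {name} macro in a phrase, then recurse on each variant
// so that phrases containing several macros are fully expanded.
function expandMacros(phrase) {
  const match = phrase.match(/\{(\w+)\}/);
  if (!match) return [phrase];
  const variants = MACROS[match[1]];
  if (!variants) throw new Error(`unknown macro: ${match[0]}`);
  const before = phrase.slice(0, match.index);
  const after = phrase.slice(match.index + match[0].length);
  return variants.flatMap((v) => expandMacros(before + v + after));
}

console.log(expandMacros("actor{ss}"));
```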
On another note, I believe we could simplify/reduce the MRGs a bit. I believe the term (id) can only contain alphanumeric characters. In that case, there is no reason to list variants with other characters (such as `'` or `()`) within the formPhrases, unless we want to keep them from a user-experience point of view. So

```yaml
- "author"
- "authors"
- "author's"
- "author(s)"
```

may very well become

```yaml
- "author"
- "authors"
```
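If matching were done on regularized text, this reduction could be computed rather than maintained by hand. The sketch below reuses the regularization rules from the `term` snippet at the top of this thread: lowercase, drop `'` and `()`, and replace other special characters by `-`. The helpers `regularize` and `reduceFormPhrases` are hypothetical.

```javascript
// Assumed regularization, taken from the `term` snippet earlier in this thread.
function regularize(text) {
  return text
    .toLowerCase()
    .replace(/['()]+/g, "")
    .replace(/[^a-z0-9_-]+/g, "-");
}

// Keep only the first form phrase for each distinct regularized value;
// later phrases that regularize to the same text are redundant for matching.
function reduceFormPhrases(phrases) {
  const seen = new Set();
  return phrases.filter((phrase) => {
    const key = regularize(phrase);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(reduceFormPhrases(["author", "authors", "author's", "author(s)"]));
```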
I don't quite see all the implications of this other note (yet?). Currently, matching showtexts to formPhrases is straightforward: the showtext, as it is, must match one of the formphrases. This is easy enough to explain and to grasp.
If we were to go with the proposal, that would introduce a new conversion step where the showtext is first modified before it is matched. That needs to be explained, and it introduces a complication for users. Also, a proper specification (and explanation) has to exist before we can start pondering any (un)wanted consequences. That is a prerequisite for deciding whether or not to do this.
My first impression is that this is not worth it, but I am open to being convinced...
I've given this some thought.
Here are some observations:

- `C++`, `B+ tree`, `H2O`, `4G network`, `file_path`, `user_name`, etc.
- `term`, or if it does not exist, `showtext`. This processing aims to turn that ncg into a text that is looked up in the `formPhrases` arrays of MRG entries. Note that this process identifies an MRG entry, not the term (at least not directly).

From this, I conclude that it is better to think of:
- `formphrase`s as the set of (lowercased, human-readable) texts that authors use to refer to the semantic unit (concept) documented by an MRG entry;
- `term`s as the single text that satisfies regex `[a-z_0-9-]+` and that, in combination with `termType`s, is used by machines to refer to the semantic unit documented by an MRG entry.

This has the following consequences:
- `term` or `showtext`), converts it to lowercase, trims whitespace off its edges, and then matches it with formphrases. Special characters remain as they are, so we're no longer going for the 'markdown-like heading ids'. This provides users with maximum flexibility in referring to semantic units.
- A `formphrase` is simply a (lowercased, whitespace-trimmed) character string that is an element in the `formPhrases` field of an MRG entry.
- The formPhrases fields in the tev2-specifications repo and the essif-lab framework repo need to be revised (removing many of the `-` characters).

@Ca5e: please comment if you think this poses problems.
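Here is a minimal sketch of the matching rule and the term constraint proposed above. The text (`term`, or else `showtext`) is lowercased and trimmed, special characters are kept, and the result is looked up in each entry's `formPhrases`. A `term`, by contrast, must match `[a-z_0-9-]+`. The function names and sample entries are hypothetical. The sketch only illustrates the rule, not the TRRT code.

```javascript
// Proposed formphrase normalization: trim and lowercase, keep special characters.
function toFormphrase(text) {
  return text.trim().toLowerCase();
}

// Hypothetical lookup: use the term if present, otherwise the showtext,
// and return the first MRG entry that lists it as a formphrase.
function findEntry(entries, term, showtext) {
  const wanted = toFormphrase(term || showtext);
  return entries.find((entry) => entry.formPhrases.map(toFormphrase).includes(wanted));
}

// A term is the machine-oriented text and must satisfy [a-z_0-9-]+.
const TERM_PATTERN = /^[a-z_0-9-]+$/;
const isValidTerm = (term) => TERM_PATTERN.test(term);

// Illustrative entries only.
const entries = [
  { term: "cpp", formPhrases: ["C++", "c plus plus"] },
  { term: "b-plus-tree", formPhrases: ["B+ tree", "B+ trees"] },
];

console.log(findEntry(entries, undefined, "  c++ ").term);
console.log(isValidTerm("b-plus-tree"), isValidTerm("B+ tree"));
```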
When the versions of the TRRT and HRGT are updated, they will use the new formPhrase behaviour. The tev2-specifications and framework repos still need to be updated.
We have introduced 'formphrases' as regular, authorable texts with spaces, special characters, etc., and 'regularized texts' (including regularized formphrases) as texts that don't contain such characters. See the definition of 'regularized text', which also describes the conversion process.
- `term` in a termref assumes the showtext is a formphrase, then regularizes it, and compares it with the `formPhrases` fields in the MRG entries;
- The `term` field in a termref is similarly treated as if it were a formphrase before it is used to find an MRG entry;
- The `term` field is something like regex `(?<term>[^@\)]*?)@`.
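Here is a sketch of the termref resolution just described, using the regex fragment above. It rests on two assumptions. First, `refBody` is the part of the termref that contains the term and the `@` scope part; the full termref grammar is in the TEv2 specs. Second, `regularize` follows the rules from the `term` snippet at the top of this thread, plus trimming, because the spec's definition of 'regularized text' is not reproduced here. All names are hypothetical.

```javascript
// Regex fragment from this thread: the text right before an "@" that contains no "@" or ")".
const TERM_FIELD = /(?<term>[^@\)]*?)@/;

// Assumed regularization: trim, lowercase, drop ' ( ), other special characters -> "-".
function regularize(text) {
  return text
    .trim()
    .toLowerCase()
    .replace(/['()]+/g, "")
    .replace(/[^a-z0-9_-]+/g, "-");
}

// Hypothetical resolver: take the term field if it is non-empty, else the showtext,
// regularize it, and find the MRG entry with a matching regularized formphrase.
function resolveTermref(showtext, refBody, entries) {
  const match = refBody.match(TERM_FIELD);
  const term = match ? match.groups.term : "";
  const key = regularize(term !== "" ? term : showtext);
  return entries.find((entry) => entry.formPhrases.map(regularize).includes(key));
}

// Illustrative data only.
const mrg = [{ term: "management", formPhrases: ["manage", "managing", "management"] }];
console.log(resolveTermref("Managing", "@essif-lab", mrg).term);       // showtext used
console.log(resolveTermref("whatever", "manage@essif-lab", mrg).term); // term field used
```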
Term selection is also done by the termselection criteria of the importer. I can see the benefit of allowing curators to add terms using texts that are treated as showtexts. However, I also think that we shouldn't make exceptions to the simple syntax we currently have, e.g., by treating the `term...` instruction differently than we would, say, `glossaryText`.
What do you think about adding an instruction `ADD` (possibly also `REMOVE`) that would take a list of strings as its argument and treat each string as a showtext for finding an MRG entry to be imported?
Decision: if `key` isn't specified in an instruction, then the argument list is considered to be a list of showtexts. Thus, we can say, e.g.,

- `[ "showtext 1", "showtext 2"]@<tid>`, which will ADD every curated text or MRG entry (depending on `<tid>`) that has the regularized version of `showtext 1` or `showtext 2` (or both) in its `formPhrases` field;
- `-[ "showtext 1", "showtext 2"]@<tid>`, which will REMOVE every MRG entry that has the regularized version of `showtext 1` or `showtext 2` (or both) in its `formPhrases` field.

The functionality for adding and removing terms without specifying a key has been added to MRGT v1.0.4. There are still some TBDs related to this at the end of the specs here. I suppose we can leave this issue open until those have been taken care of.
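Here is a minimal in-memory model of the keyless ADD/REMOVE semantics decided above. The mrg-under-construction is simplified to a `Map` from term to entry, and `candidates` stands for the entries available through `<tid>`. The `regularize` function uses the assumed rules from the `term` snippet at the top of this thread. All names and data are hypothetical. This is not the MRGT v1.0.4 code.

```javascript
// Assumed regularization (rules from the `term` snippet at the top of this thread).
function regularize(text) {
  return text
    .toLowerCase()
    .replace(/['()]+/g, "")
    .replace(/[^a-z0-9_-]+/g, "-");
}

// True if any formphrase of the entry regularizes to one of the regularized showtexts.
function matchesShowtexts(entry, showtexts) {
  const wanted = new Set(showtexts.map(regularize));
  return entry.formPhrases.some((phrase) => wanted.has(regularize(phrase)));
}

// Apply one keyless instruction to the (simplified) mrg-under-construction.
function applyInstruction(selection, candidates, { remove, showtexts }) {
  for (const entry of candidates) {
    if (!matchesShowtexts(entry, showtexts)) continue;
    if (remove) {
      selection.delete(entry.term); // -[ ... ]@<tid>
    } else {
      selection.set(entry.term, entry); // [ ... ]@<tid>
    }
  }
  return selection;
}

// Illustrative data only.
const fromTid = [
  { term: "party", formPhrases: ["party", "parties"] },
  { term: "actor", formPhrases: ["actor", "actors"] },
  { term: "management", formPhrases: ["manage", "management"] },
];

const mrgUnderConstruction = new Map();
applyInstruction(mrgUnderConstruction, fromTid, { remove: false, showtexts: ["Parties", "Actor"] });
applyInstruction(mrgUnderConstruction, fromTid, { remove: true, showtexts: ["actors"] });
console.log([...mrgUnderConstruction.keys()]);
```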
The syntax `[ "showtext 1", "showtext 2"]@<tid>` is illegal YAML. VSCode reports `YAML syntax error Unexpected scalar at node end`, because of the `@<tid>`. When using MRGT 1.0.4, it says:

So we need another syntax for this.
Proper YAML syntax is to be used, i.e., the instruction should be surrounded with quotes (also for instructions without a `key` specified).
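Once quoted, the instruction reaches the tool as a plain YAML string, for example `- '[ "showtext 1", "showtext 2"]@essif-lab'`, where `essif-lab` is only an illustrative `<tid>`. A hypothetical parser for such strings could work in three steps:

1. Split off the part after the last `@`.
2. Read the list as JSON, which requires the double quotes used in this thread.
3. Treat a leading `-` as a removal.

```javascript
// Hypothetical parser for a quoted keyless termselection instruction.
function parseKeylessInstruction(text) {
  const at = text.lastIndexOf("@");
  if (at < 0) throw new Error(`missing @<tid> in: ${text}`);
  let list = text.slice(0, at).trim();
  const tid = text.slice(at + 1).trim();
  // A leading "-" turns the instruction into a removal.
  const remove = list.startsWith("-");
  if (remove) list = list.slice(1).trim();
  // The argument list must be a JSON array of strings (the showtexts).
  const showtexts = JSON.parse(list);
  if (!Array.isArray(showtexts) || !showtexts.every((s) => typeof s === "string")) {
    throw new Error(`expected a list of showtexts in: ${text}`);
  }
  return { remove, showtexts, tid };
}

console.log(parseKeylessInstruction('[ "showtext 1", "showtext 2"]@essif-lab'));
console.log(parseKeylessInstruction('-[ "showtext 1"]@essif-lab'));
```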
When selecting terms from eSSIF-Lab for the TEv2-documentation terminology, there is a termselection line:

The term `manage` doesn't exist in the essif-lab terminology. However, there is a curated text in eSSIF-Lab that has:

When working with these terms in practice, authors/curators would use all these forms, and would not necessarily know which is the one that is actually defined. Requiring curators to go look for that is perhaps a bit overkill.
This issue calls for an enhancement where `term` (as well as `-term`) is treated differently from other fields. Its value(s) should be treated as showtext(s), in the same way as in termrefs. The output of converting a showtext into a term would then be taken as the value to add to or remove from the mrg-under-construction.