Open RieksJ opened 8 months ago
Decisions:
@Ca5e: `macros` key to the configuration file documentation.
@RieksJ, please check the documentation.
@Ca5e Can you have a look at the specification of form phrase macro maps, and particularly the section on how they work?
If you are convinced the specifications and the operation of the tools agree, you may close this issue. If not, please comment what the (remaining) issues are.
Some things I believe should be looked into...
"Form phrases are used to refer to a particular semantic unit as known in a particular terminology."
I'd say form phrases aren't used to refer to a semantic unit, but instead enable a semantic unit to be referred to.
Here is how a form phrase is matched against:
Considering we're using `termid` to match where possible, I believe this section should be rethought. Within the MRGT there isn't much of a difference between searching in curated texts and searching in MRGs, either. When the tool first recognizes that the curated texts are supposed to be used, it loads all of the curated texts as a 'normal' list of MRG entries.
`macros:` should actually be (remove the dashes that make the dictionary a list, and change the comment format):
``` yaml
macros:
  "{ss}": ["", "s", "'s", "(s)"]          # "act{ss}" --> "act", "acts", "act's", "act(s)"
  "{ess}": ["", "es", "'s", "(es)"]       # "regex{ess}" --> "regex", "regexes", "regex's", "regex(es)"
  "{yies}": ["y", "y's", "ies"]           # "part{yies}" --> "party", "party's", "parties"
  "{ying}": ["y", "ying", "ies", "ied"]   # "identif{ying}" --> "identify", "identifying", "identifies", "identified"
  "{es}": ["e", "es", "ed", "ing"]        # "manag{es}" --> "manage", "manages", "managed", "managing"
  "{able}": ["able", "ability"]           # "cap{able}" --> "capable", "capability"
```
I have, however, changed the interpreting code so that the format that does use the dashes will also be supported in the next release.
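To make the two accepted formats concrete, here is a minimal Python sketch of how a macro map could be normalized and then applied to a form phrase. It is not the MRGT's actual code: the function names `normalize_macros` and `expand` are hypothetical, and the sketch assumes the dashed format parses to a list of single-key mappings while the dash-free format parses to a plain mapping.

```python
def normalize_macros(raw):
    """Return a plain dict, whether `raw` is a mapping or a list of single-key mappings."""
    if isinstance(raw, dict):
        return dict(raw)
    merged = {}
    for item in raw:  # dashed format: one small mapping per list entry
        merged.update(item)
    return merged


def expand(form_phrase, macros):
    """Return every variant of `form_phrase` obtained by expanding the macros it contains."""
    variants = [form_phrase]
    for macro, endings in macros.items():
        expanded = []
        for variant in variants:
            if macro in variant:
                expanded.extend(variant.replace(macro, ending) for ending in endings)
            else:
                expanded.append(variant)
        variants = expanded
    return variants


# The same two macros, once without dashes (mapping) and once with dashes (list).
dict_style = {"{ss}": ["", "s", "'s", "(s)"], "{yies}": ["y", "y's", "ies"]}
list_style = [{"{ss}": ["", "s", "'s", "(s)"]}, {"{yies}": ["y", "y's", "ies"]}]

assert normalize_macros(list_style) == normalize_macros(dict_style)
print(expand("act{ss}", dict_style))     # ['act', 'acts', "act's", 'act(s)']
print(expand("part{yies}", dict_style))  # ['party', "party's", 'parties']
```

Normalizing first means the rest of the code only ever has to deal with one shape, whichever format the configuration file uses.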
@Ca5e Thanks for all the comments, which I have used to improve the documentation.
And yes, please move back so we can combine these sources.
To make form phrase macros usable when terminologies are developed in different languages, it must be possible to specify them outside the source code of the tools. A curator who wants to adjust the macros can then do so as well. It is also handy for testing new regex candidates.
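As an illustration of testing a new candidate, the Python sketch below compiles a form phrase with macros into a regular expression and tries it on a few words. The helper `to_regex` and the Dutch-style macro `{en}` are hypothetical examples, not part of the tools, and the sketch does not claim to mirror how the tools build their patterns.

```python
import re


def to_regex(form_phrase, macros):
    """Compile a form phrase into a regex in which each macro becomes an alternation."""
    if not macros:
        return re.compile(re.escape(form_phrase))
    # Split on the macro keys while keeping them, so literal text can be escaped safely.
    splitter = re.compile("(" + "|".join(re.escape(m) for m in macros) + ")")
    parts = []
    for piece in splitter.split(form_phrase):
        if piece in macros:
            alternatives = "|".join(re.escape(ending) for ending in macros[piece])
            parts.append("(?:" + alternatives + ")")
        else:
            parts.append(re.escape(piece))
    return re.compile("".join(parts))


# Hypothetical candidate macro for a Dutch plural: "boek{en}" --> "boek", "boeken"
candidate = {"{en}": ["", "en"]}
pattern = to_regex("boek{en}", candidate)

for word in ["boek", "boeken", "boeks"]:
    print(word, bool(pattern.fullmatch(word)))
```

Because the macro map is just data, a candidate like this can be tried out without touching the tool's source code.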
This issue calls for:
As a starting point for the specifications, I think the macros should either be specified in a (new) section of the SAF (one that doesn't get copied into MRGs), or be made a command-line option for the MRGT (so that they can also be listed in the MRGT configuration file).