Is it ever possible to omit `TEMPLATE_ID`

writer / replaCy

spaCy match and replace, maintaining conjugation

https://pypi.org/project/replacy/

MIT License


Is it ever possible to omit `TEMPLATE_ID` #2

Closed sam-writer closed 4 years ago

sam-writer commented 4 years ago

For example, in

    "extract-revenge": {
        "patterns": [
                {
                    "LEMMA": "extract", "TEMPLATE_ID": 1
                }
        ],
        "suggestions": [
            [
                {
                    "TEXT": "exact", "FROM_TEMPLATE_ID": 1
                }
            ]
        ],

it seems like TEMPLATE_ID and FROM_TEMPLATE_ID could be inferred.

sam-writer commented 4 years ago

The answer is YES... We should add to docs!

EDIT: this is wrong, but I am leaving it up because, for now, this discussion is good documentation.

melisa-writer commented 4 years ago

> The answer is YES... We should add to docs!

NO, why? You can have a multi-token suggestion, lemmas, POS and orth that are completely unrelated, or verbs not supported by pyinflect. Why?

sam-writer commented 4 years ago

> > The answer is YES... We should add to docs!
>
> NO, why? You can have a multi-token suggestion, lemmas, POS and orth that are completely unrelated, or verbs not supported by pyinflect. Why?

I'm sorry, I don't understand.

melisa-writer commented 4 years ago

Ok, so let's start from definitions ;) What do you mean by inferring? I assumed it means: automatically assign.

sam-writer commented 4 years ago

In the original question, I mean: can one have a spacy_matches.json entry that does not specify TEMPLATE_ID?

And on Slack, it seemed like the answer was sometimes YES

melisa-writer commented 4 years ago

Yes, TEMPLATE_ID and FROM_TEMPLATE_ID are optional keys.

sam-writer commented 4 years ago

> Yes, TEMPLATE_ID and FROM_TEMPLATE_ID are optional keys.

But in the particular example I picked, they are not optional?

melisa-writer commented 4 years ago

Not optional. To omit `TEMPLATE_ID`, we would have to change `LEMMA` to `LOWER` in the patterns. The rule would then catch only the literal form "extract" and replace it with "exact" without inflection (that replacement is correct, but we would miss "extracts revenge", etc.).

So use `TEMPLATE_ID` + `LEMMA` only if you want the suggestions to be inflected.
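To make the difference concrete, here is a small sketch of the two variants, written as Python dicts that mirror the shape of a spacy_matches.json entry. The `LOWER` variant is my reading of the rule above, not something copied from the repo, and any other keys a real entry carries are left out.

```python
import json

# Variant A: match on the lemma and inflect the suggestion.
# TEMPLATE_ID labels the matched token; FROM_TEMPLATE_ID names the
# matched token whose inflection the suggestion should copy.
inflected = {
    "extract-revenge": {
        "patterns": [{"LEMMA": "extract", "TEMPLATE_ID": 1}],
        "suggestions": [[{"TEXT": "exact", "FROM_TEMPLATE_ID": 1}]],
    }
}

# Variant B (assumed from the rule above): match only the literal
# lowercase form and replace it as-is, so no template keys are needed.
literal = {
    "extract-revenge": {
        "patterns": [{"LOWER": "extract"}],
        "suggestions": [[{"TEXT": "exact"}]],
    }
}

# Serialize variant B the way it would appear in a JSON file.
print(json.dumps(literal, indent=4))
```

Variant B is shorter, but it only fires on "extract" itself, which is the trade-off described above.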

sam-writer commented 4 years ago

> Not optional. To omit `TEMPLATE_ID`, we would have to change `LEMMA` to `LOWER` in the patterns. The rule would then catch only the literal form "extract" and replace it with "exact" without inflection (that replacement is correct, but we would miss "extracts revenge", etc.).
>
> So use `TEMPLATE_ID` + `LEMMA` only if you want the suggestions to be inflected.

Ok great, this is what I was trying to get at. It is not obvious a priori, but it is an easy-to-describe rule. It could even go in a JSON Schema file (an addition I want to make: the option to validate a spacy_matches.json file).
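As a rough idea of what such validation might check, here is a plain-Python sketch (not an actual JSON Schema, and not part of replaCy). The function name `check_entry` is hypothetical, and it assumes `patterns` is a flat list of token dicts and `suggestions` is a list of token lists, as in the example at the top of this issue.

```python
def check_entry(name, entry):
    """Return a list of warnings for one spacy_matches.json entry."""
    warnings = []
    patterns = entry.get("patterns", [])

    # Every TEMPLATE_ID declared on a pattern token.
    declared = {tok["TEMPLATE_ID"] for tok in patterns if "TEMPLATE_ID" in tok}

    # A LEMMA token matches inflected forms too; without a TEMPLATE_ID
    # the suggestion has nothing to copy the inflection from.
    for tok in patterns:
        if "LEMMA" in tok and "TEMPLATE_ID" not in tok:
            warnings.append(f"{name}: LEMMA {tok['LEMMA']!r} has no TEMPLATE_ID")

    # A FROM_TEMPLATE_ID must point at a TEMPLATE_ID that exists.
    for i, suggestion in enumerate(entry.get("suggestions", [])):
        for tok in suggestion:
            ref = tok.get("FROM_TEMPLATE_ID")
            if ref is not None and ref not in declared:
                warnings.append(
                    f"{name}: suggestion {i} uses undeclared TEMPLATE_ID {ref}"
                )
    return warnings
```

The first check is only a heuristic: in a multi-token pattern, a `LEMMA` token that is pure context may legitimately have no `TEMPLATE_ID`, so these are warnings rather than hard errors.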

I'm sure when you said

> lemmas, POS and orth that are completely unrelated, or verbs not supported by pyinflect.

it answered this, but I still don't get it, so I will ask another way: when there is a single-token LEMMA pattern, and a single-token suggestion, why can't we infer the template?

melisa-writer commented 4 years ago

Examples:

- you want to correct the tense, and the preceding and succeeding words are given by match hooks
- you don't want to copy the plural/singular form (nouns are also inflected)
- the word in the pattern might not even exist in the dictionary, so there are no inflected forms
- you exchange different POS... or it's just any other random replacement. A lot can happen!

melisa-writer commented 4 years ago

BUT if we really want to speed up labeling matches, we could display such guesses as a default (dashboard view). A user introducing the rule could correct it if it's wrong. I agree that would help. :rocket:
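A minimal sketch of what such a default guess could look like, covering only the simple case discussed here (one `LEMMA` pattern token, single-token suggestions). The helper name `guess_template` and the choice of `1` as the default id are assumptions for illustration, and the result is meant to be reviewed by a person, not applied blindly.

```python
import copy


def guess_template(entry):
    """Return a copy of entry with guessed template keys, or None."""
    patterns = entry.get("patterns", [])
    suggestions = entry.get("suggestions", [])

    # Only guess for a single LEMMA token that is not labelled yet.
    if len(patterns) != 1 or "LEMMA" not in patterns[0]:
        return None
    if "TEMPLATE_ID" in patterns[0]:
        return None

    # Multi-token suggestions are ambiguous, so do not guess for them.
    if not suggestions or any(len(s) != 1 for s in suggestions):
        return None

    guess = copy.deepcopy(entry)  # leave the original rule untouched
    guess["patterns"][0]["TEMPLATE_ID"] = 1
    for suggestion in guess["suggestions"]:
        # Keep any FROM_TEMPLATE_ID the rule author already set.
        suggestion[0].setdefault("FROM_TEMPLATE_ID", 1)
    return guess
```

Returning `None` for everything outside the simple case keeps the guess conservative; the cases listed above (tense correction, different POS, unknown words) still need a human decision.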