Open juditacs opened 2 years ago
That is such a wonderful way to find a mistake! So should I skip all the entries marked 'only 3rd-person forms', as shown in the image below?
By the way, I fully agree with you that the subjunctive forms should be removed. (Wiktionarians may have different ideas about them; unfortunately, we may never know.)
Yes, I think they should be skipped, since 1. they are not used, and 2. the actual inflected form (if it exists at all; some don't) is not specified in Wiktionary.
Is there a UniMorph guideline for these cases? I doubt this only pertains to Hungarian.
I computed the character-level Jaccard similarity between lemmas and inflected forms, and I'm looking at the lowest values. Some descriptive verbs are only ever used in their 3rd-person forms, and Wiktionary notes this as "only 3rd-person forms". These are now parsed as
V;IND;PRS;INDF;1;SG
but they really should be skipped. Examples: https://en.wiktionary.org/wiki/havazik https://en.wiktionary.org/wiki/f%C3%A1j
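
For anyone who wants to repeat the check, here is a minimal sketch of the character Jaccard comparison described above. It assumes the data is available as (lemma, form, tags) triples; the function name `char_jaccard` and the sample rows are illustrative only, not the actual extraction code or real rows from the dataset.

```python
def char_jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the sets of characters of two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Two empty strings are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Illustrative rows only: (lemma, inflected form, tags).
entries = [
    ("havazik", "havazik", "V;IND;PRS;INDF;3;SG"),
    ("havazik", "only 3rd-person forms", "V;IND;PRS;INDF;1;SG"),
    ("fáj", "fáj", "V;IND;PRS;INDF;3;SG"),
]

# Sort ascending so the most suspicious pairs come first.
ranked = sorted(entries, key=lambda e: char_jaccard(e[0], e[1]))

for lemma, form, tags in ranked:
    print(f"{char_jaccard(lemma, form):.2f}\t{lemma}\t{form}\t{tags}")
```

A placeholder note shares few or no characters with its lemma, so it ends up at the top of the sorted list, while genuine inflected forms usually share most of the stem.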
I found another similar placeholder when I looked at the difference between the lengths of the lemma and the inflected word: "the verb has no subjunctive forms".
Examples: https://en.wiktionary.org/wiki/fejlik https://en.wiktionary.org/wiki/rejlik
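
A rough sketch of how both placeholders could be skipped during extraction, assuming the same (lemma, form, tags) triples. The names `PLACEHOLDERS`, `is_placeholder`, and `length_gap` are hypothetical, and the tag on the placeholder row is made up for illustration; only the two placeholder strings are taken from this thread.

```python
# Placeholder notes quoted in this thread; Wiktionary may use others too.
PLACEHOLDERS = {
    "only 3rd-person forms",
    "the verb has no subjunctive forms",
}


def is_placeholder(form: str) -> bool:
    """True if the 'form' cell is a known note rather than a word form."""
    return form.strip().lower() in PLACEHOLDERS


def length_gap(lemma: str, form: str) -> int:
    """Absolute length difference; large values are worth a manual look."""
    return abs(len(form) - len(lemma))


# Illustrative rows only; the tag on the first row is a made-up example.
rows = [
    ("fejlik", "the verb has no subjunctive forms", "V;SBJV;PRS;INDF;1;SG"),
    ("fejlik", "fejlik", "V;IND;PRS;INDF;3;SG"),
]

# Drop known placeholders, then list the rest by descending length gap.
kept = [r for r in rows if not is_placeholder(r[1])]
suspects = sorted(kept, key=lambda r: length_gap(r[0], r[1]), reverse=True)

print(kept)
```

The exact-match set only removes the known notes; sorting the remaining rows by length gap is there to help surface placeholders that have not been spotted yet.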
Mentioned in https://github.com/unimorph/hun/issues/1