jdee / dubsar

Dubsar Dictionary Project
https://dubsar-dictionary.com

invalid inflections #8

Open jdee opened 14 years ago

jdee commented 14 years ago

ActiveSupport::Inflector was used for regular plural nouns, and the WordNet(R) exception list provided irregular plurals. For any other one-word noun, the ActiveSupport String#pluralize method was used. This usually produced correct results, but it failed often enough to matter. For example, the plural of Man (n.) (the island) is listed as Men, and the plural of shaman (n.) comes out as shamen.

Some compound verbs are not handled correctly, notably log-in (v.), which Dubsar currently conjugates as log-ins, log-ined, log-ining.

Dubsar provides no regular inflections for adjectives because rules for comparative and superlative degrees produce forms like sabbaticaller and sabbaticallest.

The :inflections table will simply grow to include more and more exceptions until all cases are listed, and there is no longer any need to generate it from rules or WordNet exception files. It will just be dumped out and then reloaded on each seed.
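A minimal sketch of the lookup order this implies, in plain Ruby: an explicit exception table is consulted first, and a rule is only a fallback. The `EXCEPTIONS` hash, the `naive_plural` rule and the `plural_for` helper are all hypothetical and are not Dubsar's schema or code. The rule deliberately reproduces the -man to -men mistake described above so the override is visible.

```ruby
# Hypothetical exception table: explicit entries override any rule.
EXCEPTIONS = {
  "shaman" => ["shamans"],
  "man"    => ["men"]
}.freeze

# Deliberately naive rule (an assumption for illustration): turn a trailing
# -man into -men, otherwise append -s. It stands in for any rule-based
# pluralizer and is not ActiveSupport's implementation.
def naive_plural(word)
  word.end_with?("man") ? word.sub(/man\z/, "men") : "#{word}s"
end

# Consult the exception table first; fall back to the rule only when the
# word has no explicit entry.
def plural_for(word)
  EXCEPTIONS.fetch(word) { [naive_plural(word)] }
end

puts plural_for("shaman").inspect   # explicit entry wins over the rule
puts plural_for("fireman").inspect  # no entry, so the rule applies
```

As the table gains entries, fewer words ever reach the rule, which is the direction the comment above describes.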

jdee commented 14 years ago

There are also some stragglers like "tiing" for tie (in addition to "tying").

jdee commented 14 years ago

The invalid -iing endings (tiing, diing instead of tying, dying) have been removed by a reseed. The code in the Word model that filters out duplicate inflections has been improved to work appropriately during creation, before the inflections have been saved. Now the de-dupe step at the end is no longer necessary.
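For illustration only, filtering duplicates at creation time could look roughly like the following. `WordSketch` and `add_inflection` are hypothetical names, not the actual Word model, and the in-memory array stands in for unsaved inflection records.

```ruby
# Hypothetical stand-in for a word whose inflections are not yet saved.
class WordSketch
  attr_reader :name, :inflections

  def initialize(name)
    @name = name
    @inflections = []
  end

  # Add a form only if it is not already present in memory.
  # Returns true when the form was added, false when it was a duplicate.
  def add_inflection(form)
    return false if @inflections.include?(form)

    @inflections << form
    true
  end
end

word = WordSketch.new("tie")
%w[ties tied tying tying].each { |form| word.add_inflection(form) }
puts word.inflections.inspect
```

Because duplicates never enter the collection, nothing is left for a separate de-dupe pass to clean up afterwards.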

jdee commented 14 years ago

A number of problems have been solved. In a reseed, no regular inflection will be attempted for any word that contains anything but lower-case letters (no capitals, digits, spaces, hyphens or other punctuation).

Meanwhile, there continue to be problems in general with verbs ending in -CVc (e.g., bivouac, picnic). These will be addressed soon.
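As a rough sketch of the usual English spelling rule for these verbs (a k is inserted before -ed and -ing), assuming a verb that ends in a vowel followed by c. The helper name `c_verb_inflections` is hypothetical, this is not the fix that was applied in Dubsar, and real verbs have exceptions that still need case-by-case entries.

```ruby
# Hypothetical helper: regular inflections for a verb ending in vowel + c.
# Returns [third person singular, past, present participle],
# or nil when the verb does not match the pattern.
def c_verb_inflections(verb)
  return nil unless verb.match?(/[aeiou]c\z/)

  stem = "#{verb}k" # picnic -> picnick, used before -ed and -ing
  ["#{verb}s", "#{stem}ed", "#{stem}ing"]
end

puts c_verb_inflections("picnic").inspect
puts c_verb_inflections("bivouac").inspect
```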

jdee commented 14 years ago

That last batch has been addressed. Dubsar no longer attempts to inflect anything that doesn't match /^[a-z]+$/, i.e., nothing capitalized, nothing containing spaces, hyphens or other punctuation. The main remaining issues are with verbs ending in a short syllable with -l or -s, where there are often ambiguities (like traveled and travelled). Dubsar usually provides both, sometimes erroneously.
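A small sketch of that guard in Ruby. The method name `regular_inflection_allowed?` is hypothetical. One detail worth noting: in Ruby, `^` and `$` anchor to line boundaries rather than to the whole string, so the sketch also shows a stricter `\A...\z` variant, which is an assumption on my part and not necessarily what the seed code uses.

```ruby
# Pattern as quoted in the comment above.
SIMPLE_WORD = /^[a-z]+$/

# Stricter variant (an assumption, not from the source): \A and \z anchor
# to the start and end of the whole string, whereas ^ and $ anchor to
# each line in Ruby.
STRICT_SIMPLE_WORD = /\A[a-z]+\z/

# Hypothetical guard: only words made purely of lower-case ASCII letters
# are eligible for rule-based inflection.
def regular_inflection_allowed?(word)
  word.match?(STRICT_SIMPLE_WORD)
end

%w[visit log-in Man].each do |w|
  puts "#{w}: #{regular_inflection_allowed?(w)}"
end
```

For single-line dictionary entries the two patterns behave the same; they differ only when a string contains an embedded newline.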

jdee commented 13 years ago

Visit is erroneously listed with inflections "visitting" and "visitted." Same problem for "audit."

jdee commented 13 years ago

The -it verbs have been fixed with a reseed.

jdee commented 13 years ago

A couple of recent problems to be addressed:

cattle pluralized as cattles: ActiveSupport::Inflector calls this sort of word "uncountable." In grammatical terms, such words are perhaps indeclinable. At any rate, legitimate plurals of this form include monies, peoples, waters, so the distinction has to be handled on a case-by-case basis.

hurted: WordNet does not treat this as an irregular, so I'll have to.

jdee commented 13 years ago

The problem with hurted has been corrected with a migration. The cattles problem is a more general issue with the ActiveSupport Inflector and needs a little more general treatment.