Closed missinglink closed 4 years ago
How do pluralizers work for languages with gender-based rules? Do they require a full dictionary with corresponding genders to be accurate?
I think, for French, I should add all plurals by hand... This lang is so... :sweat:
You are right, this is only for English (or languages with English-like grammar).
I tried the place `bureau`: the plural is `bureaux` (`/eau$/ => 'eaux'`). But for places like `bureau de change`, only `bureau` takes the plural, so the result should be `bureaux de change` (=> currency exchange).
```javascript
var pluralize = require("pluralize")
pluralize.addPluralRule(/eau$/i, 'eaux')
pluralize('bureau') // bureaux OK
pluralize('bureau de change') // bureau de changes NOT OK
```
I also tried `pluralize-fr` and `french-words`, but they fail on `bureaux de change`: they pluralize each word (=> `bureaux des changes`)...
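To illustrate the "plural on the head word only" behaviour described above, here is a minimal sketch. Everything in it is hypothetical: `pluralizeHead`, `pluralizeFrenchWord`, and the list of linking words are made up for this comment, and the word-level rules cover only the cases from this thread, not French in general.

```javascript
// Hypothetical list of French linking words; the head noun is assumed
// to be everything that comes before the first of these.
const FRENCH_LINKERS = new Set(['de', 'du', 'des', 'à', 'au', 'aux', 'en']);

function isLinker(token) {
  const lower = token.toLowerCase();
  // Also treat elided forms such as "d'hôtes" as the start of the tail.
  return FRENCH_LINKERS.has(lower) || /^d['’]/.test(lower);
}

// Stand-in word-level pluralizer (assumption: NOT a complete rule set).
function pluralizeFrenchWord(word) {
  if (/[sxz]$/i.test(word)) return word;    // invariant endings
  if (/eau$/i.test(word)) return word + 'x'; // bureau -> bureaux
  return word + 's';                         // default rule
}

// Pluralize only the tokens before the first linker; keep the rest as-is.
function pluralizeHead(phrase) {
  const tokens = phrase.split(' ');
  const linkerIndex = tokens.findIndex(isLinker);
  const headEnd = linkerIndex === -1 ? tokens.length : linkerIndex;
  const head = tokens.slice(0, headEnd).map(pluralizeFrenchWord);
  const tail = tokens.slice(headEnd);
  return head.concat(tail).join(' ');
}

console.log(pluralizeHead('bureau'));           // bureaux
console.log(pluralizeHead('bureau de change')); // bureaux de change
```

This still needs per-language knowledge (the linker list), so it does not remove the need for review by someone who knows the language.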
IMO the safest way is to add all plurals to the pelias dictionary `place_names.txt`, or to a new file `place_names.plural.txt` (at least for French).
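A rough sketch of what loading such a hand-written plural file could look like. The `singular|plural` line format, the `#` comments, and the function name `parsePluralDictionary` are assumptions for illustration only; nothing here reflects an existing pelias file format.

```javascript
// Parse a hypothetical plural dictionary: one "singular|plural" pair per line,
// blank lines and lines starting with '#' are ignored.
function parsePluralDictionary(text) {
  const plurals = new Map();
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const [singular, plural] = line.split('|').map((s) => s.trim().toLowerCase());
    // Skip malformed lines instead of guessing a plural.
    if (singular && plural) plurals.set(singular, plural);
  }
  return plurals;
}

// Inline sample standing in for the contents of place_names.plural.txt.
const samplePluralFile = [
  '# French place names with hand-written plurals',
  'bureau|bureaux',
  'bureau de change|bureaux de change',
  'malformed line without separator',
].join('\n');

const frenchPlurals = parsePluralDictionary(samplePluralFile);
console.log(frenchPlurals.get('bureau de change')); // bureaux de change
```

The trade-off is the one mentioned above: every plural is correct by construction, but someone has to write them all by hand.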
I know we do not have enough knowledge in all languages to cover the world... :disappointed:
So maybe we can use this lib for English (after a review of the generated places?) but not for all languages :confused:
I wrote the inverse of this a few years ago to singularize words using English grammar rules: https://github.com/pelias/analysis/blob/master/test/tokenizer/singular.js
I can probably port these tests across and invert them
Added some tests via https://github.com/pelias/parser/pull/119/commits/111bee75caef77f1ff92bc8f956b9d23a31f8b10 including tests to ensure that the plurals are not generated for non-English tokens.
The npm `pluralize` library is actually pretty good for English. Unfortunately there are some ambiguous words such as "staff", which can pluralize to "staff" (for staff members) or "staves" (for several walking/fighting sticks).
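One possible way to deal with ambiguous words like "staff" is to return every candidate plural instead of a single one. The sketch below is only an assumption about how that could be done: `AMBIGUOUS_PLURALS`, `pluralCandidates`, and `naivePluralize` are hypothetical names, and `naivePluralize` is a stand-in for whichever word-level pluralizer is actually in use.

```javascript
// Hypothetical override table: words with more than one valid plural.
const AMBIGUOUS_PLURALS = new Map([
  ['staff', ['staff', 'staves']],
]);

// Return all plausible plurals for a word. `pluralizeWord` is any function
// mapping a singular word to a single plural.
function pluralCandidates(word, pluralizeWord) {
  const key = word.toLowerCase();
  if (AMBIGUOUS_PLURALS.has(key)) {
    // Copy so callers cannot mutate the shared table.
    return AMBIGUOUS_PLURALS.get(key).slice();
  }
  return [pluralizeWord(word)];
}

// Stand-in pluralizer for the example only (assumption: just appends "s").
const naivePluralize = (word) => word + 's';

console.log(pluralCandidates('staff', naivePluralize)); // [ 'staff', 'staves' ]
console.log(pluralCandidates('hotel', naivePluralize)); // [ 'hotels' ]
```

For a dictionary that is only used for matching tokens, adding both candidates is probably harmless.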
I'm happy to merge this as-is and work on adding other languages in subsequent PRs
This PR adds a function which optionally pluralizes a dictionary of words. This is useful for cases like `Foo Hotels and Homes`, where the terms would otherwise not be classified as 'place'.

The library I chose is https://github.com/plurals/pluralize, mainly because it seems quite popular, but I'm open to alternatives. I suspect one problem with selecting this library is that it probably only uses English word rules.
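For readers skimming the thread, the idea can be sketched roughly as follows. This is not the code from the PR: `withPlurals` and `naiveEnglishPlural` are hypothetical names, and the naive pluralizer is a stand-in so the example runs without any dependency.

```javascript
// Return a new Set containing the original terms plus, optionally, their plurals.
// `pluralizeWord` is any function mapping a singular word to its plural.
function withPlurals(terms, pluralizeWord, enabled = true) {
  const result = new Set(terms);
  if (!enabled) return result;
  for (const term of terms) {
    result.add(pluralizeWord(term));
  }
  return result;
}

// Stand-in English pluralizer covering only a few regular patterns (assumption).
function naiveEnglishPlural(word) {
  if (/(s|x|ch|sh)$/i.test(word)) return word + 'es';            // church -> churches
  if (/[^aeiou]y$/i.test(word)) return word.slice(0, -1) + 'ies'; // bakery -> bakeries
  return word + 's';                                              // hotel -> hotels
}

const placeTerms = withPlurals(['hotel', 'home'], naiveEnglishPlural);
console.log(placeTerms.has('hotels')); // true
console.log(placeTerms.has('homes'));  // true
```

With the plurals in the dictionary, both "Hotels" and "Homes" in `Foo Hotels and Homes` can match a 'place' term.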
@Joxit maybe we can add French and others too?