unimorph / pol

Polish

Noun gender in pol.zip #1

Open kylebgorman opened 1 month ago

kylebgorman commented 1 month ago

pol.zip, data from the online grammatical dictionary, includes noun gender in the 3rd column. This is not to spec. Polish nouns don't "decline for gender"; rather, gender is an inherent feature of the lexeme, and thus should not appear among the inflectional features. During the discussion of the UniMorph 3 spec, I proposed that we put inherent lexical features in the fourth column, but I don't think this was ever put into action. I propose that gender simply be moved to the fourth column (and removed from the 3rd).

@wkieras

wkieras commented 1 month ago

What about animacy for nouns and aspect for verbs?

kylebgorman commented 1 month ago

Animacy for nouns is exactly the same issue as gender for nouns. (There are of course a few nouns which have virile and inanimate versions; słoik comes to mind. But I wouldn't say nouns "decline for animacy", just that if you derive an animate from an inanimate noun, you inflect it slightly differently.)

I know a bit less about Polish verbs, but I assume the system is like that of other Slavic languages. Most people would say, about Russian verbs for instance, that aspect is inherent to a verb: while there are sometimes perfective and imperfective variants of the same verb root, it is often hard to predict what prefix, suffix, or stem change will be used to form the perfective, and there are many imperfectives without corresponding perfectives, and vice versa.

The way these features are written is basically fine; they just belong in a separate column.
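
A minimal sketch of that move, assuming tab-separated lines of lemma, form and feature bundle, with semicolon-separated tags. The tag set `INHERENT_TAGS` and the function name `move_inherent_features` are illustrative assumptions, not the actual tags or tooling used for the Polish data:

```python
# Assumed lexeme-level tags (gender, animacy, aspect); the tags actually
# used in the Polish data may differ.
INHERENT_TAGS = {"MASC", "FEM", "NEUT", "ANIM", "INAN", "HUM", "PFV", "IPFV"}


def move_inherent_features(line: str) -> str:
    """Move inherent tags from the 3rd column into the 4th column."""
    fields = line.rstrip("\n").split("\t")
    lemma, form, features = fields[:3]
    # Keep anything already present in the 4th column.
    existing = fields[3].split(";") if len(fields) > 3 and fields[3] else []
    tags = features.split(";")
    inflectional = [t for t in tags if t not in INHERENT_TAGS]
    inherent = existing + [
        t for t in tags if t in INHERENT_TAGS and t not in existing
    ]
    return "\t".join([lemma, form, ";".join(inflectional), ";".join(inherent)])
```

On a made-up input line such as `kot<TAB>kota<TAB>N;MASC;ANIM;GEN;SG`, this would yield `kot<TAB>kota<TAB>N;GEN;SG<TAB>MASC;ANIM`.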

wkieras commented 1 month ago

OK, I moved gender and animacy (nouns) and aspect (verbs) to the 4th column and also switched to xz compression. Please let me know if this is OK, as I need to do the same for the Czech, Slovak and Ukrainian data.

kylebgorman commented 1 month ago

LGTM all around.

I have one last thing for you while I have your attention; re: #2 there are an awful lot of feminines missing a gen.pl.; not sure if that's intentional or not.

wkieras commented 1 month ago

Can you give any examples? I looked for feminine nouns not containing GEN;PL form and I got only "krzta", which is defective.

kylebgorman commented 1 month ago

> Can you give any examples? I looked for feminine nouns not containing GEN;PL form and I got only "krzta", which is defective.

This is an error on my part: my routine was expecting a unique gen.pl., and the words I'm seeing all have multiple gen.pl. forms (e.g., acerola).
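
For reference, a sketch of a check that tolerates multiple forms per cell. It assumes rows have already been parsed into (lemma, form, features, inherent features) tuples with semicolon-separated tags; the function name `lemmas_missing_cell` and its parameters are hypothetical, not the routine I actually ran:

```python
from collections import defaultdict


def lemmas_missing_cell(rows, cell=("GEN", "PL"), required="FEM"):
    """Return lemmas carrying `required` that have no form for `cell`.

    A lemma with several forms for the same cell counts as covered;
    only lemmas with zero matching forms are reported.
    """
    wanted = set(cell)
    forms = defaultdict(set)  # lemma -> all forms found for the cell
    seen = set()              # every lemma carrying the required tag
    for lemma, form, features, inherent in rows:
        if required not in inherent.split(";"):
            continue
        seen.add(lemma)
        if wanted <= set(features.split(";")):
            forms[lemma].add(form)
    return [lemma for lemma in sorted(seen) if lemma not in forms]
```

Collecting the forms in a set per lemma, rather than expecting exactly one, means a lemma with two gen.pl. variants is not reported, while a defective noun with none still is.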