jpvanhal / inflection

A port of Ruby on Rails' inflector to Python
https://inflection.readthedocs.io
MIT License

pluralize('datum') -> data but I wish 'datums' #32

Open pbrod opened 5 years ago

pbrod commented 5 years ago
>>> import inflection
>>> inflection.pluralize('datum')
'data'

>>> inflection.pluralize(inflection.singularize('datum'))
'data'

>>> inflection.singularize('datum')
'datum'

Rocamonde commented 5 years ago

Yes, but that is wrong.

pbrod commented 5 years ago

Well, that is debatable. See https://nxg.me.uk/note/2005/singular-data/ and https://en.wikipedia.org/wiki/Data_(word).

In precise geodesy, for example, a ‘datum’ is the term for one of several models of the shape of the earth, relative to which the heights of mountains and the positions of telescopes are measured. This usage, which has nothing to do with our atom of data, has the perfectly regular plural ‘datums’.

pbrod commented 5 years ago

According to https://en.wikipedia.org/wiki/Data_(word) data is most often used as a singular mass noun in everyday usage. Some use it either in the singular or plural. The Associated Press style guide classifies data as a collective noun that takes the singular when treated as a unit but the plural when referring to individual items (e.g., "The data is sound" and "The data have been carefully collected").

In scientific writing data is often treated as a plural, as in "These data do not support the conclusions", but the word is also used as a singular mass entity like information, for instance in computing and related disciplines. British usage now widely accepts treating data as singular in standard English, including everyday newspaper usage.

Rocamonde commented 5 years ago

I agree with all your comments except the one on the original topic, i.e. “datums”. I have never heard that word, so I can’t really speak to it. To my understanding, the right way to pluralise such Latin terms is “datum”-“data”, just like “erratum”-“errata” and many others. If you can provide a reference where that word can be found, I’d appreciate it.

Regardless, the current implementation only supports one resultant term, so the default one should be, in my opinion, “data”. Figuring out what word is appropriate would require context processing and probably AI-like tools, which are, to my understanding, outside of the scope of this package.

pbrod commented 5 years ago

My point is that "data" as a singular form is far more common than the Latin word "datum", according to https://en.wikipedia.org/wiki/Data_(word). In English, the word "datum" is still used in the general sense of "an item given", but it is now rarely used. Any measurement or result is a datum, though "data point" is now far more common.

If you're writing for an academic audience, particularly in the sciences, "data" takes a plural verb. For example: "The data are correct". But most people treat 'data' as a singular noun, especially when talking about computers etc. For example: "The data is being transferred from my computer to yours".

And I have to be honest, I've never heard anyone ask for a datum.

I think it is more usual to talk about 'datum' in geodesy. And in this context the plural is ‘datums’.

That is why I think inflection.pluralize('datum') should return "datums", and that inflection.pluralize('data') and inflection.singularize('data') both should return "data".

I think this convention is more practical in use.
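
For illustration, the convention proposed above could be sketched as a thin wrapper that consults a small override table before falling back to an existing inflector. This is only a sketch, not how inflection is implemented: `with_overrides`, `PLURAL_OVERRIDES`, `SINGULAR_OVERRIDES`, and the two `naive_*` stand-in fallbacks are hypothetical names introduced here so that the example runs without any third-party package.

```python
from typing import Callable, Mapping

# Hypothetical override tables reflecting the convention proposed above.
PLURAL_OVERRIDES = {"datum": "datums", "data": "data"}
SINGULAR_OVERRIDES = {"datums": "datum", "data": "data"}


def with_overrides(
    fallback: Callable[[str], str],
    overrides: Mapping[str, str],
) -> Callable[[str], str]:
    """Wrap an inflector so that listed words bypass its built-in rules."""

    def inflect(word: str) -> str:
        key = word.lower()
        if key in overrides:
            # Overrides are stored in lowercase, so the result is lowercase too.
            return overrides[key]
        return fallback(word)

    return inflect


def naive_pluralize(word: str) -> str:
    # Stand-in fallback (assumption): append "s" unless the word ends in "s".
    return word if word.endswith("s") else word + "s"


def naive_singularize(word: str) -> str:
    # Stand-in fallback (assumption): strip one trailing "s" if present.
    return word[:-1] if word.endswith("s") else word


pluralize = with_overrides(naive_pluralize, PLURAL_OVERRIDES)
singularize = with_overrides(naive_singularize, SINGULAR_OVERRIDES)

print(pluralize("datum"))
print(pluralize("data"))
print(singularize("data"))
```

With the library installed, `inflection.pluralize` and `inflection.singularize` could be passed as the fallbacks in place of the naive stand-ins, which keeps the library's rules for every word not listed in the tables.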