w3c / i18n-glossary

Definitions of terms used in W3C Internationalization documents.
https://w3c.github.io/i18n-glossary/

Where does the definition of natural language draw the line between natural languages and other human languages? #16

Open cstrobbe opened 1 year ago

cstrobbe commented 1 year ago

The definition of natural language currently reads:

Natural Language (sometimes just language) refers to the spoken, written, or signed communications used by human beings. (...)

What criteria are silently assumed to differentiate between natural languages and other human languages?

  1. Languages that evolved versus languages that were constructed for communication between people? This would exclude languages such as Esperanto, Ido and Klingon.
  2. Languages that people learn as a native language versus ones that aren't? ("Natural language" is sometimes defined as a language learnt as a native language.) This would seem to include Esperanto (which is said to have native speakers) but would exclude Latin, Homeric Greek and a number of languages that have recently become extinct.

The ISO 639 language tags don't exclude ancient languages, extinct languages or constructed languages.

In 2008, WCAG 2.0 intentionally used the term "human language" instead of "natural language", to avoid a term that might be interpreted as excluding constructed languages such as Esperanto and extinct or historical languages such as Latin. Content in these languages exists online and can be identified using ISO 639 language tags.

aphillips commented 1 year ago

What criteria are silently assumed to differentiate between natural languages and other human languages?

There are no such criteria because there is no "other" here. While some dictionaries define "natural" narrowly (in contrast with "artificial" languages such as Klingon), in computing the term primarily means "human language", through its association with "natural language processing". The contrast here is with machine languages.

We could add "human language" as an additional term here. Note that the definition here is drawn from, and is meant to reflect, those found in BCP 47 (which is our preferred reference for language identification), such as:

Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication. This includes constructed and artificial languages but excludes languages not intended primarily for human communication, such as programming languages.
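
To make the point about tags concrete, here is a minimal sketch of looking up the primary language subtag for some of the languages mentioned in this thread. The table is a small hand-written subset, not a real registry; the `describe` helper and the "constructed" / "historical" / "living" labels are assumptions made for illustration (they follow the wording used in this discussion, not any official classification).

```python
# Illustrative subset only: a few ISO 639 codes for languages named in this thread.
# The second field is an informal label from the discussion, not an official category.
EXAMPLE_LANGUAGES = {
    "eo": ("Esperanto", "constructed"),
    "io": ("Ido", "constructed"),
    "tlh": ("Klingon", "constructed"),
    "la": ("Latin", "historical"),
    "grc": ("Ancient Greek", "historical"),
    "en": ("English", "living"),
}


def describe(tag: str) -> str:
    """Describe the primary language subtag of a language tag (hypothetical helper)."""
    # The primary language subtag is the part before the first hyphen.
    primary = tag.split("-")[0].lower()
    name, label = EXAMPLE_LANGUAGES.get(primary, ("unknown", "unknown"))
    return f"{primary}: {name} ({label})"


if __name__ == "__main__":
    for tag in ("eo", "tlh", "la", "grc", "en-GB"):
        print(describe(tag))
```

Every entry in the table, constructed or historical, resolves the same way as a living language such as English, which is the sense in which the tags "include constructed and artificial languages".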