Localisation suggestions

romcal / romcal

JavaScript library that generates liturgical calendars of the Roman Rite of the Roman Catholic Church.

https://romcal.js.org

MIT License


Localisation suggestions #26

Open tukusejssirs opened 6 years ago

tukusejssirs commented 6 years ago

(0) Definitions

A ‘main locale’ is the one locale per language that contains all keys and serves as the first fallback for that language. For example: enUs, deDe, esEs, frFr.

A ‘sub-locale’ contains only those keys that differ in grammar or spelling from the sub-locale’s main locale.

‘Locale’ is any locale, i.e. either main or sub-locale.

(1) Naming convention of locale files

I think that we should make the naming of locales uniform. Example:

enUs.mjs [lang][Country].mjs

The first part should be the ISO 639-1 language code, the second the ISO 3166-1 alpha-2 country code (this site, though in Czech, lists more languages than its English counterpart, which does not include Slovak).

The file names should keep lower-camel-case capitalisation, that is, the country code follows the language code with only its first letter capitalised, matching [a-z]{2}[A-Z][a-z]\.mjs.
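As a quick illustration of the naming rule, the pattern above can be turned into a check on file names. This is only a sketch: the helper name `isValidLocaleFileName` is hypothetical and not part of romcal, and the pattern is anchored with `^` and `$` so that it must match the whole file name.

```javascript
// Hypothetical helper (not part of romcal): checks a file name against the
// proposed [lang][Country].mjs convention, e.g. enUs.mjs.
const LOCALE_FILE_PATTERN = /^[a-z]{2}[A-Z][a-z]\.mjs$/;

function isValidLocaleFileName(fileName) {
  return LOCALE_FILE_PATTERN.test(fileName);
}

console.log(isValidLocaleFileName('enUs.mjs'));  // true
console.log(isValidLocaleFileName('cs.mjs'));    // false: country part missing
console.log(isValidLocaleFileName('en-US.mjs')); // false: not lower camel case
```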

As for Latin, which is currently used as an official language only in Vatican City (Holy See; laVa.mjs) and Malta (laMt.mjs), I suggest that all Latin locales use the laVa.mjs file.

TODO:

rename:
- cs.mjs to csCz.mjs,
- fr.mjs to frFr.mjs,
- pl.mjs to plPl.mjs;
consider removal of enBo.mjs, as English is not an official/national language in Bolivia.
consider removal of enPh.mjs, as though English is used as an official language in the Philippines, it

Note to enBo.mjs: I understand that you just wanted to address the different names of the celebration, but still I think that should be done the way I suggested in issue #24. That way we would ensure that all strings (keys) in one locale (language) would reside in one file (the main locale).

(2) Rules for creating new locale

My proposal (first rule applies to all locales, both rules must be fulfilled for all sub-locales):

(1) A new locale for a language should be created only for countries where that language is used as an official or national language, e.g. enUs, frBe.

(2) A new sub-locale of a language should have different grammar or spelling rules from its main locale (like ‘soccer’ in American English and ‘football’ in British English; or ‘Januar’ in deDe and ‘Jänner’ in deAt). This condition does not include different names of celebrations (like naming one of the companions of St Paul Miki, because that particular companion comes from that particular country); for that case I would suggest implementing something like what I proposed in issue #24.

If one would require/want an English National Calendar in, let’s say, Polish, s/he should be able to generate it using options calendar = england; locale = plPl (or something like that).

(3) Keys that should be included in locales

In my opinion, all the keys should be contained in all main locales,[1] including celebration name variations for some countries[2] and national feasts celebrated in other national calendars only.[3]

Sub-locales (e.g. for English, any locale but enUs, like enCa, enGb) should contain only those keys whose strings differ from the main locale.

This way we could maintain a complete list of all keys (in each and every main locale), and anybody who would like to contribute a translation into his or her main locale[4] would just grab any main locale (one s/he can speak/understand), rename it according to his or her main locale and translate the strings. S/he would not need to check whether all keys are present, nor worry about possible duplicates. S/he would still need to report any additional keys or changes that would have to be applied to all other locales. However, I still think that this (my suggested) approach is not perfect either.

However, as this is your repo and project, you decide what is the best approach. This is just my suggestion.

[1] This means that all main locales should contain the same keys and the same number of lines. Note that one main locale (e.g. enUs) should contain keys in that particular language (enUs) only. The variations I talk about in issue #24 should be in that language (enUs).

[2] See issue #24. Note that here I suggest ‘key-dot-country keys’ for celebrations that are also used in other countries (or even in the General Calendar) but have some major difference, like naming one of the companions of St Paul Miki (because that companion came from that particular country) or when Our Lady is celebrated under a different title (like ‘Our Lady of Sorrows’ in enUs and ‘Our Lady of Seven Sorrows’ in skSk). Using the name ‘Our Lady’ versus ‘[Blessed] Virgin Mary’ does not count as a difference.

[3] For example, Saint Francis Solanus is celebrated in Argentina, Bolivia and Peru, but it is included in all main locales.

[4] Note on creating a new sub-locale: the potential translator should not copy all keys from a main locale, but only the keys necessary.

(2) Create a wiki page on how to localise (some standardisation)

I could do this one, but only after I have translated the skSk.mjs locale, for I’m still finding some duplicates. After I have finished the translation, I can give you my proposal / pull request.

(4) Fallback languages if the specified locale does not exist

First, see the following scheme.

enUs
`— enCa
`— enGb
`— en..
`— deDe
   `— deCh
   `— deAt
   `— de..
`— frFr
   `— frBe
   `— fr..
`— skSk
`— plPl
`— ...

There would be a 2-step fallback: (1) If a user chooses a locale (like frBe), use it. (2) If the chosen locale (like frBe) is not available (or just a key within it is missing), fall back to the main locale of that language (for frBe, that is frFr). (3) If even the main locale of that language does not exist (or a key within it is missing), fall back to enUs (although as much as I’d like enGb to be the fallback, as it is the main locale of the Englishes, enUs is the most widely used worldwide and you have chosen it to be the fallback version).
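The fallback order described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the in-memory `locales` tables, the `mainLocales` map, and the function names `fallbackChain` and `localise` are hypothetical and do not reflect how romcal actually loads its locale files.

```javascript
// Hypothetical in-memory locale tables; the strings are illustrative only.
const locales = {
  enUs: { epiphany: 'Epiphany', savior: 'Savior' },
  frFr: { epiphany: 'Épiphanie' },
  frBe: {}, // sub-locale with no overrides yet
};

// Assumption: the main locale of each language is listed explicitly.
const mainLocales = { en: 'enUs', fr: 'frFr', de: 'deDe' };
const PROJECT_FALLBACK = 'enUs';

// Builds the lookup order: chosen locale -> main locale -> project fallback.
function fallbackChain(locale) {
  const language = locale.slice(0, 2);
  const chain = [locale, mainLocales[language], PROJECT_FALLBACK];
  // Drop unknown entries and duplicates while keeping the order.
  return chain.filter((name, i) => name !== undefined && chain.indexOf(name) === i);
}

// Returns the first string found along the chain, or undefined if none exists.
function localise(key, locale) {
  for (const name of fallbackChain(locale)) {
    const table = locales[name];
    if (table && Object.prototype.hasOwnProperty.call(table, key)) {
      return table[key];
    }
  }
  return undefined;
}

console.log(fallbackChain('frBe'));        // [ 'frBe', 'frFr', 'enUs' ]
console.log(localise('epiphany', 'frBe')); // found in frFr
console.log(localise('savior', 'frBe'));   // found only in enUs
```

Note that the lookup is done per key, so a locale file that exists but lacks one string still falls through to the next level for that string.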

(5) List of main locales

It would be nice to have a list of all main locales (or at least one as extensive as possible), agreed on beforehand, so as to prevent potential arguments. I would include this list in the 'Localisation Rules' wiki page [see point (2) above].

Note that a main locale does not have to have a sub-locale.

Here’s my suggestion (work in progress—that is, I have not listed all languages yet):

csCz,
deDe,
enUs,
esEs,
frFr,
laVa,
plPl,
skSk.

Probably we should list the sub-locales too or just create a combined list for coding purposes (to use in case, when the sub-locale has no *.mjs file and it has to fall back to its main locale).

(6) My TODO

First of all, I do not have as much time for romcal as I’d like to have, so bear with me. :)

UPDATE: I created a project myself (https://github.com/tukusejssirs/romcal/projects/1) for the following and all future TODOs.

(1) Finish the Slovak locale translation
(2) Sort the celebrations from the General Calendar in skSk.mjs
(3) Sort the national celebrations in skSk.mjs
(4) Sort (or generate using a Bash script) other locales (starting with enUs.mjs)
(5) Create a wiki page on how to localise (some standardisation and rules that should be abided by, after your approval)
(6) Create the Latin locale (laVa.mjs)

emagnier commented 4 years ago

In fact, the first 2 letters correspond to the language, and the next 2 correspond to the culture.

So when we have en-us, it means English - United States. For fr-fr, it's French - France.

I'm not comfortable with the concept of main localisation and sub-localisation. It could also imply that enUs is superior to enGb, and that enGb just inherits from the main enUs (wars have been declared for less than that). The general practice in app localisation is to have one default file en, then if required a more culture-specific en-us or en-gb that inherits from, or is applied on top of, the main en language file.

So if we send fr-be to the application, it should first try to load the fr translations, and then apply fr-be if it exists.

Concerning Romcal, it means that enUS should be renamed to en, because it contains the default English translation. Same for skSk -> sk, which is the main Slovak language (and doesn't have a culture variation yet). Then Romcal should manage the localisations correctly:

given as a parameter (distinguish between the language and the culture),
apply first the language translations, and then (if required and the file exists) the culture translations.
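A minimal sketch of that "language first, culture on top" layering is shown below. The `tables` object and the function names `parseLocale` and `loadTranslations` are hypothetical, chosen only for this illustration; they are not romcal's API.

```javascript
// Hypothetical translation tables keyed by language or language-culture tag.
const tables = {
  en: { savior: 'Savior', mary: 'Mary', saint: 'Saint' },
  'en-gb': { savior: 'Saviour' }, // only the strings that differ
};

// Splits a tag such as 'en', 'en-GB' or 'en_gb' into language and culture.
function parseLocale(tag) {
  const [language, culture] = tag.toLowerCase().split(/[-_]/);
  return { language, culture };
}

// Loads the language table first, then applies the culture table on top of it.
function loadTranslations(tag, allTables) {
  const { language, culture } = parseLocale(tag);
  const base = allTables[language] || {};
  const overrides = culture ? allTables[`${language}-${culture}`] || {} : {};
  return { ...base, ...overrides };
}

const enGb = loadTranslations('en-GB', tables);
console.log(enGb.savior); // overridden by en-gb
console.log(enGb.mary);   // inherited from en
```

With this approach a missing culture file is harmless: `loadTranslations('en-US', tables)` simply returns the `en` strings unchanged.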

Currently in Romcal, there are culture variations only in English, and English is the default language anyway (which is why we haven't had an issue yet). But if we start to add culture-specific files in a language other than en, like fr-be, we will start to see translations mixed between English and French.

This isn't complicated to manage, but it needs to be fixed to follow the localisation standard. This will also address the issue from #24, and avoid duplicating keys, like ourLadyOfMountCarmel and ourLadyOfMountCarmelMotherAndQueenOfChile.

tukusejssirs commented 4 years ago

In fact, the first 2 letters correspond to the language, and the next 2 correspond to the culture.

So when we have en-us, it means English - United States. For fr-fr, it's French - France.

Thanks, but I know that already, although I did not use the proper naming (sub-localisation vs culture). :smiley:

The thing is that the culture is a particular variety of a language. For example, there are many Englishes, i.e. British English (enGb), American English (enUs), Australian English (enAu); the same applies to German, Spanish, Portuguese, etc. I know that some languages don’t (usually) have that culture part (like Slovak: sk); sometimes they even form the culture part by doubling the language part (e.g. skSk).

So, this issue deals with the differences between language cultures (or sub-languages): in British English we write Saviour, while in the US they write Savior (no u).

I'm not comfortable with the concept of main localisation and sub-localisation. It could also imply that enUs is superior to enGb, and that enGb just inherits from the main enUs (wars have been declared for less than that). The general practice in app localisation is to have one default file en, then if required a more culture-specific en-us or en-gb that inherits from, or is applied on top of, the main en language file.

Well, I agree that some might fight against it (I’d vote for enGb, as I prefer that English over any other); in the end, every project/program/whatever needs to decide which language is the default. I am sort of against en, as that might be any one English (but which is it?) and usually (in other programs) it is enUs. Still, the main language should have all strings translated in order to let other languages default to the translation of the main language. And then again we need to decide which form of a word should be used (Saviour or Savior? organisation or organization?).

Another reason why I would vote for language subordination is that it might save some space. For example, all Englishes share a fair number of words that do not need to be copied to all of them. Epiphany is the same in all Englishes, so we should have it only in enUs, and the rest of the Englishes should just leave it untranslated and thus default to the enUs translation.

Anyway, I think we should keep enUs as the project main language, as a kind of heritage of @pejulian and because in most global software it is the main language from which all strings are translated into other languages.

So if we send fr-be to the application, it should first try to load the fr translations, and then apply fr-be if it exists.

I would suggest the opposite order: for every string it should first look in the culture language (like frBe); if it does not find that particular string there, it should look in the main language, which for French should be frFr; and finally, if the string is not translated in the main language either, it should revert to the project main language, which is always enUs (this language has to have all strings, as the strings should mainly be translated from this language).

Concerning Romcal, it means that enUS should be renamed to en, because it contains the default English translation. Same for skSk -> sk, which is the main Slovak language (and doesn't have a culture variation yet).

I would vote against this change, because:

en would still contain names/titles/words of a particular English language (most probably enUs), or it would not contain all strings (i.e. strings that differ in even one other English would not be in en);
in my opinion, all languages should have a culture part, because there are some dialects; by including the culture part even for languages that do not generally have one, we would explicitly say that it is the standard language (langue littéraire).

Currently in Romcal, there are culture variations only in English, and English is the default language anyway (which is why we haven't had an issue yet).

Yes, in romcal, there are only culture variations of languages, as in the real world. In the real world, there is no general English or general French; there are only French French, Belgian French, British English, US English, etc. I don’t say that other software/programs don’t try to create such languages (they do), but I think it is not the correct way (for the reason mentioned above).

But if we start to add culture-specific files in a language other than en, like fr-be, we will start to see translations mixed between English and French.

Well, the thing is that when the locale is set to frBe and a string is translated neither in frBe nor in frFr, romcal should use the string from enUs. That might happen (and should be considered correct, though it could be improved by translating the missing strings) when one wants to generate, for example, the Slovak calendar in a French locale: I believe you have translated the French strings, but only those strings (celebrations) which are in the French calendar (i.e. not those that are only in other particular/national calendars). And thus, yes, the final Slovak calendar in French would (currently) be a mixed French-English version.

But you talk about something else. :smiley:

This isn't complicated to manage, but it needs to be fixed to follow the localisation standard. This will also address the issue from #24, and avoid duplicating keys, like ourLadyOfMountCarmel and ourLadyOfMountCarmelMotherAndQueenOfChile.

Well, partially yes.

I would suggest what I have suggested in #56: for celebration strings of persons, we should separate the canonisation level (the words saint or blessed) and the titles (like martyrs or patronages) from the celebration strings and make them translatable separately. It is a fairly major rework, but it would (in my opinion) make the translations:

easier, simpler and better;
more modular;
in particular (national) calendars, we could add/remove some titles based on the needs of the particular calendar.

Also, this (#56) should be dealt with together with #12 (my suggestion is in this comment), i.e. we should create a join function (or whatever you would call it). A list of other suggested functions is here.

Here I have started the separation of the canonisation level and the titles from enUs strings. There are many things required to rework in romcal if we want to incorporate that, but in the end it would make it much better/easier to translate strings to other languages. Also it would have to be consistent in translations. I think this should be the base for the DB where we would like to place the translations.

In the end, I have created multiple spreadsheets regarding romcal; you can find them all here.

tukusejssirs commented 4 years ago

Just a note from what I have discussed with @emagnier in #82: consider moving localisation files (DB) to a separate repository.

emagnier commented 4 years ago

I'm just more in favour of following the industry standards and mechanisms for localisation in app development, and of keeping things as simple as we can (why reinvent the wheel and do things differently when people are already used to working that way?). I was also suggesting this to avoid duplicated keys when they are not necessary.

tukusejssirs commented 4 years ago

Well, still I am not quite fond of using en as the romcal main language instead of enUs.

What exactly do you propose? How would it all work in your proposal? By that I mean the fallback when a language (or a string) is missing in a particular language, and all the other things I dealt with in my previous comments in this issue.

emagnier commented 4 years ago

From an architecture point of view, it's always better to go from global to specific, not the opposite.

en.json will cover the majority of cases for en-US, en-GB, en-CA... and will be the default localisation file for English. So if you set the localisation to en-GB, and en-GB.json exists and has the string available, it will be picked over the en equivalent. If not found, the string will be taken from en.json.

xx-XX.json files are just there to override a specific string to make it more culturally relevant. This is where they can be useful to manage cases like ourLadyOfMountCarmel and ourLadyOfMountCarmelMotherAndQueenOfChile. Here the second key could be removed in favour of the first, with the string redefined in an es-CL.json file (for Spanish - Chile). So the big advantage here is that we can stop duplicating keys when the title of a saint needs to be different in a specific country.

This is what I tried to explain in my previous comment. And this is not my own proposal but an architecture standard in application development.

tukusejssirs commented 4 years ago

From an architecture point of view, it's always better to go from global to specific, not the opposite.

I don’t say this is not true. This is the part I have the least trouble with, and therefore I could say I agree with it.

en.json will cover the majority of cases for en-US, en-GB, en-CA... and will be the default localisation file for English. So if you set the localisation to en-GB, and en-GB.json exists and has the string available, it will be picked over the en equivalent. If not found, the string will be taken from en.json.

This is what troubles me. To cover the majority of [all] the cases for en-US, en-GB, en-CA [etc.], whoever maintains the enUs translation would also need to know what is and what is not a major case (I believe you mean the common/same string translation across all variations/dialects of a particular language). And I think this is quite impossible. One person never knows all Englishes; they know two at most, and generally only one. This would require increased communication between the contributors of all English dialects, which seems unnecessary if we choose one dialect to be the main language.

xx-XX.json files are just there to override a specific string to make it more culturally relevant. This is where they can be useful to manage cases like ourLadyOfMountCarmel and ourLadyOfMountCarmelMotherAndQueenOfChile. Here the second key could be removed in favour of the first, with the string redefined in an es-CL.json file (for Spanish - Chile). So the big advantage here is that we can stop duplicating keys when the title of a saint needs to be different in a specific country.

Have you already read (in full) this comment of mine in #12? There I suggest using the following syntax, and I believe that it is the proper way to deal with such duplicates. In general: there should be only one key ourLadyOfMountCarmel, and other titles used in particular calendars should be added with a function in chile.js (i.e. in the calendar definition). All in all, I don’t think that your proposal offers a big advantage here. I think that my solution for removing the duplicates is better and easier to maintain, but it requires fixing the separation issue first (#12) and thus it can be done only after we switch to a DB.

vianneyJohnMary{
    .surname="Vianney"  // It might be a [nobiliary particle](https://en.wikipedia.org/wiki/Nobiliary_particle), too
    .name="John Mary"  // All first names
    .holiness="Saint"  // This might be an int; todo: find better name instead of 'holiness'; options: st/sts/bl/bls
    .title="priest"  // Comma-separated list
    .number=1  // How many persons?; only used in `saints/`
    .sex="male"  // male/female/mix; only used in `saints/`
}

// Or for Our Lady of Mount Carmel in the particular calendar of Chile
ourLadyOfMountCarmel{
    .title="motherAndQueenOfChile"
}
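To make the idea of separately translatable parts and a "join function" more concrete, here is a rough JavaScript sketch. It is an assumption about how such a record and function might look, not the syntax proposed in #12 and not romcal code: the object shape, the `strings` table and `joinCelebrationName` are all hypothetical, and the word order is hard-coded for English only.

```javascript
// Hypothetical record: the parts of a celebration name kept separately.
const vianneyJohnMary = {
  surname: 'Vianney',
  name: 'John Mary',
  holiness: 'saint',  // key of the canonisation level, translated separately
  titles: ['priest'], // keys of the titles, translated separately
};

// Hypothetical translation table for the separated parts (English).
const strings = { saint: 'Saint', blessed: 'Blessed', priest: 'priest' };

// Joins the parts into a display name; falls back to the raw key when a part
// has no translation. English word order is assumed here.
function joinCelebrationName(person, table) {
  const holiness = table[person.holiness] ?? person.holiness;
  const titles = person.titles.map((title) => table[title] ?? title);
  const fullName = [holiness, person.name, person.surname].filter(Boolean).join(' ');
  return titles.length > 0 ? `${fullName}, ${titles.join(', ')}` : fullName;
}

console.log(joinCelebrationName(vianneyJohnMary, strings));
// Saint John Mary Vianney, priest
```

A particular calendar could then add or remove entries in `titles` without introducing a new translation key for the whole celebration name.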

emagnier commented 4 years ago

I really don't agree with your first comment. I worked for several years on major projects with i18n, and I can say that we don't have to know all the cases for all specific regions; we need something more common and generic. Then the xx-XX files adjust specific things if needed. You don't need to know all the cases in one go. And this is why a default language file like en is required.

You also need to think about how these localisations will be managed by the application, between apps, and by potential apps that will depend on romcal, not only from a translator's point of view. This is why I have not yet implemented your suggestion to specify the browser localisation in the sample index.html (in my last PR): it's more complicated than it seems, and the logic differs between what the browser gives and the way it is currently (and not correctly) managed in romcal. We could have the best ideas of all, but this domain is pretty standardised and it's important to follow conventions. So I think it's a very bad idea to implement a different localisation logic that will confuse most potential contributors, and might need a localisation adapter when romcal is plugged into other dependencies in different projects.

Regarding your second comment (related to #12), localisation files should be as simple as we can make them, without any logic. The way I suggest is also the general practice and convention recommended in software development to manage these cases. Following conventions then makes it easier to extend romcal resources in other projects.

I don't have much time for further explanations here since I'm currently quite busy with other projects, but the localisation mechanism is something I would like to refactor and make more standard, straightforward and extendable. This is also something I need for another project I'm working on, to be able to plug different resources into romcal.

tukusejssirs commented 4 years ago

Okay, I give in.

At first, I wanted to defend my proposal, but since (1) you’re the maintainer now and (2) you have more experience in the field, your proposal has more weight.

But I have a few requests:

make en be actually enUs (i.e. it contains all enUs strings), while enUs would be just kind of a symlink to en; same would apply to all other languages (like fr = frFr or es = esEs);
(this might be what you have already proposed) revert algorithm (if missing, try next): frBe → fr (which is actually frFr) → en (which is actually enUs);
(this might be what you have already proposed): xx locales (the ‘main locales’ in my former terminology) would all contain all strings, while xxXx (except for those ‘symlinked’ to xx) would contain only those strings which differ from the translation of the locale upper in the language rank.

I believe the 3rd request requires an example:

en (enUs) = "Savior", "Mary", "Saint"
enGb = "Saviour"

As you can see, enGb is missing the ‘Mary’ and ‘Saint’ strings, because those are exactly the same in en (enUs). romcal would do this:

check the string in enGb: missing → continue up the language rank;
check the string in en (which is enUs): found → use that string.

Regarding your second comment (related to #12), localisation files should be as simple as we can make them, without any logic. The way I suggest is also the general practice and convention recommended in software development to manage these cases. Following conventions then makes it easier to extend romcal resources in other projects.

I don’t know if I have explained properly my proposal. The strings to translate (the localisation files) would contain strings like these:

saint
blessed
Mary
Joseph
celebration
John Paul II.
Beheading of {johnTheBaptistKey}
...

This is the simplest form I could imagine. So I believe we’d like to achieve the very same thing.
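For a string such as "Beheading of {johnTheBaptistKey}", the placeholder would have to be filled in from another key at lookup time. The sketch below shows one way this might work; the table contents, the key names and the `resolve` function are assumptions made for illustration, and only one level of placeholders is resolved (no nesting).

```javascript
// Hypothetical strings table; the value stored under johnTheBaptistKey is an
// assumption made for this example.
const strings = {
  johnTheBaptistKey: 'Saint John the Baptist',
  beheadingOfJohnTheBaptist: 'Beheading of {johnTheBaptistKey}',
};

// Returns the string for `key`, replacing each {otherKey} placeholder with the
// string stored under otherKey. Unknown placeholders are left untouched.
function resolve(key, table) {
  const template = table[key];
  if (template === undefined) return undefined;
  return template.replace(/\{(\w+)\}/g, (match, ref) =>
    Object.prototype.hasOwnProperty.call(table, ref) ? table[ref] : match
  );
}

console.log(resolve('beheadingOfJohnTheBaptist', strings));
// Beheading of Saint John the Baptist
```

Libraries such as i18next and Fluent, mentioned later in this thread, provide their own mechanisms for this kind of substitution, so a hand-written version like this would likely be unnecessary if one of them is adopted.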

I think I have confused you with the example in my previous comment. It was not an example of what needs to be translated, but of how the calendars would be defined. That would not be the translators’ job, but the job of the ‘calendarists’. Have I still failed to explain my proposal (and the difference between the jobs of the translators and the ‘calendarists’) in a way you can understand? And if you do understand it, do you agree now? Or do you have some remarks? :smiley:

I would like to ask you to have a look at Fluent, ‘a localization system for natural-sounding translations’, as they describe it. I found it on the Internet today, and it got me with the way one can incorporate grammar (you know, some languages have quite complicated grammar rules, like Slovak). They have a JS implementation of it. Could you please check whether this DB-related tool is any good for our use case? Thanks in advance. :smiley:

PS–I am truly sorry I keep you from work (on other projects). :disappointed:

emagnier commented 4 years ago

Yes, it's the concept of inheritance. If something is missing at a specific level, we try to get it from the level above.

And if there is something in the English of the United States that should be used explicitly for that country and not in any other English, we should put it in an en-US.js file. en has to be considered the most widespread form of English, not specifically the English of the United States (even if we can consider that the most widespread form of English will statistically come from the United States). The same goes for fr and any other base language.

I haven't had time yet to check Fluent closely. I also have libraries like i18next in mind.

tukusejssirs commented 4 years ago

I still see more issues (from translating point of view) with the en not containing all strings translated. What would happen if a string is missing both from fr-BE and fr, and the string would not have a widespread form?

Update: I’ve checked i18next, and from my point of view (after briefly checking its features and comparing them to Fluent) I think both i18next and Fluent are quite good for our (current and future) needs. So it depends mainly on you which one would suit you better from a coding perspective. :) I must say, though, that the i18next syntax looks a bit more complicated to me.

emagnier commented 4 years ago

en should ideally contain all strings translated. Currently, if a string is not found in any of the locale files, romcal returns an empty string. In that particular case, it would be better to output at least the key instead of an empty value. That way we can visually identify that a translation is missing, but we still know what should be translated thanks to the key.

Fluent seems really interesting but is quite new. i18next has the big advantage of being made with and for JavaScript first; it has a great community and supports a lot of JS libraries, including Moment.js.

tukusejssirs commented 4 years ago

en should ideally contain all strings translated.

I agree, but in my opinion that would require choosing one particular English, because otherwise it would either (1) not contain all strings or (2) contain mixed strings from different Englishes.

In that particular case, it would be better to output at least the key instead of an empty value.

I would suggest that a key (after localisation) should never return an empty value nor the key name. It should always return a translation (string), be it in the chosen language or in English. What do you think it would look like if the key name were output instead of Close and comment here on GitHub?

All l10n/i18n solutions use a specific source language, and generally it is en-US.

Fluent seems really interesting but is quite new. i18next has the big advantage of being made with and for JavaScript first; it has a great community and supports a lot of JS libraries, including Moment.js.

As I said: I don’t mind which one you choose. From my point of view, the most important feature is that it should support (as much as possible) the most complicated grammars in the world (singular/plural and gender are not enough; context in i18next might be useful for this).

emagnier commented 4 years ago

I would suggest that a key (after localisation) should never return an empty value nor the key name. It should always return a translation (string), be it in the chosen language or in English. What do you think it would look like if the key name were output instead of Close and comment here on GitHub?

Yes, that's the ideal scenario, but what happens if the value of a key is still missing? In my experience with different localisation libraries, the localisation method returns the key when it finds no value. This is useful for translators (they know which key is missing a translation) and for any users (developers, testers...), because they know what is happening instead of having no indication. In your example, the Close and comment button without a translation could look like {closeAndComment}, so we know this translation is missing, but we also know what happens if we click on this button.

For everything else I agree.

tukusejssirs commented 4 years ago

Yes, that's the ideal scenario, but what happens if the value of a key is still missing?

If we choose a source language (e.g. en-US), a new string must first be added to the source language (i.e. the key is created and the string in the source language is associated with it), and only after that should we translate it into other languages. (The same would apply to modifying or removing keys/strings.) This way, no string would ever be missing from the source language (en).

If you really want to have a run-time warning/error for a missing string, we could have that. But in my opinion, at run-time every string should be localised (be it to the chosen language or to the fallback one). The missing strings (or the translation progress of a particular language) should be known from the localisation tool, not from romcal itself.

In your example, the Close and comment button without a translation could look like {closeAndComment}, so we know this translation is missing.

Yes, but this should never ever happen at run-time. When a user runs romcal, all strings should be localised, and if a translation is not found, the string from the source language should be used.

Your proposed behaviour could be the case in the debug mode only.
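The two behaviours under discussion can be sketched side by side. This is only an illustration under assumptions: the `translate` function, its `debug` option and the `tables` object are hypothetical, and returning the `{key}` marker as a last resort when even the source language lacks the string is a choice made for this sketch, not something agreed on in this thread.

```javascript
// Hypothetical source language used as the final fallback.
const SOURCE_LOCALE = 'en';

// Looks up `key` in `locale`. In normal mode a missing string falls back to
// the source language; in debug mode the key itself is shown, e.g. {someKey}.
function translate(key, locale, tables, { debug = false } = {}) {
  const own = tables[locale] && tables[locale][key];
  if (own !== undefined) return own;
  if (debug) return `{${key}}`;
  const source = tables[SOURCE_LOCALE] && tables[SOURCE_LOCALE][key];
  // Assumption: if even the source language lacks the string, show the key.
  return source !== undefined ? source : `{${key}}`;
}

const tables = {
  en: { closeAndComment: 'Close and comment' },
  sk: {}, // translation not provided yet
};

console.log(translate('closeAndComment', 'sk', tables));
// Close and comment
console.log(translate('closeAndComment', 'sk', tables, { debug: true }));
// {closeAndComment}
```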

emagnier commented 4 years ago

I fully agree with the idea. It is a good theory, but in reality I'm wary of best-case scenarios :) I have more than 15 years of experience in web app development, and I know it is always better to display something poorly translated (or its localisation key) than nothing, which could be mistaken for normal behaviour, or which we might simply forget to translate because we do not see it in the UI. Doing it that way helps contributors understand that something is missing and helps them fix the missing string.

But if we take care of translating everything that is common, or comes from the general calendar, the missing-key situation should be very limited.

tukusejssirs commented 4 years ago

I've more than 15 years on web app development.

I don’t want to argue on the experience front as obviously you overcome me easily.

[…] I know it's always better to display something not well translated (or its localisation key) instead of nothing that could be considered as a normal feature, or just forget to translate it because we do not see it in the UI.

I am not comparing displaying a key vs nothing, but key vs en string in non-en locale.

On displaying nothing (i.e. a situation when a string is not translated, not even in en): I’ll do my best to make sure that all keys are translated at least into en and sk (prerequisite: a localisation tool).
On displaying an en string in a non-en locale: From the end-user perspective, I think it is better to have portions of the UI/strings in English (when the strings are not translated into the selected language). From the dev perspective, what you say might be true, but I think that the localisation tool will (should) make it much simpler to find out which string is used, how and where, and also which strings are translated and which are not. If you insist, you could implement this in the debug mode, but please: do not make it the default behaviour. It would be ugly. :smiley:

But if we take care of translating everything that is common, or comes from the general calendar, the missing key situation might be very limited.

What do you mean by everything that is common? We should aim to translate everything used by romcal. That means we should translate even those celebration/saint names which are not in a particular calendar, for one might want to generate, for example, the French calendar in the Slovak locale.

emagnier commented 4 years ago

Ok you got it, no key output 😉

An external localisation tool will take care of that. But I still think romcal itself needs a simple way to tell which translations are missing. Maybe a new npm command could take care of that and list all keys that need translation. Something like:

$ npm run check-locales

tukusejssirs commented 4 years ago

I still think romcal itself needs a simple way to tell which translations are missing.

As I said, this is useful for debugging. I have no problem if this is implemented, just don’t make it the default behaviour.

$ npm run check-locales

Yes, that is one of the possible solutions. I have no problems with that. :smiley:
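As a rough idea of what a script behind such a command could do, the sketch below compares a locale against the source language and lists the keys that still need translation. The function names and the sample data are hypothetical, and nothing here reflects an existing romcal script.

```javascript
// Collect dotted key paths for every string in a (possibly nested) locale object.
function flattenKeys(obj, prefix = '') {
  return Object.entries(obj).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return typeof value === 'object' && value !== null
      ? flattenKeys(value, path)
      : [path];
  });
}

// Keys present in the source locale but absent from the target locale.
function missingKeys(source, target) {
  const have = new Set(flattenKeys(target));
  return flattenKeys(source).filter((key) => !have.has(key));
}

// Hypothetical sample data, not taken from the real locale files.
const enSample = {
  sanctoral: { nicholasBishop: 'Saint Nicholas, bishop', allSaints: 'All Saints' },
};
const skSample = {
  sanctoral: { nicholasBishop: 'Svätého Mikuláša, biskupa' },
};

// Lists the single key that sk still lacks: sanctoral.allSaints
console.log(missingKeys(enSample, skSample));
```

Because the check runs as a separate command, the missing-key report stays a development aid and does not change what romcal outputs at run-time.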

emagnier commented 4 years ago

I also liked your proposal, in one of your comments above, to split the name of a saint into different parts:

vianneyJohnMary{
    .surname="Vianney"  // It might be a [nobiliary particle](https://en.wikipedia.org/wiki/Nobiliary_particle), too
    .name="John Mary"  // All first names
    .holiness="Saint"  // This might be an int; todo: find better name instead of 'holiness'; options: st/sts/bl/bls
    .title="priest"  // Comma-separated list
    .number=1  // How many persons?; only used in `saints/`
    .sex="male"  // male/female/mix; only used in `saints/`
}

# Or for Our Lady of Mount Carmel in the particular calendar of Chile
ourLadyOfMountCarmel{
    .title="motherAndQueenOfChile"
}

This is something that could lean directly on i18n functions (interpolation, nesting...). I'm not sure if we should have 2 different keys for name and surname, but I liked the other properties. The French locale also contains the year of death, which could be another field.

Then we could have a general option to specify how the name of a saint should be outputted by default (if a specific output isn't defined for a saint). Something like:

sanctoralDefaultOutput: "{{holiness}} {{name}}{{titlesSectionSeparator}} {{titles}}"
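To illustrate how such a template could be filled in, here is a small hand-rolled interpolation sketch. The `render` function and the field values are assumptions made for this example; a real implementation would more likely rely on the interpolation feature of whichever i18n library is used.

```javascript
const sanctoralDefaultOutput =
  '{{holiness}} {{name}}{{titlesSectionSeparator}} {{titles}}';

// Replace each {{placeholder}} with the matching field; unknown ones become ''.
function render(template, fields) {
  return template
    .replace(/\{\{(\w+)\}\}/g, (_, name) => fields[name] ?? '')
    .trim();
}

// Hypothetical field values for one saint.
const nicholas = {
  holiness: 'Saint',
  name: 'Nicholas',
  titlesSectionSeparator: ',',
  titles: 'bishop',
};

console.log(render(sanctoralDefaultOutput, nicholas)); // Saint Nicholas, bishop
```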

Localisation files will be more standardised, while making it easy to customise the output (depending on our needs), like adding the date of death.

And it will make it possible to redefine, in region locale files, only the parts that have regional specificities, like the title for ourLadyOfMountCarmel in Chile.
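The regional override could work roughly like the sketch below, where a region file only carries the fields that differ and is laid over the main locale. The `overlay` helper and the shape of the objects are assumptions for illustration, not existing romcal structures.

```javascript
const isPlainObject = (v) =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

// Recursively lay `overrides` over `base` without mutating either object.
function overlay(base, overrides) {
  const result = { ...base };
  for (const [key, value] of Object.entries(overrides)) {
    result[key] =
      isPlainObject(value) && isPlainObject(base[key])
        ? overlay(base[key], value)
        : value;
  }
  return result;
}

// Hypothetical main locale entry and the Chilean override of one field only.
const mainLocale = {
  ourLadyOfMountCarmel: { name: 'Our Lady of Mount Carmel', title: '' },
};
const chileLocale = {
  ourLadyOfMountCarmel: { title: 'motherAndQueenOfChile' },
};

const merged = overlay(mainLocale, chileLocale);
console.log(merged.ourLadyOfMountCarmel);
```

The name stays defined once in the main locale, and the region file contributes only the title.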

tukusejssirs commented 4 years ago

I'm not sure if we should have 2 different keys for name and surname

The reason why I suggest doing it this way is that a name (like Peter, Paul) is shared by many saints. All of them (except one, usually an apostle) have some kind of surname, which differentiates between them.

I know that this creates another burden to create a saint person (probably in src/saints.js file), but still I think it might be helpful.

The French locale also contains the year of death, which could be another field.

As you could see in one of my Google Sheets documents, I have thought about that too. But years of death are not strings to be translated, therefore they should be (in my opinion) somewhere else, probably in the same file as the saints (e.g. src/saints.js) as an attribute/property.

Then we could have a general option to specify how the name of a saint should be outputted by default

I think that how the name of a saint should be output by default should be defined by the locale itself. There should probably be another file with definitions of default settings/options. It is defined somewhere already, but not transparently. I think that there should be a file per locale, while all settings/options should be overridable.

Please, do find some time and thoroughly read through all the Google Sheets I have created. I know I should make some time to move the suggestions from there to GitHub issues.

And please, keep in mind that some languages also use the genitive grammatical case when celebrations are output in a calendar form. For example in Slovak, we have Svätého Mikuláša, biskupa, which translates to [Memorial] of Saint Nicholas, bishop (there is no word for memorial in there, but the genitive case corresponds to the of in English; sort of). Of course, some might want it in the nominative grammatical case (like Svätý Mikuláš, biskup, Saint Nicholas, bishop), but that should not be the default option for Slovak, though one should be able to configure romcal to output the calendar like that.
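One possible way to model this, purely as a sketch: the locale declares its own default grammatical case, and a caller may override it. The `defaultCase` property, the `grammaticalCase` option and the data layout are all hypothetical.

```javascript
// Hypothetical Slovak locale: each name is stored per grammatical case,
// and the locale itself declares which case is the default.
const skLocale = {
  defaultCase: 'genitive',
  saints: {
    nicholas: {
      nominative: 'Svätý Mikuláš, biskup',
      genitive: 'Svätého Mikuláša, biskupa',
    },
  },
};

// Pick the requested case, else the locale default, else the nominative form.
function celebrationName(locale, key, options = {}) {
  const grammaticalCase = options.grammaticalCase ?? locale.defaultCase;
  const forms = locale.saints[key];
  return forms[grammaticalCase] ?? forms.nominative;
}

console.log(celebrationName(skLocale, 'nicholas'));
console.log(celebrationName(skLocale, 'nicholas', { grammaticalCase: 'nominative' }));
```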

sanctoralDefaultOutput: "{{holiness}} {{name}}{{titlesSectionSeparator}} {{titles}}"

Still I think that your proposal is quite simple, but I have no problem with the form/syntax. There are so many things one could use, although the default should reflect (in my opinion) the official liturgical calendar (or the directory) or the general practice of that particular country/calendar/locale.

But in general, I think we agree on the general way it should work. :smiley:

emagnier commented 4 years ago

The reason why I suggest doing it this way is that a name (like Peter, Paul) is shared by many saints. All of them (except one, usually an apostle) have some kind of surname, which differentiates between them. I know that this creates another burden to create a saint person (probably in src/saints.js file), but still I think it might be helpful.

Right, it will be less easy to create/translate, but yes, it has some advantages. You can then compose the output of a title as you want, or you could, for example, get all saints that have the name Paul :)

As you could see in one of my Google Sheets documents, I have thought about that too. But years of death are not strings to be translated, therefore they should be (in my opinion) somewhere else, probably in the same file as the saints (e.g. src/saints.js) as an attribute/property.

Yes, adding extra information about a saint is very interesting. But if we add too much information about a saint (I'm of course not opposed to that), a GUI might become necessary to edit and translate it easily...

About a saints.js: I had the idea to merge the general and national sections of the localisation files. When a saint is moved to national for a specific country, we currently also need to duplicate the corresponding key in the locale file... I actually see (at least in French with the French calendar) a few missing translations, just because they are not in the right section (general/national). Simply moving them would fix the French calendar in French, but would create issues in other calendars displayed in French (where the saints are still general, not in a national calendar). And of course I think it's a bad idea to duplicate the keys in both the general and national sections. To start with, we could merge these 2 sections into a new saints or sanctorale section in the current locale files.

I think that how the name of a saint should be output by default should be defined by the locale itself. There should probably be another file with definitions of default settings/options. It is defined somewhere already, but not transparently. I think that there should be a file per locale, while all settings/options should be overridable.

And please, keep in mind that some languages also use the genitive grammatical case when celebrations are output in a calendar form. For example in Slovak, we have Svätého Mikuláša, biskupa, which translates to [Memorial] of Saint Nicholas, bishop (there is no word for memorial in there, but the genitive case corresponds to the of in English; sort of). Of course, some might want it in the nominative grammatical case (like Svätý Mikuláš, biskup, Saint Nicholas, bishop), but that should not be the default option for Slovak, though one should be able to configure romcal to output the calendar like that.

Yes, I'm aware of that, and yes, these settings definitely must be localised and/or live somewhere in the locale properties.

Please, do find some time and thoroughly read through all the Google Sheets I have created. I know I should make some time to move the suggestions from there to GitHub issues.

I haven't had time to dive deeper into it yet. However, I already took a quick look at it, and all the work you have already done is amazing. It has already helped me prototype and test the evolution of the data model for calendars, locales and some other settings, taking into consideration all possible new features and data.

tukusejssirs commented 4 years ago

I had the idea to merge together the general and national sections from the localisation files.

Yes, that would be awesome. :smiley: But I would rather divide it differently: the common, celebrations and saints parts (they might require a name change; we might create separate files for them). The common part would contain the feria and such stuff. The celebrations part would contain non-personal celebrations (like commemorations of events, e.g. Easter, Christmas, Beheading of St John the Baptist). The saints part would contain saints and blesseds (even All Saints).

I know, it’s division vs division, but this one might be a bit better. Still, we should document every decision.

emagnier commented 4 years ago

Do you suggest grouping together in celebrations the current:

advent
christmastide
epiphany
ordinaryTime
lent
holyWeek
eastertide
celebrations

Merge general and national in a new saints section.

Create a new common section to translate everything else, like cycles, liturgical seasons, psalter weeks, titles, types...

We might create a new issue for this specific topic. This is something that bothers me, and I would like to address it soon. It might also fix several issues with duplicated/missing keys.

tukusejssirs commented 4 years ago

Yes, sort of.

I would rather suggest having separate files for the following sections under the src/locales/lang[-REGION]/ folder:

celebrations.js (as you have said in your previous comment);
saints.js or sanctoral.js (whatever name suits you more) for separate/singular saints and blesseds, and collective names like All Saints;
common.js for general/ordinary/ferial names (this might be moved elsewhere or further separated; also includes names for special days like the Sunday of the Divine Mercy);
titles.js for the titles of the saints/blessed (like bishop or patron of Europe);
week_days.js (Sunday through Saturday); this might be available from different sources, but I still think we should have a unique/specific place for this;
canonisation.js or canonisation_levels.js for saint and blessed;
seasons.js or liturgicalSeasons.js like Lent or Ordinary Time;
months.js for calendar month names;
langs.js or languages.js for supported languages (like English, British English, Slovak, Belgian French) and their codes (like en, en-GB, sk, fr-BE);
countries.js for supported country names (like United Kingdom or United States of America) and their codes (like UK or GB—to be decided);
bible_books.js for Bible book names and their abbreviations (like Matthew and Mt; also the following terms: Bible, Holy Scripture);
numbers.js for ordinal and cardinal numbers up to 40 (the upper limit should be hard-coded and decided according to general needs; there are 34 weeks in Ordinary Time, therefore I suggest setting the upper limit to at least this number);
colors.js or liturgicalColors.js for colour names (like green or white);
calendars.js for calendar names (like General Roman Calendar of 1969; the naming should be decided);
ui.js (to be decided);
subcalendars.js; by subcalendar I mean calendars that add supplements to the general or a particular calendar (e.g. for specific religious communities, like the Slovak Calendar for Dominicans; the naming should be decided; note that this is not currently implemented at all);
continents.js like Europe or North America;
regions.js for states, regions, territories (like England, Wales, Ohio, Quebec);
dioceses.js for the names of dioceses (like the Westminster Archdiocese);
parishes.js for parish names (this is not implemented yet; this is something I’d like to have very much, but this is something to be implemented in the far future);
misc.js or other.js (or whatever else suits you best) for everything else (like year, diocese; this might be further divided).

Some of these sections are not to be implemented for the v2.0.0 (like parishes.js or continents.js).

For more info, see this Google Sheets of mine; look at the sheet names.

emagnier commented 4 years ago

Wow... I'm not sure if having all these files will simplify things. Yes, it's going to be well organised, but potential future contributors may be a little lost in all of this :) But ok, it's a long-term vision.

For now I would recommend keeping one file per locale and going step by step ;) But we can already think about how to better organise the different parts of the current locale file:

functional or common (countries, colors...)
temporal (the current celebration part, time names, weekdays)
sanctoral (currently the general and national parts).

Anyway, all your suggestions are interesting for rethinking the content structure a bit. I also know there are still elements that are not translated (types, titles...)

So a first (and easy) step could be to merge the general and national sections into a sanctoral part. In this new part, we could organise keys alphabetically (for example). I have already run a few tests on that, and it is something I could address quickly.
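That first step could look something like the following sketch, which folds the general and national sections of one locale object into a single alphabetically ordered sanctoral section. The `mergeSanctoral` function and the sample keys are hypothetical; when a key exists in both sections, this version simply lets the national string win.

```javascript
// Merge `general` and `national` into one `sanctoral` section, sorted by key.
// All other sections of the locale are passed through unchanged.
function mergeSanctoral(locale) {
  const { general = {}, national = {}, ...rest } = locale;
  const combined = { ...general, ...national };
  const sanctoral = Object.fromEntries(
    Object.keys(combined)
      .sort() // plain code-unit sort, adequate for camelCase ASCII keys
      .map((key) => [key, combined[key]])
  );
  return { ...rest, sanctoral };
}

// Hypothetical locale excerpt.
const localeBefore = {
  celebrations: { easter: 'Easter Sunday' },
  general: { saintNicholas: 'Saint Nicholas', allSaints: 'All Saints' },
  national: { saintGenevieve: 'Saint Genevieve' },
};

const localeAfter = mergeSanctoral(localeBefore);
console.log(Object.keys(localeAfter.sanctoral));
```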

tukusejssirs commented 4 years ago

Just a side note: I suggest to use the typographic apostrophe (’) instead of typewriter apostrophe (') and likewise the quotes (‘’ and “” vs ' and "). In my opinion these typographic apostrophes and quotes look much better. I suggest this rule to be applied to translation strings only; in code comments, we should use only the typewriter apostrophes.

emagnier commented 4 years ago

Yes, but they are not easy to type on all keyboards. These are generally substituted automatically by text editors like Word or LibreOffice. But here I'm not sure we should be so picky.

tukusejssirs commented 4 years ago

I'm not sure if having all these files will simplify things.

I have opened this issue not to simplify things, but:

to make as much as possible localisable;
to create a wrapper issue that would (in the end) fix (or help fix) other issues (like the duplicates or separation);
for some other reasons, which don’t come to mind ATM.

[P]otential future contributors may be a little lost in all of this

Yes, they might be lost, but we must improve our documentation for this very reason (and not only for the localisation). Also note that most translators (contributors) won’t contribute translation strings for a whole language. Some files (like months, numbers, week day names) we might localise ourselves even into languages we don’t actively speak, therefore the translators won’t have to deal with these files (the strings in them).

For now I would recommend keeping one file per locale and going step by step ;)

Okay, for now. :smiley:

But we can already think about how to better organise the different parts of the current locale file

Implement/rework it as you see fit. I am open to helping you out, but in my opinion we should create some milestones (like v2.0.0) that include the issues we want to implement in each particular milestone (some issues might need to be divided). It would make our development more transparent and, probably even for us contributors, better structured.

Anyway, all your suggestions are interesting for rethinking the content structure a bit.

While you do your rethinking, you should read through the Google Sheets I have suggested. :smiley:

In this new part, we could organise keys alphabetically.

Yes, alphabetic order of the string keys is the best IMO. Although currently, some of the keys include saint/blessed, others don’t, but for now, this would be good enough. :smiley:

Yes, but they are not easy to type on all keyboards. These are generally substituted automatically by text editors like Word or LibreOffice. But here I'm not sure we should be so picky.

Well, I know I am too picky. I see it when one uses ´ (acute accent) or ' or ’ or ‘. And I am using Linux (nearly exclusively), therefore I can use Ctrl+Shift+U to input U+2019 (’) and other (more) special characters. But I know that this is not the most important issue. :smiley: