gusbrs / zref-clever

Clever LaTeX cross-references based on zref
LaTeX Project Public License v1.3c

Russian translation #29

Closed jemmybutton closed 3 months ago

jemmybutton commented 3 months ago

Russian translation is added. As discussed here https://github.com/gusbrs/zref-clever/issues/28 some changes were made after "Localization guidelines" recommendations.

gusbrs commented 3 months ago

Hi @jemmybutton, thank you very much for the PR. Overall, it looks good, but I have some comments.

First, I noticed you included several abbreviated forms. About this, just checking, did you follow the advice in the Localization guidelines to be conservative about them? Are really all of those "common and well established tradition for the language" for use in cross-references? (I know, this question is typically hard to answer by anyone, I just want you to think twice before sprinkling that many abbreviated forms).

Still about abbreviated forms. As things are, you are defining them only for the P declension case. Indentation there is just to ease reading, it has no real effect. The last call to case= defines where any following [Nn]ame-... options belong to. So, if defining abbreviated forms, they must be supplied for all cases (see the German language file).
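To make the scoping of case= concrete, here is a minimal sketch of a language-file fragment. The type, the word forms, and the abbreviation are picked only for illustration and are not copied from the PR; the option names follow the Name-sg / name-sg-ab pattern of the existing language files, and the case letters are the uppercase ones in use at this point of the discussion. Check it against the German file before relying on it.

```latex
% Illustrative fragment only (hypothetical values, not the PR's content).
type = page ,
  gender = f ,
  case = N ,
    Name-sg = Страница ,
    name-sg = страница ,
    Name-sg-ab = С. ,
    name-sg-ab = с. ,
  case = A ,
    Name-sg = Страницу ,
    name-sg = страницу ,
    % Repeated on purpose: everything after "case = A" belongs to A only,
    % so the abbreviated forms set under N are not inherited here.
    Name-sg-ab = С. ,
    name-sg-ab = с. ,
  % ... and likewise for the remaining cases and for the plural forms.
```

The indentation is cosmetic; what matters is that each case block carries its own abbreviated forms, even when they are identical across cases.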

Second, currently, the compilation of the code documentation is failing because of the Russian characters (as far as I can tell, see CI results). So far, I have been using the default pdflatex to build the docs. Is there an easy package / option / whatnot for pdflatex so that it can handle Russian characters? Or would it be better to use lualatex? I can make this change, especially if it is done on l3build's side. But I have no experience at all in typesetting Russian, so please advise on what is the usual or standard setup for "using some Russian in your document".

Third, you commented:

% Russian translation is consistent with that of \pkg{cleveref}, with the
% following exceptions: "equation" is translated as "уравнение", rather
% than "formula", "proposition" is translated as "предложение", rather
% than "утверждение"; several abbreviations are replaced with more common
% ones, e.g. abbreviated plural of "item" is "пп.", not "п.п.".

Could you elaborate the exceptions a little? For babel? I mean, not necessarily in the file, I just want to understand it better.

Fourth, it caught my eye that you set

+refbounds-rb = {с\nobreakspace,,,} ,

as default for all types. Is it so that in Russian all ranges are spelled something like "from M to N"? Well, this just caught my eye as an atypical setting, and I just want to make sure it is intended.

Finally, a comment about notes types. I see you dropped the distinction between footnotes and endnotes. I think this is a good call. This distinction works well for e.g. English and German, for which there is a clear "single word" for each, but when it goes to "[some adjective] note" it starts getting weird. I'm still not sure it was a good decision to have made this distinction... Anyway, good call.

gusbrs commented 3 months ago

Sorry, I completely forgot that the docs don't seem to have support for Cyrillic. This jemmybutton@49ef003 seems to fix it, but not sure if it's the way you'd like to go about this.

No problem, I didn't think of that either at first, only after seeing the CI results. ;-) Anyway, this is indeed not how I'd like to handle this issue. First, you added polyglossia to the mix, and currently the docs are typeset with pdflatex. If we were to add a language package to the document, it should be babel, even for UTF8 engines. But I'm not even currently doing that, because technically this is an "English only" document, the language files are meant to be extracted and are "code", so that we cannot even use babel language markup there anyway. We just need to be able to typeset those characters.

jemmybutton commented 3 months ago

First, I noticed you included several abbreviated forms. About this, just checking, did you follow the advice in the Localization guidelines to be conservative about them? Are really all of those "common and well established tradition for the language" for use in cross-references? (I know, this question is typically hard to answer by anyone, I just want you to think twice before sprinkling that many abbreviated forms).

I tried to use only the ones I have seen and used myself. Just a quick search through the books I had on my hard drive turned up most of them (I attached some of the screenshots). And I even missed one quite common abbreviation of "Book" as "кн.".

The ones I couldn't find right away are:

"абз." for paragraphs, but I don't remember seeing anyone reference paragraphs, so I could only google that this abbreviation does exist (on a side note, there's a thing called "параграф" in Russian, which is normally not the same as "paragraph" in English, and more like a bullet-point or something like this, and it's normally referenced using this symbol §).

"п." for "item", but this one is fairly common in legal stuff (see e. g. https://www.consultant.ru/search/?q=%D1%81%D1%82.+%D0%BF. )

"ур." for "equation", normally equations seem to be referenced without any word at all.

Naturally, in many cases some abbreviations are used, while others are not. E.g. on the last screenshot you can see unabbreviated "предложение" next to "опр.". I'd say, none of the abbreviations I listed should seem out of place in a Russian text, but no author, I assume, would agree on every decision. Some authors avoid abbreviations altogether, save for the ones for "page" and "figure", some use way more.

[Attached screenshots: theorem-mb, remark-mb, figure-landau, page-landau, chapter-landau, book-vz, proposition-vz, part-mb, definition-mb]

Still about abbreviated forms. As things are, you are defining them only for the P declension case. Indentation there is just to ease reading, it has no real effect. The last call to case= defines where any following [Nn]ame-... options belong to. So, if defining abbreviated forms, they must be supplied for all cases (see the German language file).

Sure, no problem. Most of the time abbreviations don't depend on the declensions.

Second, currently, the compilation of the code documentation is failing because of the Russian characters (as far as I can tell, see CI results). So far, I have been using the default pdflatex to build the docs. Is there an easy package / option / whatnot for pdflatex so that it can handle Russian characters? Or would it be better to use lualatex? I can make this change, especially if it is done on l3build's side. But I have no experience at all in typesetting Russian, so please advise on what is the usual or standard setup for "using some Russian in your document".

Most of the "modern" solutions seem to rely on babel or polyglossia, and I'm using those. I can try to figure out how to avoid them in a correct way, but at the moment I don't have such a solution.

Could you elaborate the exceptions a little? For babel? I mean, not necessarily in the file, I just want to understand it better.

Sure.

In cleveref "proposition" is translated as "утверждение", I don't remember ever seeing it used in place of "proposition", but in all the translations of Euclid's "Elements" and in many books on geometry the word is translated as "предложение", so I used the translation I know for a fact is widely used, at least in the context I'm familiar with.

For "equation" there's "формула" in cleveref which is basically "formula". Most of the time I see equations referenced simply by number, but I don't remember seeing them referenced by the word "формула", more often more specific words are used, such as "equality" ("равенство"), as you see on this picture, "equation" ("уравнение"), or "inequality" ("неравенство"). Sure, it seems to make sense to use a more general term (equation is a kind of formula after all), but since it's not the case in English, I gather, for the sake of consistency, it makes sense to do the same in Russian.

[Attached screenshot: equalities-kiselev]

as default for all types. Is it so that in Russian all ranges are spelled something like "from M to N"? Well, this just caught my eye as an atypical setting, and I just want to make sure it is intended.

I can't say I'm completely sure about this, and ideally, I'd look at a bunch of examples of how this is used. More often ranges are referenced with a dash, like "с. 24–38", but if I were to write a range with words I'd write "страницы с 24 по 38", definitely not "страницы 24 по 38", because the latter sounds clearly wrong to me.

jemmybutton commented 3 months ago

No problem, I didn't think of that either at first, only after seeing the CI results. ;-) Anyway, this is indeed not how I'd like to handle this issue. First, you added polyglossia to the mix, and currently the docs are typeset with pdflatex. If we were to add a language package to the document, it should be babel, even for UTF8 engines. But I'm not even currently doing that, because technically this is an "English only" document, the language files are meant to be extracted and are "code", so that we cannot even use babel language markup there anyway. We just need to be able to typeset those characters.

Oh, actually, sorry, I'm being stupid, for the Cyrillic characters to work, it's just a matter of adding

\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}

And no need to add any babel stuff. Seems to work just fine with pdflatex.
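For reference, a minimal sketch of a complete pdflatex document using just those two lines. It assumes the T2A encoding definitions and matching Cyrillic fonts are installed; the sample words are only there to show that both scripts come out.

```latex
\documentclass{article}
% T2A is a font encoding that covers Cyrillic as well as basic Latin letters.
\usepackage[T2A]{fontenc}
% UTF-8 is already the default input encoding in recent LaTeX kernels,
% so this line is harmless there and only needed on older installations.
\usepackage[utf8]{inputenc}

\begin{document}
An English-only document that merely has to print a few Cyrillic
strings, such as страница, рисунок or уравнение.
\end{document}
```

No language package is loaded, so hyphenation and captions stay English; only the glyphs become available.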

jemmybutton commented 3 months ago

With this https://github.com/jemmybutton/zref-clever/commit/99e533c1b834a25353f0f5855d44ff0ebd443d75 it should work better. Also removed the abbreviations for "paragraph" ("абз.").

gusbrs commented 3 months ago

Now CI is failing with:

! Package fontenc Error: Encoding file `t2aenc.def' not found.
(fontenc)                You might have misspelt the name of the encoding.

We are getting closer, I think, this is a missing package in the workflow.

Now, I'm actually not able to build it even locally (with a full TeX Live installation), so perhaps this is still not it. How are you building it? To do it with l3build, use l3build doc zref-clever-code.

If you can compile locally, please add cyrillic to https://github.com/gusbrs/zref-clever/blob/85cc6b7668daf9f0b6ddebffed23f6db633d7333/.github/workflows/main.yml#L19-L101

gusbrs commented 3 months ago

Now, I'm actually not able to build it even locally (with a full TeX Live installation), so perhaps this is still not it. How are you building it? To do it with l3build, use l3build doc zref-clever-code.

Oh! Actually, I was able to compile it locally with your last commit, sorry. ;-)

About other things:

First, I noticed you included several abbreviated forms. About this, just checking, did you follow the advice in the Localization guidelines to be conservative about them? Are really all of those "common and well established tradition for the language" for use in cross-references? (I know, this question is typically hard to answer by anyone, I just want you to think twice before sprinkling that many abbreviated forms).

I tried to use only the ones I have seen and used myself. Just a quick search through the books I had on my hard drive turned up most of them (I attached some of the screenshots).

If Russian traditionally uses more abbreviated cross-reference names, it is what it is. I just wanted to make sure they were a reflected inclusion, and it seems they are. Besides, I have no basis for judgement here, so I can only trust yours. It is thus your call.

Could you elaborate the exceptions a little? For babel? I mean, not necessarily in the file, I just want to understand it better.

Sure.

In cleveref "proposition" is translated as "утверждение", I don't remember ever seeing it used in place of "proposition", but in all the translations of Euclid's "Elements" and in many books on geometry the word is translated as "предложение", so I used the translation I know for a fact is widely used, at least in the context I'm familiar with.

For "equation" there's "формула" in cleveref which is basically "formula". Most of the time I see equations referenced simply by number, but I don't remember seeing them referenced by the word "формула", more often more specific words are used, such as "equality" ("равенство"), as you see on this picture, "equation" ("уравнение"), or "inequality" ("неравенство"). Sure, it seems to make sense to use a more general term (equation is a kind of formula after all), but since it's not the case in English, I gather, for the sake of consistency, it makes sense to do the same in Russian.

OK. Thanks for clearing that up.

as default for all types. Is it so that in Russian all ranges are spelled something like "from M to N"? Well, this just caught my eye as an atypical setting, and I just want to make sure it is intended.

I can't say I'm completely sure about this, and ideally, I'd look at a bunch of examples of how this is used. More often ranges are referenced with a dash, like "с. 24–38", but if I were to write a range with words I'd write "страницы с 24 по 38", definitely not "страницы 24 по 38", because the latter sounds clearly wrong to me.

OK too. And it is indeed hard sometimes to be sure of things; some of those are quite contentious, as a matter of fact. But, if it is intended and "to the best of your knowledge", we are good. And if need be we can also review it in the future if new information arises.

All in all, I'll proceed with the merge, and then adjust some details, including CI. And also include tests for the new language file. Thank you very much.

But I'll let you review things before releasing, of course.

gusbrs commented 3 months ago

@jemmybutton I'm taking a closer look at things here and I have one question regarding the declension cases. I chose to use capital letters for the declension cases in German because they are nouns, so one writes them capitalized. Up to now, German was the only language with declension support. The question is then, which do you think would be best for Russian: lowercase or uppercase "cases"?

jemmybutton commented 3 months ago

In Russian none of these words are capitalized unless they are the first words in a sentence. I can imagine a situation where some of the words in some of the cases can appear in the beginning of a sentence, but for some I would struggle to come up with a reasonable example. E.g. prepositional case, afaik, always requires a preposition in modern Russian (maybe there are some archaic expressions in which one might want to use it without a preposition, but it would clearly be a weird edge case), and therefore should never be capitalized. In Russian specifically I see no reason to keep capitalized versions separately, and would rather have them capitalized automatically if necessary, but can't say if it works for other languages.

gusbrs commented 3 months ago

In Russian none of these words are capitalized unless they are the first words in a sentence. I can imagine a situation where some of the words in some of the cases can appear in the beginning of a sentence, but for some I would struggle to come up with a reasonable example. E.g. prepositional case, afaik, always requires a preposition in modern Russian (maybe there are some archaic expressions in which one might want to use it without a preposition, but it would clearly be a weird edge case), and therefore should never be capitalized. In Russian specifically I see no reason to keep capitalized versions separately, and would rather have them capitalized automatically if necessary, but can't say if it works for other languages.

I'm talking about this:

declension = { N , A , G , D , I , P }

in the language declaration. This implies we have to use case=N etc. and \zcref[d=A]{label} etc. And your comments are precisely why I'm asking about this. It seems to me that typing lowercase is more convenient. So, if the declension cases in Russian are written in lowercase, except for some special cases, we might as well use lowercase for the declension values. WDYT?

jemmybutton commented 3 months ago

Oh, I see. I think it makes sense, to use lowercase letters by default, like \zcref[d=a]{label}. Would \zcref[d=A]{label} produce an uppercase version then?

gusbrs commented 3 months ago

Oh, I see. I think it makes sense, to use lowercase letters by default, like \zcref[d=a]{label}.

Ok, I'll make this change then.

Would \zcref[d=A]{label} produce an uppercase version then?

Orthogonal things. For uppercase \zcref[cap]{label} or, actually, what you normally want is \zcref[S]{label}, which is a shortcut for capfirst=true,noabbrevfirst=true ("S" for "sentence").
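A small sketch of how the two options combine. It assumes the Russian language file from this PR is installed, that babel's russian option is available, and that the lowercase declension values agreed on above are in effect; the label name is hypothetical.

```latex
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian]{babel}
\usepackage{zref-clever}

\begin{document}

% A bare counter step is enough to have something to refer to.
\refstepcounter{figure}
\label{fig:example}

% Declension only: "d" selects the case, capitalization is untouched.
\zcref[d=g]{fig:example}

% Capitalization only: "S" is meant for the start of a sentence.
\zcref[S]{fig:example}

% Both at once: the two options are independent and simply combined.
\zcref[S,d=g]{fig:example}

\end{document}
```

As with any cross-reference, two compilation runs are needed before the references resolve.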

jemmybutton commented 3 months ago

I see it doesn't work like this. Honestly, that would be my expectation, to just choose both case and capitalization using just one letter like this.

jemmybutton commented 3 months ago

But this is a separate feature, and would only be useful for Russian for the moment.

gusbrs commented 3 months ago

I just pushed some commits wrapping up your PR, and including regression tests. Please review them and also test it (if you need some guidance on how to extract the files from the repo, let me know). Once you give me the green light, I'll prepare a release so that you can enjoy it. :-)

My changes: https://github.com/gusbrs/zref-clever/compare/0ebc3b9..4b9e8a6

The whole thing: https://github.com/gusbrs/zref-clever/compare/85cc6b7..4b9e8a6

About:

I see it doesn't work like this. Honestly, that would be my expectation, to just choose both case and capitalization using just one letter like this.

But this is a separate feature, and would only be useful for Russian for the moment.

As I said, these are orthogonal things. Each is handled by its own dedicated option.

jemmybutton commented 3 months ago

I just pushed some commits wrapping up your PR, and including regression tests. Please review them and also test it (if you need some guidance on how to extract the files from the repo, let me know). Once you give me the green light, I'll prepare a release so that you can enjoy it. :-) My changes: https://github.com/gusbrs/zref-clever/compare/0ebc3b9..4b9e8a6 The whole thing: https://github.com/gusbrs/zref-clever/compare/85cc6b7..4b9e8a6

On it! But will probably finish tomorrow.

As I said, these are orthogonal things. Each is handled by its own dedicated option.

Yes, I understand. It's just that once the distinction between lower- and uppercase letters in this scenario (\zcref[d=a]{label} vs \zcref[d=A]{label}) is made between languages, as in case of Russian and German, this might start to feel as a part of the interface which can be used to control the behaviour of the system, which it is not. At least it was my immediate impression. It could be a very intuitive convenience thing, similar to above-mentioned "\zcref[S]{label}, which is a shortcut for capfirst=true,noabbrevfirst=true ("S" for "sentence")", but it's way outside the scope of this PR, and I clearly didn't think this through enough to suggest it as a feature.

jemmybutton commented 3 months ago

@gusbrs Do you have some kind of a test file where all the types of labels and kinds of ranges are used?

gusbrs commented 3 months ago

@gusbrs Do you have some kind of a test file where all the types of labels and kinds of ranges are used?

"All" would be hard... ;-)

Stuff is scattered around in https://github.com/gusbrs/zref-clever/tree/main/testfiles. Perhaps https://github.com/gusbrs/zref-clever/blob/main/testfiles/zc-class-book01.lvt will give you a good set of basic cross-ref elements. https://github.com/gusbrs/zref-clever/blob/main/testfiles/zc-zcref-options01.lvt is a comprehensive test of \zcref options, you might want to select those most directly related to the content of the language file (cap,abbrev,d,g, etc.) to play with. https://github.com/gusbrs/zref-clever/blob/main/testfiles/zc-typeset01.lvt tests all cases for compression, ranges, etc. In all cases, these are regression test files, so they "look dirty". I think you are able to just compile them as such, but you may need to massage them a little.

But these files might be overkill for the purpose... If you just want to test a type, you can always go with \refstepcounter, e.g.:

\refstepcounter{figure}
\label{fig:figureX}
\zcref{fig:figureX}

or, if the counter for the type of interest does not exist:

\newcounter{myfoocounter}
\refstepcounter{myfoocounter}
{\zcsetup{countertype={myfoocounter=solution}}\label{sol:solutionY}}
\zcref{sol:solutionY}

and things of the sort. Besides, you don't need to test the whole package anyway, just whatever pertains to the Russian language file. And not necessarily exhaustively, a reasonable sampling is fine too (it's what I would do, and what I did in fact already, but my ability to catch inconsistencies for this case is very limited, since I don't know Russian at all, everything "seems right" ;-).
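Putting the \refstepcounter idea into a complete file, a sampling test for ranges might look like the sketch below. It assumes the Russian language file is installed and that babel's russian option is available; the label names are made up, and the lowercase value passed to d assumes the declension values discussed earlier in this thread.

```latex
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian]{babel}
\usepackage{zref-clever}

\begin{document}

% Three consecutive figures, so that lists and ranges can be exercised.
\refstepcounter{figure}\label{fig:a}
\refstepcounter{figure}\label{fig:b}
\refstepcounter{figure}\label{fig:c}

% Single reference.
\zcref{fig:a}\par

% Two consecutive labels.
\zcref{fig:a,fig:b}\par

% Three consecutive labels, which is where a range is expected.
\zcref{fig:a,fig:b,fig:c}\par

% The same range in another declension case, to check the range wording.
\zcref[d=g]{fig:a,fig:b,fig:c}\par

\end{document}
```

Compile twice, then read the output for each line; swapping figure for another counter covers other types in the same way.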

jemmybutton commented 3 months ago

Found a couple of bugs (https://github.com/gusbrs/zref-clever/pull/30); other than that, looks good.

gusbrs commented 2 months ago

@jemmybutton Great! Thank you very much!

I made a release (https://github.com/gusbrs/zref-clever/releases/tag/v0.4.5), which should get to you in a couple of days (if you use TeX Live, a little more if MiKTeX). I hope you enjoy it! :-)

And, in case you are willing, you may wish to consider contributing localization for zref-vario as well, which is an important companion package for zref-clever. Good news is that it is much, much easier to do so for zref-vario. It's about eight options and that's it, and varioref can be used as a reference too.

jemmybutton commented 2 months ago

And, in case you are willing, you may wish to consider contributing localization for zref-vario as well, which is an important companion package for zref-clever. Good news is that it is much, much easier to do so for zref-vario. It's about eight options and that's it, and varioref can be used as a reference too.

@gusbrs Will do! Probably within a week or two.

jemmybutton commented 2 months ago

@gusbrs I haven't used varioref and am not sure how it works, but it has a Russian translation which, I guess, should work for zref-vario as well.

gusbrs commented 2 months ago

I haven't used varioref and am not sure how it works, but it has a Russian translation which, I guess, should work for zref-vario as well.

Indeed, it should be easy to transpose, but some adjustments are also necessary. However, since I don't read Russian at all, and even less so Cyrillic LICR, I'm at a loss.

If you are not acquainted with varioref, the basic concept is simple, but very useful. If I may recommend, adding it to your repertoire is very much worth your time.

jemmybutton commented 2 months ago

I can transpose varioref's lines and make some adjustments, no problem, I just need some time to figure out how it all works, so it might take a bit of time.