odomanov closed this issue 4 years ago.
The same error can be reproduced with
\documentclass{article}
\usepackage{textcomp}
\begin{document}
\uppercase{№}
\end{document}
At some point we'll probably switch to l3text
\documentclass{article}
\usepackage{textcomp}
\usepackage{expl3}
\ExplSyntaxOn
\newcommand*{\newuppercase}{\text_uppercase:n}
\ExplSyntaxOff
\begin{document}
\newuppercase{№}
\end{document}
But I'm not sure if we can find a way to fix this in the meantime.
Does this mean that the only way to cope with this now is to replace № with \textnumero?
Basically yes. - Or use a Unicode engine.
It's similar with ä (which won't break, but also won't be capitalised; \"a works as expected).
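A minimal check of the two workarounds, assuming pdfLaTeX with UTF-8 input: \uppercase does not expand anything and acts on the individual byte tokens of № before inputenc has decoded them, whereas a control sequence such as \textnumero passes through \uppercase untouched. \MakeUppercase expands its argument before changing case, so it should also cope with the literal character.

```latex
\documentclass{article}
\usepackage{textcomp}% provides \textnumero (built into newer kernels)
\begin{document}
% \uppercase leaves control sequences alone, so the command form is safe
\uppercase{\textnumero~5 abc}

% \MakeUppercase expands its argument before changing case,
% so the literal UTF-8 character works here as well
\MakeUppercase{№~5 abc}
\end{document}
```

Both lines should print "№ 5 ABC".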
I don't think it makes sense to try and fix up what biblatex does here. But since I hope to be able to switch to l3text in the not too distant future anyway, this shouldn't be much of an issue.
I see, thank you.
I can also see that \MakeCapital doesn't work with Cyrillic letters: there are no errors, it simply doesn't capitalize. This probably should also wait for l3text.
@josephwright Is expl3 supposed to be able to deal with the following?
\documentclass{article}
\usepackage{textcomp}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
\text_titlecase_first:n{\textnumero}
\end{document}
A proof of concept with expl3 case changing functions is at https://github.com/plk/biblatex/compare/dev...moewew:l3text. There are a few things that still need to be thought through:
- expl3 case changing functions don't do brace protection. I think that is a good design decision given that braces mean a lot of things, especially in the BibTeX context.
- Make expl3 case changing optional to avoid backwards compatibility issues. (Or make it the default and offer an opt-out.)
- babel/polyglossia names.
@moewew There are still things to do in the expl3 code for emulating \protected@edef: I'll have to work on it.
@moewew Issues with \textnumero (etc.) fixed in expl3 for next release.
Thank you very much. I also asked to get a feeling for what expl3 wants to support here. As I understand it, you are aiming at a very general solution, so am I right in thinking that everything that could reasonably appear in a title field should be OK?
Do you have any opinion about brace protection (see my two questions above)? Or a feeling how difficult it would be to get brace protection back for the case changing functions?
@moewew The aim is to cover 'any reasonable text', which means emulating \protected@edef as far as I can.
On the brace business, it's all doable but it's a question of interfaces. I'd have to provide a \text_uppercase_non_recursive:n or have some switch. It's mainly a question of effort. Perhaps one for a mail to the team? We are thinking of changing \MakeUppercase itself, or at least \MakeTextUppercase, so input is really useful. (@davidcarlisle might have thoughts here.)
\uppercase isn't usable anyway on non-ASCII text, but if you are not ready to switch to expl3 yet, \MakeUppercase and \MakeTextUppercase both seem to work fine on
\documentclass{article}
\usepackage{textcomp}
\begin{document}
\MakeUppercase{abc №}
\end{document}
Not sure what you mean by brace protection here?
@davidcarlisle The starting point here is \MakeCapital, which is basically \MakeUppercase #1. Brace protection is the BibTeX-like form of 'escaping' from case changing.
Just so it doesn't get lost.
Regarding https://github.com/plk/biblatex/issues/960#issuecomment-575914507 I think that many people will rely on brace protection in their existing bib databases. This matters most in cases where English titles get sentence-cased and people protect words that must not be lower-cased (such as [language] names).
So in order to not break documents I'd very strongly opt for maintaining brace protection. In the long run, I'd argue in favor of implementing some sane semantic markup for case protection (as I did in another ticket).
Originally posted by @jspitz in https://github.com/plk/biblatex/issues/941#issuecomment-575985644
@josephwright Oh, that. If we could revise history to make that (and {\'e} markup) go away it would be a good thing. But I suppose there are too many existing bib files...
I wouldn't integrate that into the main TeX-level case changer, but we could have a top-level prepass function that converts {abc {Keep Mixed} zzz} to {abc \NoCaseChange{Keep Mixed} zzz}; then \MakeTextUppercase would do the right thing.
You could write it in expl3 (or 2e, if needed), but alternatively couldn't biber have an option to do that while extracting the fields from the bib file? It really is a BibTeX syntax rather than a TeX one, so handling it at the bib file parse level would seem reasonable to me.
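The output side of such a prepass can already be tried with textcase, which provides both \MakeTextUppercase and \NoCaseChange. This sketch feeds in the string as it would look after the (hypothetical) conversion; the prepass itself is not shown.

```latex
\documentclass{article}
\usepackage{textcase}
\begin{document}
% {abc {Keep Mixed} zzz} as it would look after the prepass
\MakeTextUppercase{abc \NoCaseChange{Keep Mixed} zzz}
% should typeset: ABC Keep Mixed ZZZ
\end{document}
```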
@plk Could Biber convert
title = {Text {Protect} Text},
to
title = {Text \NoCaseChange{Protect} Text},
I was actually hoping to be able to use the expl3 change as an incentive to make users move away from {...} for case protection to something more reasonable like \NoCaseChange. As in: we'll make expl3 case changing optional, it won't support brace protection, but it will work properly for all the other stuff that biblatex's own case changer currently can't handle. But @jspitz's comment makes me think people won't be willing to accept that, so we may have to look into brace protection.
The point is that most users won't read your release notes, and I suppose many will not even notice the case (mis-)change in their documents caused by such a change. So unless you want to make expl3 casing opt-in (which would defeat the whole exercise IMHO), something that deals with brace protection is a must if you don't want to upset users. A biber-level way seems fine to me (not sure about the status of the bibtex[8] engine nowadays).
Oh yes I forgot that, if Biber has to do the case protection conversion, then BibTeX is an issue.
@moewew I was thinking that bibtex was OK as it already detects this use, but I suppose for biblatex you really need something like \NoCaseChange to be inserted so you can handle the text later, and bibtex simply detecting the braces and skipping case changing is not enough... It could be written in TeX; a shame, though, as it's probably only a line of Perl :-)
Yes, we can do this in biber, but it's always a little messy with edge cases as it has to be done with regexps, and people can (and do) put arbitrary TeX into datasource fields, which makes simple brace protection non-trivial to detect.
No, biblatex doesn't use BibTeX's case changing function. The backend passes the text through unchanged and case changing happens on the LaTeX side. (I guess because the idea is that all formatting happens on the LaTeX side.)
Something not unlike this may be enough to do it in TeX (only tested on this one string):
\documentclass{article}
\usepackage{textcomp}
\usepackage[overload]{textcase}
\begin{document}
{abc {Keep Mixed} zzz \textbf{but upper this}}
\MakeUppercase{abc {Keep Mixed} zzz \textbf{but upper this}}
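% \zz: prepend a space and an end marker, then upper-case the whole result
\def\zz#1{\MakeUppercase{\expandafter\zzz\space #1\endzzz. {}}}
% \zzz: copy the text up to the next " {" and put \NoCaseChange before that brace group
\def\zzz#1 #{ #1\zzz\NoCaseChange}
% \endzzz: gobble the three leftover tokens (., \zzz, \NoCaseChange) after the last step
\def\endzzz#1#2#3{}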
\zz{abc {Keep Mixed} zzz \textbf{but upper this}}
\end{document}
I suppose it should ideally also handle nested braces:
\zz{abc {Keep Mixed} zzz \textbf{but upper this {not this}}}
@jspitz For the expl3 case changer (which is stepping through character by character anyway) that may be possible, but for \Make(Text)Uppercase it really doesn't make sense to add hundreds of lines of fragile TeX code on top of the existing code, which is just a thin wrapper around \uppercase. So I think for a non-expl3 setting that's as much as is reasonable.
I don't actually know exactly what criterion bibtex uses to classify braces (I could check the sources). But as the exact rules force accents to be written as {\'e} rather than \'{e}, and so break kerning and ligatures, the temptation not to follow them exactly might be strong...
@davidcarlisle I'd say the goal should be to provide, as much as possible, a way that doesn't change the output for users who use brace protection when expl3 casing is introduced. Biblatex's logic (which is not consistent) does already differ from BibTeX's here, but this is documented in the manual (sec. 4.6.4, \MakeSentenceCase).
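For comparison, a minimal sketch of the existing brace protection in biblatex's own case changer. The title string is made up for illustration, and the expected output assumes the documented rule that braced text is left alone.

```latex
\documentclass{article}
\usepackage{biblatex}% loaded only for \MakeSentenceCase; no bibliography needed
\begin{document}
% {LaTeX} is protected; everything after the first letter is otherwise lowercased
\MakeSentenceCase{An Introduction to {LaTeX} Macros}
% expected: An introduction to LaTeX macros
\end{document}
```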
@jspitz Sure, that is a good aim, but bibtex's rules are very weird. For example, with the example you posted earlier
{abc {Keep Mixed} zzz \textbf{but upper this {not this}}}
bibtex would skip the entire \textbf argument, as it skips all brace groups unless the content of the group starts with a \abc csname. There really is no good place where one can insert \NoCaseChange to emulate that behaviour.
We can follow whatever logic we want :) Sounds like what we want is a 'wrapper': \text_bibtex_to_expl:n or some such. If I match the Biber behaviour in an expandable function, that could be used as
\text_titlecase:n { \text_bibtex_to_expl:n {#1} }
Would that 'work'?
@jspitz sure that is a good aim, but bibtex's rules are very weird and for example the example you posted earlier
{abc {Keep Mixed} zzz \textbf{but upper this {not this}}}
bibtex would skip the entire \textbf argument, as it skips all brace groups unless the content of the group starts with a \abc csname.
Frankly, I have not tested this with Biblatex. For Biblatex, in {Abc {Keep Mixed} Zzz \textbf{but Lower This}} the argument of \textbf would not be sentence-cased unless you write {Abc {Keep Mixed} Zzz {\textbf{but Lower This}}}.
@davidcarlisle It's not 100s of lines ;)
I was actually hoping that the switch to expl3 would be a good pretext to drop some of the weird biblatex case protection behaviour for an overall saner approach. I appreciate that that has backwards compatibility implications, but I was hoping to keep the main features the same and maybe drop some of the more obscure rules. People who rely on the more obscure stuff are hopefully happier to read release notes and accept that some change might be useful.
@moewew A compatibility function of the type I've suggested would be opt-in; quite easy to arrange that older behaviour is deprecated.
IMO some of the weirder behaviors could be dropped indeed, as this has always been shaky (and even documented as such). The main protection (grouping via brace or macro and the undoing of macro grouping via outer macro) should probably be emulated.
One option here is to extend biber's data annotation feature so that ranges of characters can be semantically tagged, something like:
TITLE = {Some title with a protected part}
TITLE+an:protected = {r=19-27}
This could generally be used to get rid of markup in data. It would require macros in biblatex to apply some formatting to a character range in a string (I assume expl3 has such fancy things...). I am not convinced this is very useful, but it's something that has occurred to me from time to time.
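A quick sanity check of the range in that annotation, assuming 1-based, inclusive indices that count spaces (the proposal does not pin this down): expl3's \str_range:nnn extracts characters 19 to 27 of the title.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% ~ is a space under \ExplSyntaxOn; \str_range:nnn counts spaces as characters
\str_range:nnn { Some~title~with~a~protected~part } { 19 } { 27 }
\ExplSyntaxOff
\end{document}
```

This should typeset "protected", i.e. the range covers exactly the word to be protected.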
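Proof of principle (expl3 code, so \ExplSyntaxOn must be in force):
% Entry point (expandable): append the end markers and start the loop
\cs_new:Npn \text_bibtex_to_expl:n #1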
{
\__text_bibtex_loop:w #1
\q_recursion_tail \q_recursion_stop
}
% Dispatch on the next item: a single token, a brace group, or a space
\cs_new:Npn \__text_bibtex_loop:w #1 \q_recursion_stop
{
\tl_if_head_is_N_type:nTF {#1}
{ \__text_bibtex_N_type:N }
{
\tl_if_head_is_group:nTF {#1}
{ \__text_bibtex_group:n }
{ \__text_bibtex_space:w }
}
#1 \q_recursion_stop
}
% Single token: stop at the tail marker, otherwise copy it unchanged
\cs_new:Npn \__text_bibtex_N_type:N #1
{
\quark_if_recursion_tail_stop:N #1
\exp_not:n {#1}
\__text_bibtex_loop:w
}
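% Brace group: keep it as is if it starts with a control sequence
% (e.g. {\itshape ...}), otherwise wrap its contents in \NoCaseChange
\cs_new:Npn \__text_bibtex_group:n #1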
{
{
\bool_lazy_and:nnTF
{ \tl_if_head_is_N_type_p:n {#1} }
{
\exp_after:wN \token_if_cs_p:N \exp_after:wN { \tl_head:w #1 \q_stop }
}
{ \exp_not:n {#1} }
{ \exp_not:n { \NoCaseChange {#1} } }
}
\__text_bibtex_loop:w
}
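% Space: the parameter text of this function is a single space token
% (taken from \c_space_tl), so it removes one space, re-emits it and continues
\exp_last_unbraced:NNo \cs_new:Npn \__text_bibtex_space:w \c_space_tl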
{
\c_space_tl
\__text_bibtex_loop:w
}
@josephwright Ah OK, you add \NoCaseChange inside the braces; that's simpler and better than I had in mind. I was not seeing a good place to add \NoCaseChange in the macro argument case, as I was thinking of changing {abc} to \NoCaseChange{abc}, i.e. putting it before the brace...
so
\let\test\text_bibtex_to_expl:n
\ExplSyntaxOff
\typeout{\test{abc {abc} abc}}
\typeout{\test{abc \textit{abc} abc}}
\typeout{\test{abc {\itshape abc} abc}}
gives
abc {\NoCaseChange {abc}} abc
abc \textit {\NoCaseChange {abc}} abc
abc {\itshape abc} abc
which looks good to me
Yes, the result looks good indeed. As to the macro, I wonder whether \NoCaseChange is a good choice (due to the name clash with textcase). \KeepCase maybe (or, shorter to write, \KpCase)?
And while talking about the UI: It would be nice to have a global solution to case-protect lexical items. Something like
% language-specific
\AddtoCaseProtection[english]{English,German,French,...}
% general
\AddtoCaseProtection{USA,APA,Knuth,...}
which could be used in the document preamble, or *.lbx, or *.bbx (of styles that use \MakeSentenceCase).
I understand that something along this line has also been considered on the latex3 level in the long run, but still, an interface on the biblatex level would be very comfortable.
On the naming, it would be best to pick something that is clear and descriptive. We are talking about a marker to go into BibTeX files and presumably to be shared with other tools. There's no real issue in calling it \NoCaseChange, as nothing is baked into expl3 here and the textcase definition is a simple no-op. So one can use the same command name for both approaches.
\documentclass{article}
\usepackage{expl3}
\usepackage{textcase}
\ExplSyntaxOn
\tl_put_right:Nn \l_text_expand_exclude_tl { \NoCaseChange }
\tl_put_right:Nn \l_text_case_exclude_arg_tl { \NoCaseChange }
\ExplSyntaxOff
\begin{document}
\MakeTextLowercase{\NoCaseChange{iPhone} iPhone}
\ExplSyntaxOn
\text_lowercase:n { \NoCaseChange { iPhone } ~ iPhone }
\ExplSyntaxOff
\end{document}
My cons on \NoCaseChange, apart from the textcase usage, would be that it is rather hard to type (given that case protection might have to be used a lot) and that it does not really fit the biblatex style of command naming. But this is just a minor note. I will be happy with whatever you come up with, as this is definitely a huge improvement over the status quo.
@jspitz Like I said, the point here is there is nothing 'baked in' to expl3. So you can pick whatever name you feel is best: it could be something very short, though I'm not sure that's a great plan. The expl3 mechanism works with whatever commands it's 'fed'. An obvious short-ish but descriptive name is \FixedCase.
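A minimal sketch of that point, assuming a current expl3 and the name \FixedCase (it is not defined anywhere yet): the marker is an ordinary command that returns its argument, registered in the same two exclusion lists as \NoCaseChange, with no textcase involved.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical marker: typesets its argument unchanged
\cs_new_protected:Npn \FixedCase #1 {#1}
% Do not expand it during case changing, and leave its argument's case alone
\tl_put_right:Nn \l_text_expand_exclude_tl { \FixedCase }
\tl_put_right:Nn \l_text_case_exclude_arg_tl { \FixedCase }
\ExplSyntaxOff
\begin{document}
\ExplSyntaxOn
\text_uppercase:n { \FixedCase { iPhone } ~ and ~ iPhone }
% expected: iPhone AND IPHONE
\ExplSyntaxOff
\end{document}
```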
Is the intention here to have this macro in the bibtex source data? If so, since it is enforcing a user change, at least for biber, I would prefer dedicated data annotations which pick out specific words or character ranges for protection, leaving no macros in the data. This also allows another level of abstraction for selecting different macros in the style.
@josephwright my comment was probably more addressed to @moewew (or whoever is going to implement this to biblatex). @plk yes that's the intention. And I agree that macros in the data should be avoided (see my proposal in https://github.com/plk/biblatex/issues/960#issuecomment-578476374), but I suppose it cannot be avoided completely.
@plk As already noted, it's quite possible to take the existing BibTeX brace approach and massage that into something with more reasonable mark-up.
I can suggest code to pick out words/phrases and protect them: single words are relatively easy, picking out a word in a phrase slightly less so (but still doable). We might have something like
\text_titlecase:n
{ \text_autocase:nn { <language> } { <input> } }
where the 'autocase' stuff could be a biblatex-specific macro, as more generally explicit markup is safer.
Actually I would prefer to keep Biber out of this as far as possible. I only asked about the feasibility of Biber doing something here because David brought it up.
I bring it up only because the data annotation feature was partly designed to keep formatting macros out of bibtex data.
I'd agree that avoiding explicit TeX markup in bib files is a good aim, although annotating the field via character counts seems a bit fragile as an editing experience. Perhaps the classic bibtex brace markup is a reasonable compromise in the end. Its main downside, introducing markup like {\'e} for accented letters, which then disturbs interletter kerns, isn't really an issue these days, as you can tell users to avoid that and use é or even \'e instead.
An explicit mark-up is always going to be more robust than an implicit one like the brace approach: we went for explicit mark-up in \text_titlecase:n for a reason. But I see that BibTeX files are somewhat tricky as they are not 'just LaTeX' sources. So we probably need to support the brace approach, as suggested above. (The issue with braces is that \foo{bar} might be a macro that takes an argument, or it might be a letter-like command that is followed by case-fixed input, and we can't be sure.)
Looks like what we want is
We've got the first two, I think the last is doable, that addresses the use cases and makes people reasonably happy, no?
@josephwright That's why in the sketch 2e code above I just checked for a brace group preceded by a space or start of string. It's not fully compatible with legacy bibtex markup, but it does mean that \textit{zzz} braces don't get picked up. The vast majority of cases where you want to preserve case are surely whole words.
I'm all for a macro instead of just braces: one of the main problems for the TeX->UTF-8 conversion that biber does on every datasource is trying to distinguish braces that need leaving in for protection from braces delimiting macro arguments. Things that make this easier or eliminate all markup from data, I am in favour of.
@plk I think we are all in agreement and expl3 here provides a good way forward (as it's supposed to); we will want to test carefully, but it should be a good use of the new(ish) code.
I've run a few tests with @josephwright's code from https://github.com/plk/biblatex/issues/960#issuecomment-578435859 and I am really impressed by and happy with the result so far.
I'm hoping to switch to the expl3 case changing code for the next release. The question is whether we should offer the old code as an opt-in for backwards compatibility (for people who don't have a current expl3 or who relied on one of the really obscure behaviours of the old code that are not present in the expl3 code) or should drop it entirely.
For now I want to implement Joseph's BibTeX-to-LaTeX braces mapping, but give people an option to opt out if they prefer a more structured approach that doesn't use braces but a dedicated no-case macro.
- \NoCaseChange because it does what it says on the tin and is familiar from the textcase package. As far as I can see we can avoid clashes with textcase easily (at least as long as textcase doesn't change its definition of \NoCaseChange without us realising it). But if there is a suggestion for a better name I'll be all ears.
- \NoCaseChange to \l_text_expand_exclude_tl and \l_text_case_exclude_arg_tl only for 'us'? Should we?
- biblatex prefix for the expl3 function names?
\MakeCapital doesn't work with strings starting with the symbol № (with pdflatex). The error is:
\textnumero works fine.