Closed gotmax23 closed 1 month ago
@musicinmybrain said in the Fedora Python matrix channel:
I am not deeply committed to the emoji in the description of uv in particular, although all else equal, I default to preserving them since they are a very intentional part of upstream’s writing. music I tend to fall on the side of thinking that emoji are part of Unicode support, and (subject to font availability) failure to display them is a bug. Emoji are not the only things that use codepoints outside the BMP or combine several codepoints into one glyph, but they are the most visible to English speakers. So I tend to assume that if emoji don’t work, something in CJK languages doesn’t work either.
I just tried this on Fedora 40:
dnf5 info python3-fastapi
and got (excerpted):
: The key features are:
:
: \xe2\x80\xa2 Fast: Very high performance, on par with NodeJS and Go (thanks to Starlette
: and Pydantic). One of the fastest Python frameworks available.
Now, these are merely Unicode bullets, •
! So it looks like dnf5
is treating everything as ASCII and backslash-escaping everything that isn’t ASCII.
The above works fine with dnf
4.21.0 and with rpm -qi
.
There are also differences in localization, which kind of worked in rpm
, kind of worked differently in dnf
4, and doesn’t work at all in dnf5
. You can compare these in Fedora 40:
LC_ALL=zh_CN.utf8 rpm -qi python3-fastapi
: uses localized summary and description text and build date; does not localize its own fixed strings (field labels)LC_ALL=zh_CN.utf8 dnf info python3-fastapi
: localizes all of its own strings (dates, field labels), but does not use the localized summary and description textLC_ALL=zh_CN.utf8 dnf5 info python3-fastapi
: ignores locale completely, hex-escapes •
in the default English description(Don’t try other locales for now; I just fixed an issue with the other localized descriptions in the python-fastapi
spec file, and the updates for this are not yet in stable repositories.)
I guess this should be filed as a separate issue.
dnf5 version 5.1.17 does not use locale. Support was added to the upstream some time ago. I assume the problem was solved by this commit https://github.com/rpm-software-management/dnf5/commit/3d302e0b32174eab40176befddb26628f283757a . That is, since dnf5 version 5.2.0.
Hmm, thanks for that! It does look like this works in Fedora 41. Both the Unicode bullets from python3-fastapi
and the emoji from uv
are displayed correctly when I try dnf info
in a Fedora 41 mock chroot with --enable-network
, currently with dnf5 version 5.2.5.0
. I had no idea the “preview” version of dnf5
packaged in F39 and F40 was significantly different from the one in F41+.
It turns out, however, that setting LC_ALL
– even to en_US.UTF-8
– seems to break this.
https://github.com/rpm-software-management/dnf5/issues/1687#issuecomment-2360611644
It turns out, however, that setting LC_ALL – even to en_US.UTF-8 – seems to break this.
Is it a dnf problem? Can you please take a look at https://github.com/rpm-software-management/dnf5/issues/1687#issuecomment-2360779811
It turns out, however, that setting LC_ALL – even to en_US.UTF-8 – seems to break this.
Is it a dnf problem? Can you please take a look at #1687 (comment)
Indeed, as described in https://github.com/rpm-software-management/dnf5/issues/1687#issuecomment-2361046175, there are two things going on:
setlocale
) fails when the necessary glibc-langpack-##
package is not present, so dnf
ends up running with a default localednf
5.2.6.0, the message printed by dnf5
reports that default locale as C
; in Fedora 40, with dnf
5.1.17, the message says C.UTF-8
. In at least 5.2.6.0, that messge is hard-coded, and the discrepancy doesn’t reflect a difference in the actual locale.So I think that at least with respect to this bug, dnf
5.2.x itself as shipped in Fedora 41 may be behaving as one would expect, as when setlocale
fails dnf
doesn’t necessarily have access to information about whether the expected output encoding is UTF-8, so the backslash-escaping behavior might be considered reasonable.
(Deleted a comment that was inaccurate because I did an experiment in a mock chroot that I had forgotten to clean first.)
We can try to improve it. If the locale setting fails, then instead of the defaulting to "C"
message, we can try setting "C.UTF-8". If this succeeds the message defaulting to "C.UTF-8"
will be displayed and only if it fails then defaulting to "C"
.
I think dnf4 does it this way.
dnf5 version 5.1.17 does not use locale. Support was added to the upstream some time ago. I assume the problem was solved by this commit 3d302e0 . That is, since dnf5 version 5.2.0.
Can this be backported to the dnf5 version in Fedora 40 as well?
We can try to improve it. If the locale setting fails, then instead of the
defaulting to "C"
message, we can try setting "C.UTF-8". If this succeeds the messagedefaulting to "C.UTF-8"
will be displayed and only if it fails thendefaulting to "C"
. I think dnf4 does it this way.
If there aren’t any pitfalls I’m overlooking, this seems like a good idea and a solid improvement.
I don't think that falling to C.UTF-8 is great idea. If a user has a terminal in non-UTF-8 mode, he asks for e.g. ja_JP.EUCJP end DNF5 starts spitting out UTF-8 characters, it wil bodge his terminal. If glibc reverted a default locale from C.UTF-8 back to C, there was probably a good reason for it.
I would rather use nl_langfinfo() to retrieve current locale code set and print that in the error message. Or rather not mentioning "C" locale at all.
@ppisar
I think C.UTF-8
was created just for cases where the desired locale is not available. And we assume that a modern system uses UTF-8 as the default. https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
dnf4 is also trying a fallback to C.UTF-8
and so far this has proven to be a good solution.
Yes there is a risk that the user will have a misconfigured system. He will have a locale available that the terminal will not know. The setlocale() will pass and the terminal will break even without this fallback. Or the user will request an uninstalled/unsupported locale on a terminal without UTF-8 support and then it will break because of this fallback.
Or the user will be a root
and type rm -rf /
and the whole system will crash. I'm just considering where the line is between user-friendliness and protection from shooting yourself in the foot.
However, you could try tweaking the fallback in your PR to set the C++
locale. Maybe based on the nl_langfinfo()
you mention.
Can this be backported to the dnf5 version in Fedora 40 as well?
I'm not sure. The current plan is to backport anything related to dnf5 to Fedora 40 or lower only if there is a strong reason to do so.
The uv package's description in Fedora has emojis in it, which dnf5 does not handle correctly. See the difference between
dnf5 repoquery
anddnf5 info
's output: