Open joaopalmeiro opened 2 months ago
I think we need the PEP to be clearly updated in order to move forward here, but this makes sense to me!
Thanks for the feedback, @di!
Btw, do you mean PEP 621 and the authors/maintainers section? Anything I can do to help?
The standard for these fields has been created and updated over several PEPs; see https://packaging.python.org/en/latest/specifications/core-metadata/#history. PEP 621 only concerns itself with the pyproject.toml format and relies on the previously defined PEPs for the requirements for these specific fields.
I'm not actually sure what the right path forward would be here. I think this is probably too small to be its own PEP, but the discussion at https://discuss.python.org/t/core-metadata-email-fields-unicode/7421/9 also seems to be unresolved. Helping come to a resolution in that thread would probably be a good first step.
In the meantime, I'm going to mark this issue as blocked until there's an agreed-upon path forward here!
Hi! 👋
What's the problem this feature will solve?
By following the pyproject.toml specification and using build backends such as PDM-Backend (which makes use of the pyproject-metadata package), an author with a name containing non-ASCII characters (e.g., João Palmeiro) and an email address is output as =?utf-8?q?Jo=C3=A3o_Palmeiro?= <joaopalmeiro@proton.me> for the Author-email core metadata field.
The Author-email core metadata field is used to populate the Author sidebar field for a package page.
The pyproject-metadata package leverages the email.utils.formataddr() function to process the values of the authors field of the pyproject.toml file. This function encodes names following RFC 2047 if they have non-ASCII characters (the default charset is utf-8), and it is this value (e.g., =?utf-8?q?Jo=C3=A3o_Palmeiro?=) that is written to metadata files like PKG-INFO.
As a concrete example, please check the FastAPI package:
Instead of Sebastián Ramírez, the author's name appears as =?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?=.
Given that the specification talks about RFC-822 and that using the email.utils.formataddr() function or the pyproject-metadata package in build backends (current or future ones) is a valid approach, I believe Warehouse/PyPI should decode RFC 2047-encoded author names. In this way, the authors' names can be displayed as expected in the Author sidebar field independently of the build backend, that is, with the characters used in the pyproject.toml file.
Describe the solution you'd like
Instead of =?utf-8?q?Jo=C3=A3o_Palmeiro?=, I would like to see João Palmeiro in the Author sidebar field on a package page regardless of the build backend used (given that this is not an issue when using Hatchling, for example).
So, I propose the following changes (or similar ones) to the format_email filter and its unit test:
Let me know what you think and if I can open a PR. Thanks!
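For illustration, here is a minimal sketch of the decoding step on its own, separate from Warehouse's actual format_email filter. It assumes the raw Author-email value holds a single address. decode_author_email is a hypothetical helper name; the only real calls are the standard-library email.utils.parseaddr(), email.header.decode_header(), and email.header.make_header().

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def decode_author_email(value: str) -> tuple[str, str]:
    """Hypothetical helper: split one Author-email entry into
    (display name, address) and decode an RFC 2047 display name."""
    # parseaddr() splits "Name <addr>" but leaves the name encoded.
    raw_name, address = parseaddr(value)
    # decode_header() returns (chunk, charset) pairs; make_header()
    # joins them into a Header whose str() is the readable text.
    name = str(make_header(decode_header(raw_name)))
    return name, address


print(decode_author_email(
    "=?utf-8?q?Jo=C3=A3o_Palmeiro?= <joaopalmeiro@proton.me>"
))
```

Names that are already plain text (for example, those written unencoded by other backends) should pass through this function unchanged, so it could be applied regardless of which backend produced the metadata.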
Additional context
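A short way to reproduce the encoded value with the standard library alone, as a sketch (pyproject-metadata itself is not involved here; the name and address are the ones from my example):

```python
from email.utils import formataddr

name = "João Palmeiro"
address = "joaopalmeiro@proton.me"

# formataddr() RFC 2047-encodes the display name when it contains
# non-ASCII characters (the default charset is utf-8).
encoded = formataddr((name, address))
print(encoded)
# =?utf-8?q?Jo=C3=A3o_Palmeiro?= <joaopalmeiro@proton.me>
```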
References:
Related issues/discussions: