Open domdfcoding opened 3 years ago
I could be wrong, but I wonder if this would be on PyPI or on the tool you used for uploading (twine, or poetry, or...). I'm trying to investigate, but I can't say for sure as of now.
Hm, the definition of the metadata value in PEP 345 et al. seems to indicate that this should be supported. I can't find the PEP that defines the upload format but I think you're right.
I've tried looking at what it would mean on the code side. I should have known, really, but the author/author-email situation is a mess and the whole thing is probably a can of worms :D
I can make it so that the base case of PEP 621 is handled, but there's quite a few examples for which I have no idea what should be returned.
Author = A B, Author-Email = C D <e@f.gh>
Author = A B <a@b.cd>, C D <c@d.ef>, No Author-Email
Author = A B <a@b.cd>, Author-Email = E F <e@f.gh>
This is what we do today:
{% if release.author_email %}
<p><strong>{% trans %}Author:{% endtrans %}</strong> <a href="mailto:{{ release.author_email }}">{{ release.author or release.author_email }}</a></p>
{% elif release.author %}
<p><strong>{% trans %}Author:{% endtrans %}</strong> {{ release.author }}</p>
{% endif %}
This is what the PEP says:
Author (optional): A string containing the author's name at a minimum; additional contact information may be provided.
Example:
Author: C. Schultz, Universal Features Syndicate,
Los Angeles, CA <cschultz@peanuts.example.com>
Author-email (optional): A string containing the author's e-mail address. It can contain a name and e-mail address in the legal forms for a RFC-822 From: header.
Example:
Author-email: "C. Schultz" <cschultz@example.com>
It's hinted here and there that multiple comma-separated authors are fine in the Author field.
This really makes me want to not try to parse anything smarter than the bare, bare minimum :D
I think the PEP is wrong. If both Author and Author-Email are provided, it's much simpler to just keep them as two fields, otherwise our existing logic needs to become a lot more complex:
@brettcannon I think you might have written this? Any thoughts here?
Rereading the whole thing I guess we could do the following:
RFC-822 From: header
RFC-822 From: header
, concatenate the Author field and the "name" part of the Author-Email field
Would that work?
It would probably work, but why should we be mangling two separate fields into one just to have to un-mangle it somewhere else? I don't see any advantage to it, and think it would be simpler to just change the PEP and the few tools (single tool?) that have already implemented it instead.
Ah, but then we'll never be able to have proper mailto: links? I think I haven't understood what you'd want to do.
I'm not sure I follow, we have proper mailto: links now for non-PEP 621 metadata.
@brettcannon I think you might have written this? Any thoughts here?
If you mean what's in the metadata spec, that's how it's always been, i.e. I didn't do it 😉 . PEP 621 just went with what was there and purposefully didn't touch the metadata spec (I tried to clean it up and got push-back from trying to do too much).
As for why PEP 621 uses Author-Email to its fullest extent based on the spec definition, I believe it was to avoid having to try and correlate Author and Author-Email when they were comma-separated fields, since the data is inherently tied together.
I'm not sure the original metadata spec allows multiple comma-separated values to be in Author-Email. It says RFC-822 From: header, so I believe only a single email address should be sent. In Author, though, multiple values can be sent, it's free-form.
I'm not sure the original metadata spec allows multiple comma-separated values to be in Author-Email.
It does. From https://packaging.python.org/specifications/core-metadata/#author-email:
A string containing the author’s e-mail address. It can contain a name and e-mail address in the legal forms for a RFC-822 From: header.
Example:
Author-email: "C. Schultz" <cschultz@example.com>
Per RFC-822, this field may contain multiple comma-separated e-mail addresses:
Author-email: cschultz@example.com, snoopy@peanuts.com
So my reading of "a string containing the author's e-mail address ... can contain a name and e-mail address" combined with "this field may contain multiple comma-separated e-mail addresses" is what led me to do what I did for PEP 621.
To be clear, I personally don't care if a change is made in regards to this; I'm not trying to specifically defend how PEP 621 does things as how things should continue to be done; I'm just trying to explain the logic of how it ended up the way it did. But it seems any change will require an update to the metadata spec and PEP 621 if you want to restrict what's valid for the author- and maintainer-related metadata fields.
There is already one moment during release submission where we have assigned a variable containing the "name" part of the multi-email RFC-822 encoded string. So without a lot of additional complexity, just assigning this to the "Author" field of the release in case it's not already filled would probably be enough.
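As a rough illustration of that idea (not warehouse's actual code), the sketch below uses the standard library's `email.utils.getaddresses` to pull the "name" parts out of an Author-email value and uses them only when Author is empty. The function name `fill_author_from_email` is hypothetical.

```python
from email.utils import getaddresses


def fill_author_from_email(author: str, author_email: str) -> str:
    """Return `author` if it is set, else the names found in `author_email`.

    Hypothetical helper for illustration; not part of warehouse.
    """
    if author.strip():
        # An explicit Author value always wins.
        return author
    # getaddresses() parses an RFC 822 style address list into
    # (display name, address) pairs; bare addresses have an empty name.
    names = [name for name, _addr in getaddresses([author_email]) if name]
    return ", ".join(names)


if __name__ == "__main__":
    print(fill_author_from_email(
        "", "Leah Wasser <leah@pyopensci.org>, Hans Lellelid <hans@xmpl.org>"
    ))
```

If Author-email holds only bare addresses, this returns an empty string, so the display code would still need a fallback to the address itself.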
Any update on this?
hi all - just a note that i'm having this issue too with test pypi for my package stravalib and i also see the same issue with sourmash on pypi. i don't think it's the build back end in this case.
my META from my wheel thanks to @pradyunsg for telling me how to check this is:
Maintainer: Jonatan Smoocha, Yihong
Maintainer-email: Leah Wasser <leah@pyopensci.org>, Hans Lellelid <hans@xmpl.org>
and on pypi i see
it seems like it's being parsed incorrect by pypi ?? many thanks for your work on pypi btw!
FWIW, the TOML in pyproject.toml relevant to the above was (built with setuptools):
maintainers = [
{name = "Leah Wasser", email = "leah@pyopensci.org"},
{name = "Hans Lellelid", email = "hans@xmpl.org"},
{name = "Jonatan Smoocha"},
{name = "Yihong"},
]
oh yes - i'll reference this issue in my pr as well. for now i've removed emails.
So do we think the conversion from TOML -> metadata is wrong, or is PyPI's interpretation of the metadata wrong? What were you expecting to happen here?
Using the data to fill in core metadata is as follows:
If only name is provided, the value goes in Author or Maintainer as appropriate.
If only email is provided, the value goes in Author-email or Maintainer-email as appropriate.
If both email and name are provided, the value goes in Author-email or Maintainer-email as appropriate, with the format {name} <{email}>.
Multiple values should be separated by commas.
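The rules quoted above can be sketched in a few lines of Python. This is only an illustration of the mapping as worded, not the code of setuptools or any other backend; the function name `to_core_metadata` and its `kind` parameter are made up for the example, and real backends may additionally quote names that contain special characters.

```python
def to_core_metadata(people, kind="Author"):
    """Map a PEP 621 authors/maintainers list onto core metadata fields.

    `people` is a list of dicts with optional "name" and "email" keys.
    Hypothetical helper for illustration only.
    """
    names, emails = [], []
    for person in people:
        name, email = person.get("name"), person.get("email")
        if name and email:
            # Both given: goes to the -email field as "{name} <{email}>".
            emails.append(f"{name} <{email}>")
        elif email:
            # Only email: goes to the -email field as-is.
            emails.append(email)
        elif name:
            # Only name: goes to the plain Author/Maintainer field.
            names.append(name)
    fields = {}
    if names:
        fields[kind] = ", ".join(names)
    if emails:
        fields[f"{kind}-email"] = ", ".join(emails)
    return fields


if __name__ == "__main__":
    maintainers = [
        {"name": "Leah Wasser", "email": "leah@pyopensci.org"},
        {"name": "Hans Lellelid", "email": "hans@xmpl.org"},
        {"name": "Jonatan Smoocha"},
        {"name": "Yihong"},
    ]
    for key, value in to_core_metadata(maintainers, kind="Maintainer").items():
        print(f"{key}: {value}")
```

Run on the maintainers table from this thread, it reproduces the Maintainer and Maintainer-email lines seen in the wheel metadata. It also shows why the link between a person's name and email is lost whenever the list mixes entries with and without emails.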
I think it's on PyPI's end -- wherein it's only presenting the Maintainer key with Maintainer-Email as the link, even if the latter contains names and doesn't match the Maintainer key.
I think the pyproject.toml's author/maintainer -> METADATA mapping (as it stands) operates on the assumption that both the "{type}" and "{type}-email" would be used/presented; whereas PyPI tries to present only one entry (Author / Maintainer) and tries to use the "{type}-email" as a link for "{type}" if they're both present.
What were you expecting to happen here?
That's an excellent question -- I'd like to ask @lwasser to provide her thoughts on this. How would you have expected PyPI to present the information you added to pyproject.toml? :)
maintainers = [
{name = "Leah Wasser", email = "leah@pyopensci.org"},
{name = "Hans Lellelid", email = "hans@xmpl.org"},
{name = "Jonatan Smoocha"},
{name = "Yihong"},
]
One approach that I can think of is to not provide a single link to write an email to all authors/maintainers, and to instead split the keys on , and present the names individually (with those that have emails being linked to, on a per-person basis). For backwards-compat, we could keep the current linking behaviour (of Author w/ Author-Email as mailto:) if there's a single email with no name and a single name.
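A minimal sketch of that approach, assuming the two metadata values are available as plain strings. `render_people` is a hypothetical name, this is not warehouse's template logic, and the comma split on the name field is deliberately naive.

```python
from email.utils import getaddresses
from html import escape


def render_people(name_field: str, email_field: str) -> str:
    """Render Author/Author-email (or Maintainer/...) as per-person HTML.

    Hypothetical helper for illustration only.
    """
    # (display name, address) pairs from the -email field; drop empty results.
    pairs = []
    if email_field:
        pairs = [(n, a) for n, a in getaddresses([email_field]) if a]
    # Naive split of the free-form name field on commas.
    names = [n.strip() for n in name_field.split(",") if n.strip()]

    # Backwards-compatible case: one bare email plus one name -> single link.
    if len(pairs) == 1 and not pairs[0][0] and len(names) == 1:
        return f'<a href="mailto:{escape(pairs[0][1])}">{escape(names[0])}</a>'

    # Otherwise link each person that has an email, labelled by name
    # (falling back to the address), then list the name-only people.
    parts = [
        f'<a href="mailto:{escape(addr)}">{escape(name or addr)}</a>'
        for name, addr in pairs
    ]
    parts += [escape(n) for n in names]
    return ", ".join(parts)


if __name__ == "__main__":
    print(render_people(
        "Name Three, Name Four",
        "Name One <nameone@email.org>, Name Two <nametwo@email.org>",
    ))
```

Entries that carry an email are listed first here simply because the core metadata no longer records the original order across the two fields.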
Given that maintainers rarely follow that guidance 😉, I think we still need to maintain some backwards compatibility with the expectation that Author/Maintainer is a string, Author-Email and Maintainer-Email is an email, and together they become a link.
Hence the suggestion of keeping the current behaviour when there's only one email + one name. 😉
absolutely @di @pradyunsg My understanding of how this works is that (id expect authors to operate the same!)
in my table here:
maintainers = [
{name = "Name One", email = "nameone@email.org"},
{name = "Name Two", email = "nametwo@email.org"},
{name = "Name Three"},
{name = "Name Four"},
]
i'm specifying 4 maintainers. Thus on pypi, it would render as follows
<a href="mailto:nameone@email.org">Name One</a>, <a href="mailto:nametwo@email.org">Name Two</a>, Name Three, Name Four
But instead it seems to do this:
<a href="mailto:name one <nameone@email.org>, name one <nameone@email.org>">Name Three</a>, <a href="name one <nameone@email.org>, name one <nameone@email.org>">Name Four</a>
I guess i would expect it to
Hence the suggestion of keeping the current behaviour when there's only one email + one name. 😉
Sorry, missed this in the edit I think. So what should happen with:
Author: Google, Inc.
Author-email: something@google.com
I don't think that suggestion maps well onto maintaining existing behavior.
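To make the concern concrete, here is a tiny sketch, assuming the "split on ," idea is implemented as a plain string split (the helper name `naive_split` is made up). A single author whose name contains a comma is indistinguishable from two authors:

```python
def naive_split(author: str) -> list[str]:
    """Split a free-form Author value on commas (illustration only)."""
    return [part.strip() for part in author.split(",") if part.strip()]


if __name__ == "__main__":
    # One organisation, but the split reports two "names", so a rule like
    # "keep the old behaviour for one name + one email" would not trigger.
    print(naive_split("Google, Inc."))
```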
So what should happen with:
Author: Google, Inc. Author-email: something@google.com
Just fyi if those are in the same entry/table then that wouldn't occur per PEP 621 https://github.com/pypa/packaging.python.org/issues/1134#issuecomment-1231564237
If you are parsing 2 entries represented like this (i'm using setuptools to bld):
maintainers = [
{name= "Google human"},
{email = "another-human@email.com"},
]
you get this (2 unique humans are maintainers):
Maintainer: Google human
Maintainer-email: another-human@email.com
if you do this:
maintainers = [
{name = "Google human", email = "google-human@email.com"},
{email = "another-human@email.com"},
]
you get this:
Maintainer-email: Google human <google-human@email.com>, another-human@email.com
maintainers = [
{name= "Google human", email = "google-human@email.com"},
{name = "Hans Lellelid", email = "test@test.org"},
{name = "Human three"},
{email = "another-human@email.com"},
]
Results in this:
Maintainer: Human three
Maintainer-email: Google human <google-human@email.com>, Hans Lellelid <test@test.org>, another-human@email.com
I suspect two things are happening:
If you have
mailto:
link. Here i'd expect pypi to parse each name as a unique name and each email in that element of the list of maintainers to be associated with the unique name.
<p><strong>Maintainer:</strong> <a href="mailto:Luiz Irber <luiz@sourmash.bio>, "C. Titus Brown" <titus@idyll.org>">Luiz Irber <luiz@sourmash.bio>, "C. Titus Brown" <titus@idyll.org></a></p>
maintainers = [
{name = "Leah Wasser", email = "testemail@testemail.org"},
{name = "Hans Lellelid", email = "hans@test.org"},
{name = "Jonatan Samoocha"},
{name = "Yihong"},
]
You end up with a pypi entry like this. Notice that here two of the maintainers are not listed, and BOTH have an email link that is a mixture of email and maintainer names, similar to what you see with sourmash. i just fixed this by removing emails altogether and now test pypi just lists all 4 of our names.
I hope that is helpful. it just seems to me that things are being parsed differently depending on what combination of information is provided.
Coming from Issue #12877 (sorry for the duplicate Issue):
We (pyhf) are seeing a similar problem with our authors and maintainers fields in our PEP 621 compliant pyproject.toml.
$ python -m pip download --index-url https://test.pypi.org/simple/ --no-deps 'pyhf==0.7.1.dev35'
$ unzip pyhf-0.7.1.dev35-py3-none-any.whl
$ head -n 12 pyhf-0.7.1.dev35.dist-info/METADATA
Metadata-Version: 2.1
Name: pyhf
Version: 0.7.1.dev35
Summary: pure-Python HistFactory implementation with tensors and autodiff
Project-URL: Documentation, https://pyhf.readthedocs.io/
Project-URL: Homepage, https://github.com/scikit-hep/pyhf
Project-URL: Issue Tracker, https://github.com/scikit-hep/pyhf/issues
Project-URL: Release Notes, https://pyhf.readthedocs.io/en/stable/release-notes.html
Project-URL: Source Code, https://github.com/scikit-hep/pyhf
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>
Maintainer-email: The Scikit-HEP admins <scikit-hep-admins@googlegroups.com>
License: Apache-2.0
Our authors field is
authors = [
{ name = "Lukas Heinrich", email = "lukas.heinrich@cern.ch" },
{ name = "Matthew Feickert", email = "matthew.feickert@cern.ch" },
{ name = "Giordon Stark", email = "gstark@cern.ch" },
]
and pip is recognizing all the metadata as we would expect
$ python -m pip show pyhf
Name: pyhf
Version: 0.7.1.dev43
Summary: pure-Python HistFactory implementation with tensors and autodiff
Home-page:
Author:
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>
License: Apache-2.0
Location: /home/feickert/.pyenv/versions/3.10.6/envs/pyhf-dev-CPU/lib/python3.10/site-packages
Requires: click, jsonpatch, jsonschema, numpy, pyyaml, scipy, tqdm
Required-by:
though for our render check upload to TestPyPI we noticed that TestPyPI is displaying only the first author and linking their email
with the generated HTML of
<p><strong>Author:</strong> <a href="mailto:lukas.heinrich@cern.ch">Lukas Heinrich</a></p>
Have all of the authors' names and emails listed in a comma-separated list, in the order they appear in the wheel metadata
$ grep "Author-email" pyhf-0.7.1.dev35.dist-info/METADATA
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>
with generated html of
<p><strong>Author:</strong> <a href="mailto:lukas.heinrich@cern.ch">Lukas Heinrich</a>, <a href="mailto:matthew.feickert@cern.ch">Matthew Feickert</a>, <a href="mailto:gstark@cern.ch">Giordon Stark</a></p>
Our maintainers field is
maintainers = [ {name = "The Scikit-HEP admins", email = "scikit-hep-admins@googlegroups.com"} ]
and the TestPyPI render is
with the generated HTML of
<p><strong>Maintainer:</strong> <a href="mailto:The Scikit-HEP admins <scikit-hep-admins@googlegroups.com>">The Scikit-HEP admins <scikit-hep-admins@googlegroups.com></a></p>
Have the maintainer name match the metadata of the wheel
$ grep "Maintainer-email" pyhf-0.7.1.dev35.dist-info/METADATA
Maintainer-email: The Scikit-HEP admins <scikit-hep-admins@googlegroups.com>
and be a hyperlink to the mailto
<p><strong>Maintainer:</strong> <a href="mailto:scikit-hep-admins@googlegroups.com">The Scikit-HEP admins</a></p>
I encountered this bug today. We define four authors, where we don't have an email address for one of them. Pypi.org decided to only show one of them, specifically the author without an email address, and used the email address of a different author as the mailto: link 😲
It seems to me like the core metadata specification is incompatible with the degree of freedom that PEP 621 promises.
The gap between PEP 621 and the core metadata specification can be closed in two ways:
authors field in PEP 621 (bringing it in line with the core metadata spec)
Authors and Maintainers to core metadata to support the data model of PEP 621
EDIT: Put the "possible solutions" behind an accordion
EDIT 2: I no longer think the solution above would be the best one, there are simpler solutions.
I don't have a particular solution other than I think it would be great for someone to write a PEP that made this bit of metadata better :) There was even a recent thread on discuss.python.org where someone else had a related issue.
I just realised that this GitHub issue should probably be split into multiple ones.
The original issue description from @domdfcoding, and the use case from @matthewfeickert, are about the case where only Author-email (and/or Maintainer-email) is supplied. There is no confusion about what name goes with what email address in that case. According to the issue reporters, Pypi.org does not handle this properly. I would think it is possible to fix this so that all listed authors or maintainers are shown, using their names as the label and falling back to displaying their email address when no name is given. This would only require changes in warehouse.
The case where you are specifying multiple authors and mixing between Author and Author-email would be left unsupported and broken by design – just like today, in other words. If we wish to guide users towards the supported use case, we can add some guidance to the description in PEP 621 so that it recommends either including an email address for every author, or including no email addresses at all.
The issue of supporting a mix between email and non-email authors should be a different issue, I think. It would include the use cases reported by @lwasser and @pradyunsg, and me, (EDIT: and backwards compatibility with the existing usage brought up by @di) and would likely involve changes to the core metadata spec, PEP 621, warehouse and the build module.
I imagine this would take a while, so it makes sense to fix the simpler issue first and handle this more complex issue separately.
Author-email
I just realised that this GitHub issue should probably be split into multiple ones.... I imagine this would take a while, so it makes sense to fix the simpler issue first and handle this more complex issue separately.
This seems to make sense to me; my only concern at the moment is that PyPI is displaying only the first author, rather than all the authors, in the simple case of having only an Author-email line looking like this:
Author-email: "Curt J. Sampson" <cjs@cynic.net>, Nishant Rodrigues <nishantjr@gmail.com>
This is exactly what issue #12877 is about, but that's been closed in favour of this issue. So does it make sense to re-open that one to cover just the "doesn't display all authors in Author-email line," then? That seems like something that could be fixed without getting into many of these other issues. (And this issue doesn't seem to be coming towards any kind of resolution or fix any time soon.)
Agreed @0cjs , re-open #12877
Describe the bug
PEP 621 allows project metadata to be defined in pyproject.toml. This uses a list of dictionaries to represent the project's authors. Each dictionary contains two keys, "name" and "email".
To map these fields to core metadata, PEP 621 says:
Based on that, in my build backend whey I am generating metadata that looks like:
However, on PyPI this renders in the sidebar as:
(this example from https://pypi.org/project/tox-envlist/)
It's also wrong in the JSON API:
This causes further issues with tools using the API, such as https://pypistats.org, which leaves the author field blank:
Expected behavior
Compare this with another project created using setuptools:
where the metadata is:
and the response from the JSON API:
(this example from https://pypi.org/project/domdf-python-tools)
I would have expected warehouse to parse the Author-email field into the name and email address, and treat them the same as if they had been defined separately in Author and Author-email.
To Reproduce
Visible at https://pypi.org/project/tox-envlist/
See also https://pypi.org/project/flit/3.2.0/, which uses PEP 621 metadata and has the same problem but uses a different build backend.
My Platform
N/A
Additional context