pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Odd rendering of author when using PEP 621 metadata. #9400

Open domdfcoding opened 3 years ago

domdfcoding commented 3 years ago

Describe the bug

PEP 621 allows project metadata to be defined in pyproject.toml. This uses a list of dictionaries to represent the project's authors. Each dictionary contains two keys, "name" and "email".

To map these fields to core metadata, PEP 621 says:

  1. If only name is provided, the value goes in Author.
  2. If only email is provided, the value goes in Author-email.
  3. If both email and name are provided, the value goes in Author-email, with the format {name} <{email}> (with appropriate quoting, e.g. using email.headerregistry.Address).

Based on that, in my build backend, whey, I am generating metadata that looks like:

Metadata-Version: 2.1
Name: tox-envlist
Version: 0.3.0
Summary: Allows selection of a different tox envlist.
Author-email: Dominic Davis-Foster <dominic@davis-foster.co.uk>

However, on PyPI this renders in the sidebar as:

image

(this example from https://pypi.org/project/tox-envlist/)

It's also wrong in the JSON API:

{
   "info": {
      "author": "",
      "author_email": "Dominic Davis-Foster <dominic@davis-foster.co.uk>"
   }
}

This causes further issues with tools using the API, such as https://pypistats.org, which leaves the author field blank:

image

Expected behavior

Compare this with another project created using setuptools:

image

where the metadata is:

Metadata-Version: 2.1
Name: domdf-python-tools
Version: 2.9.0
Summary: Helpful functions for Python 🐍 🛠️
Home-page: https://github.com/domdfcoding/domdf_python_tools
Author: Dominic Davis-Foster
Author-email: dominic@davis-foster.co.uk

and the response from the JSON API:

{
   "info":{
      "author":"Dominic Davis-Foster",
      "author_email":"dominic@davis-foster.co.uk"
   }
}

(this example from https://pypi.org/project/domdf-python-tools)

I would have expected warehouse to parse the Author-email field into the name and email address, and treat them the same as if they had been defined separately in Author and Author-email.
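
As a rough illustration of that expectation, the split could look something like the following. This is only a sketch using the standard library; `normalise_author` is a hypothetical helper name, not something that exists in warehouse, and it only handles the single-author case.

```python
from email.utils import getaddresses


def normalise_author(author, author_email):
    """Hypothetical helper: split 'Name <addr>' when Author is empty."""
    # getaddresses() returns (display name, address) pairs; drop empty results.
    parsed = [(name, addr) for name, addr in getaddresses([author_email]) if addr]
    if not author and len(parsed) == 1 and parsed[0][0]:
        # Exactly one address carrying a display name: treat it like the
        # classic Author + Author-email pair.
        return parsed[0]
    # Anything else is returned untouched.
    return author, author_email


print(normalise_author("", "Dominic Davis-Foster <dominic@davis-foster.co.uk>"))
print(normalise_author("Dominic Davis-Foster", "dominic@davis-foster.co.uk"))
```

Both calls yield the same name and address pair, which is the behaviour the JSON API comparison above asks for.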

To Reproduce

Visible at https://pypi.org/project/tox-envlist/

See also https://pypi.org/project/flit/3.2.0/, which uses PEP 621 metadata and has the same problem but uses a different build backend.

My Platform

N/A

Additional context

ewjoachim commented 3 years ago

I could be wrong, but I wonder if this would be on PyPI or on the tool you used for uploading (twine, or poetry, or...). I'm trying to investigate, but I can't say for sure as of now.

ewjoachim commented 3 years ago

Hm, the definition of the metadata value in PEP 345 et al. seems to indicate that this should be supported. I can't find the PEP that defines the upload format, but I think you're right.

ewjoachim commented 3 years ago

I've tried looking at what it would mean on the code side. I should have known, really, but the author/author-email situation is a mess and the whole thing is probably a can of worms :D

I can make it so that the base case of PEP 621 is handled, but there are quite a few examples for which I have no idea what should be returned.

Author = A B, Author-Email = C D <e@f.gh>
Author = A B <a@b.cd>, C D <c@d.ef>, No Author-Email
Author = A B <a@b.cd>, Author-Email = E F <e@f.gh>

This is what we do today:

  {% if release.author_email %}
    <p><strong>{% trans %}Author:{% endtrans %}</strong> <a href="mailto:{{ release.author_email }}">{{ release.author or release.author_email }}</a></p>
  {% elif release.author %}
    <p><strong>{% trans %}Author:{% endtrans %}</strong> {{ release.author }}</p>
  {% endif %}
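
To see what that template logic produces for PEP 621 style metadata, it can be mirrored in plain Python. This is only a rough sketch: `render_author` is a made-up helper, not warehouse code, and `html.escape` merely stands in for Jinja's autoescaping.

```python
from html import escape


def render_author(author, author_email):
    """Approximate the template branch above (hypothetical helper)."""
    if author_email:
        # The whole Author-email value becomes the mailto: target, and the
        # label falls back to that same value when Author is empty.
        label = author or author_email
        return f'<a href="mailto:{escape(author_email)}">{escape(label)}</a>'
    if author:
        return escape(author)
    return ""


# setuptools-style metadata: a sensible link.
print(render_author("Dominic Davis-Foster", "dominic@davis-foster.co.uk"))

# PEP 621 style metadata: name and address end up in both href and label.
print(render_author("", "Dominic Davis-Foster <dominic@davis-foster.co.uk>"))
```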

This is what the PEP says:

Author (optional):  A string containing the author's name at a minimum; additional contact information may be provided.

Example:
Author: C. Schultz, Universal Features Syndicate,
        Los Angeles, CA <cschultz@peanuts.example.com>

Author-email (optional): A string containing the author's e-mail address. It can contain a name and e-mail address in the legal forms for a RFC-822 From: header.

Example:

Author-email: "C. Schultz" <cschultz@example.com>

It's hinted here and there that multiple comma-separated authors are fine in the Author field.

This really makes me want to not try and parse anything smarter than the bare bare minimum :D

di commented 3 years ago

I think the PEP is wrong. If both Author and Author-Email are provided, it's much simpler to just keep them as two fields, otherwise our existing logic needs to become a lot more complex:

https://github.com/pypa/warehouse/blob/7fc3ce5bd7ecc93ef54c1652787fb5e7757fe6f2/warehouse/templates/includes/packaging/project-data.html#L78-L82

@brettcannon I think you might have written this? Any thoughts here?

ewjoachim commented 3 years ago

Rereading the whole thing I guess we could do the following:

Would that work ?

di commented 3 years ago

It would probably work, but why should we be mangling two separate fields into one just to have to un-mangle it somewhere else? I don't see any advantage to it, and think it would be simpler to just change the PEP and the few tools (single tool?) that have already implemented it instead.

ewjoachim commented 3 years ago

Ah, but then we'll never be able to have proper mailto: links? I don't think I've understood what you'd want to do.

di commented 3 years ago

I'm not sure I follow, we have proper mailto: links now for non-PEP 621 metadata.

brettcannon commented 3 years ago

> @brettcannon I think you might have written this? Any thoughts here?

If you mean what's in the metadata spec, that's how it's always been, i.e. I didn't do it 😉 . PEP 621 just went with what was there and purposefully didn't touch the metadata spec (I tried to clean it up and got push-back from trying to do too much).

As for why PEP 621 uses Author-Email to its fullest extent based on the spec definition, I believe it was to avoid having to try and correlate Author and Author-Email when they were comma-separated fields, since the data is inherently tied together.

ewjoachim commented 3 years ago

I'm not sure the original metadata spec allows multiple comma-separated values to be in Author-Email. It says RFC-822 From: header, so I believe only a single email address should be sent. In Author, though, multiple values can be sent, it's free-form.

brettcannon commented 3 years ago

> I'm not sure the original metadata spec allows multiple comma-separated values to be in Author-Email.

It does. From https://packaging.python.org/specifications/core-metadata/#author-email:

A string containing the author’s e-mail address. It can contain a name and e-mail address in the legal forms for a RFC-822 From: header.

Example:

Author-email: "C. Schultz" <cschultz@example.com>

Per RFC-822, this field may contain multiple comma-separated e-mail addresses:

Author-email: cschultz@example.com, snoopy@peanuts.com

So my reading of "a string containing the author's e-mail address ... can contain a name and e-mail address" combined with "this field may contain multiple comma-separated e-mail addresses" is what led me to do what I did for PEP 621.

To be clear, I personally don't care if a change is made in regards to this; I'm not trying to specifically defend how PEP 621 does things as how things should continue to be done; I'm just trying to explain the logic of how it ended up the way it did. But it seems any change will require an update to the metadata spec and PEP 621 if you want to restrict what's valid for the author- and maintainer-related metadata fields.

ewjoachim commented 3 years ago

That was the missing piece of the puzzle for me. I was looking at the PEP text, whereas I should have been looking at the packaging doc. The part on multiple email addresses was added by @di 3 years ago following an update of Warehouse where corresponding processing was added.

ewjoachim commented 3 years ago

There is already one moment during release submission where we have assigned a variable containing the "name" part of the multi-email RFC-822 encoded string. So without a lot of additional complexity, just assigning this to the "Author" field of the release in case it's not already filled would probably be enough.

ofek commented 1 year ago

Any update on this?

lwasser commented 1 year ago

hi all - just a note that i'm having this issue too with test pypi for my package stravalib, and i also see the same issue with sourmash on pypi. i don't think it's the build back end in this case.

my METADATA from my wheel (thanks to @pradyunsg for telling me how to check this) is:

Maintainer: Jonatan Smoocha, Yihong
Maintainer-email: Leah Wasser <leah@pyopensci.org>, Hans Lellelid <hans@xmpl.org>

and on pypi i see

Screen Shot 2023-01-02 at 10 55 39 AM

it seems like it's being parsed incorrectly by pypi? many thanks for your work on pypi btw!

pradyunsg commented 1 year ago

FWIW, the TOML in pyproject.toml relevant to the above was (built with setuptools):

maintainers = [
     {name = "Leah Wasser", email = "leah@pyopensci.org"},
     {name = "Hans Lellelid", email = "hans@xmpl.org"},
     {name = "Jonatan Smoocha"},
     {name = "Yihong"},
]

x-ref https://github.com/stravalib/stravalib/pull/304

lwasser commented 1 year ago

oh yes - i'll reference this issue in my pr as well. for now i've removed emails.

di commented 1 year ago

So do we think the conversion from TOML -> metadata is wrong, or is PyPI's interpretation of the metadata wrong? What were you expecting to happen here?

pradyunsg commented 1 year ago

From https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#authors-maintainers:

Using the data to fill in core metadata is as follows:

  1. If only name is provided, the value goes in Author or Maintainer as appropriate.

  2. If only email is provided, the value goes in Author-email or Maintainer-email as appropriate.

  3. If both email and name are provided, the value goes in Author-email or Maintainer-email as appropriate, with the format {name} <{email}>.

  4. Multiple values should be separated by commas.

I think it's on PyPI's end -- wherein it's only presenting the Maintainer key with Maintainer-Email as the link, even if the latter contains names and doesn't match the Maintainer key.

I think the pyproject.toml's author/maintainer -> METADATA mapping (as it stands) operates on the assumption that both the "{type}" and "{type}-email" would be used/presented; whereas PyPI tries to present only one entry (Author / Maintainer) and tries to use the "{type}-email" as a link for "{type}" if they're both present.
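
For reference, the mapping rules quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular build backend; `people_to_core_metadata` is a hypothetical name, and `email.headerregistry.Address` is used for quoting as the PEP suggests.

```python
from email.headerregistry import Address


def people_to_core_metadata(people, kind="Author"):
    """Hypothetical sketch of the authors/maintainers -> core metadata rules."""
    names, emails = [], []
    for person in people:
        name = person.get("name")
        addr = person.get("email")
        if name and addr:
            # Rule 3: "{name} <{email}>", quoted where necessary.
            emails.append(str(Address(display_name=name, addr_spec=addr)))
        elif addr:
            # Rule 2: a bare e-mail address.
            emails.append(addr)
        elif name:
            # Rule 1: a name without an address.
            names.append(name)
    fields = {}
    if names:
        fields[kind] = ", ".join(names)  # Rule 4: comma-separated
    if emails:
        fields[f"{kind}-email"] = ", ".join(emails)
    return fields


maintainers = [
    {"name": "Leah Wasser", "email": "leah@pyopensci.org"},
    {"name": "Hans Lellelid", "email": "hans@xmpl.org"},
    {"name": "Jonatan Smoocha"},
    {"name": "Yihong"},
]
for field, value in people_to_core_metadata(maintainers, "Maintainer").items():
    print(f"{field}: {value}")
```

Run on the maintainers table from earlier in the thread, this reproduces the two lines found in the wheel: the name-only people land in Maintainer and everyone with an address lands in Maintainer-email, so the original ordering and pairing is no longer visible in the metadata.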

pradyunsg commented 1 year ago

> What were you expecting to happen here?

That's an excellent question -- I'd like to ask @lwasser to provide her thoughts on this. How would you have expected PyPI to present the information you added to pyproject.toml? :)

maintainers = [
     {name = "Leah Wasser", email = "leah@pyopensci.org"},
     {name = "Hans Lellelid", email = "hans@xmpl.org"},
     {name = "Jonatan Smoocha"},
     {name = "Yihong"},
]

One approach that I can think of is to not provide a single link to write an email to all authors/maintainers, and to instead split the keys on "," and present the names individually (with those that have emails being linked to, on a per-person basis). For backwards-compat, we could keep the current linking behaviour (of Author w/ Author-Email as mailto:) if there's a single email with no name and a single name.
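
A minimal sketch of that approach, assuming the two metadata values are available as plain strings. `render_people` is a hypothetical helper rather than existing warehouse code, and splitting the name field on commas is exactly the assumption being proposed here, so names that themselves contain a comma would be split too.

```python
from email.utils import getaddresses
from html import escape


def render_people(names_field, emails_field):
    """Hypothetical renderer: one link per person, names-only unlinked."""
    addresses = [(n, a) for n, a in getaddresses([emails_field or ""]) if a]
    names = [n.strip() for n in (names_field or "").split(",") if n.strip()]

    # Backwards-compatible case: one bare address plus one single name.
    if len(addresses) == 1 and not addresses[0][0] and len(names) == 1:
        return f'<a href="mailto:{escape(addresses[0][1])}">{escape(names[0])}</a>'

    # Otherwise link each address individually, using the address itself
    # as the label when no display name was given.
    linked = [
        f'<a href="mailto:{escape(addr)}">{escape(name or addr)}</a>'
        for name, addr in addresses
    ]
    return ", ".join(linked + [escape(n) for n in names])


print(render_people(
    "Jonatan Smoocha, Yihong",
    "Leah Wasser <leah@pyopensci.org>, Hans Lellelid <hans@xmpl.org>",
))
```

People with an address are listed first because the core metadata no longer records the order in which they appeared in pyproject.toml.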

di commented 1 year ago

> From https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#authors-maintainers:

Given that maintainers rarely follow that guidance 😉, I think we still need to maintain some backwards compatibility with the expectation that Author/Maintainer is a string, Author-Email and Maintainer-Email is an email, and together they become a link.

pradyunsg commented 1 year ago

Hence the suggestion of keeping the current behaviour when there's only one email + one name. 😉

lwasser commented 1 year ago

absolutely @di @pradyunsg. Here is my understanding of how this works (i'd expect authors to operate the same!)

in my table here:

maintainers = [
     {name = "Name One", email = "nameone@email.org"},
     {name = "Name Two", email = "nametwo@email.org"},
     {name = "Name Three"},
     {name = "Name Four"},
]

i'm specifying 4 maintainers. Thus on pypi, it would render as follows

<a href="mailto:nameone@email.org">Name One</a>, <a href="mailto:nametwo@email.org">Name Two</a>, Name Three, Name Four

But instead it seems to do this:

<a href="mailto:name one <nameone@email.org>, name one <nameone@email.org>">Name Three</a>, <a href="name one <nameone@email.org>, name one <nameone@email.org>">Name Four</a>

I guess i would expect it to

  1. first list the maintainers in the order that they appear in the pyproject.toml and
  2. add the email link just to the items with an email?

di commented 1 year ago

> Hence the suggestion of keeping the current behaviour when there's only one email + one name. 😉

Sorry, missed this in the edit I think. So what should happen with:

Author: Google, Inc.
Author-email: something@google.com

I don't think that suggestion maps well onto maintaining existing behavior.

ofek commented 1 year ago

> So what should happen with:
>
> Author: Google, Inc.
> Author-email: something@google.com

Just fyi if those are in the same entry/table then that wouldn't occur per PEP 621 https://github.com/pypa/packaging.python.org/issues/1134#issuecomment-1231564237

lwasser commented 1 year ago

If you are parsing 2 entries represented like this (i'm using setuptools to build):

maintainers = [
     {name= "Google human"},
     {email = "another-human@email.com"},
    ]

you get this (2 unique humans are maintainers):

Maintainer: Google human
Maintainer-email: another-human@email.com

if you do this:

maintainers = [
     {name = "Google human",  email = "google-human@email.com"},
     {email = "another-human@email.com"},
    ]

you get this:

Maintainer-email: Google human <google-human@email.com>, another-human@email.com

Two name + email, one name only, one email only

maintainers = [
     {name= "Google human", email = "google-human@email.com"},
     {name = "Hans Lellelid", email = "test@test.org"},
     {name = "Human three"},
     {email = "another-human@email.com"},
    ]

Results in this:

Maintainer: Human three
Maintainer-email: Google human <google-human@email.com>, Hans Lellelid <test@test.org>, another-human@email.com

I suspect two things are happening:

  1. Two maintainers with associated emails (example: sourmash). The HTML output looks like this, where the entire string for both maintainers is turned into one mailto: link. Here i'd expect pypi to parse each name as a unique name, and to associate the email in that element of the maintainers list with that name.
<p><strong>Maintainer:</strong> <a href="mailto:Luiz Irber <luiz@sourmash.bio>, &quot;C. Titus Brown&quot; <titus@idyll.org>">Luiz Irber &lt;luiz@sourmash.bio&gt;, "C. Titus Brown" &lt;titus@idyll.org&gt;</a></p>
  2. Multiple maintainers where some have an email and others don't, like this:
maintainers = [
     {name = "Leah Wasser", email = "testemail@testemail.org"},
     {name = "Hans Lellelid", email = "hans@test.org"},
     {name = "Jonatan Samoocha"},
     {name = "Yihong"},
    ]

You end up with a pypi entry like this. Notice that here two of the maintainers are not listed, and BOTH have an email link that is a mixture of emails and maintainer names, similar to what you see with sourmash. i just fixed this by removing emails altogether, and now test pypi just lists all 4 of our names.

test-pypi

I hope that is helpful. it just seems to me that things are being parsed differently depending on what combination of information is provided.

matthewfeickert commented 1 year ago

Coming from Issue #12877 (sorry for the duplicate Issue):


Paste of Issue 12877 content if useful for quick reference:

:wave: Hi. Our project [`pyhf`](https://github.com/scikit-hep/pyhf) just switched (c.f. https://github.com/scikit-hep/pyhf/pull/2095) from having our PyPI metadata in [`setup.cfg`](https://github.com/scikit-hep/pyhf/blob/3eef1fff2e6d4ccf474bbe51e2211cf690752b82/setup.cfg) to [`pyproject.toml`](https://github.com/scikit-hep/pyhf/blob/e942d852922f22c495ae711303835a5a698f217d/pyproject.toml). In doing so, we also changed from having our author metadata for the 3 authors be across [`author` and `author_email`](https://github.com/scikit-hep/pyhf/blob/3eef1fff2e6d4ccf474bbe51e2211cf690752b82/setup.cfg#L7-L8) to having it be contained in [`authors`](https://github.com/scikit-hep/pyhf/blob/e942d852922f22c495ae711303835a5a698f217d/pyproject.toml#L12-L16) following [PEP 621's requirements](https://peps.python.org/pep-0621/#authors-maintainers) of

> These fields accept an array of tables with 2 keys: name and email. Both values must be strings. The name value MUST be a valid email name (i.e. whatever can be put as a name, before an email, in [RFC 822](https://datatracker.ietf.org/doc/html/rfc822.html)) and not contain commas. The email value MUST be a valid email address. Both keys are optional.

`pip` is recognizing all the metadata as we would expect

```console
$ python -m pip show pyhf
Name: pyhf
Version: 0.7.1.dev43
Summary: pure-Python HistFactory implementation with tensors and autodiff
Home-page:
Author:
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>
License: Apache-2.0
Location: /home/feickert/.pyenv/versions/3.10.6/envs/pyhf-dev-CPU/lib/python3.10/site-packages
Requires: click, jsonpatch, jsonschema, numpy, pyyaml, scipy, tqdm
Required-by:
```

However, when we [published this to TestPyPI](https://test.pypi.org/project/pyhf/0.7.1.dev35/) to check how things looked after switching over we noticed that TestPyPI is displaying only the first author and linking their email

[![testPyPI](https://user-images.githubusercontent.com/5142394/213782977-8b308852-5f24-4d9f-a946-2fb00af5c301.png)](https://test.pypi.org/project/pyhf/0.7.1.dev35/)

Previously when we shoved all our names and emails into `author` and `author_email` we could at least have [all our names be displayed](https://pypi.org/project/pyhf/0.7.0/) (no surprise there as we were abusing the field)

[![PyPI-0 7 0](https://user-images.githubusercontent.com/5142394/213784164-d3cfc0af-f5d7-4037-a88c-2e8fcda37aac.png)](https://pypi.org/project/pyhf/0.7.0/)

I assume that this behavior with `authors` is because `warehouse` uses only the core metadata here (?) following PEP 621's instructions of:

> Using the data to fill in [core metadata](https://packaging.python.org/specifications/core-metadata/) is as follows:
>
> 1. If only name is provided, the value goes in Author/Maintainer as appropriate.
> 2. If only email is provided, the value goes in Author-email/Maintainer-email as appropriate.
> 3. If both email and name are provided, the value goes in Author-email/Maintainer-email as appropriate, with the format {name} <{email}> (with appropriate quoting, e.g. using email.headerregistry.Address).
> 4. Multiple values should be separated by commas.

Would it be possible for `warehouse` to display all authors information if it exists? Or is that something that is outside the scope of how `warehouse` interacts with metadata?

**Describe the solution you'd like**

Have `warehouse` be able to parse the existence of PEP 621 `authors` and display all names and associated emails of `authors` on the package webpage.

We (pyhf) are seeing a similar problem with our authors and maintainers fields in our PEP 621 compliant pyproject.toml.

Metadata from relevant wheel

$ python -m pip download --index-url https://test.pypi.org/simple/ --no-deps 'pyhf==0.7.1.dev35'
$ unzip pyhf-0.7.1.dev35-py3-none-any.whl
$ head -n 12 pyhf-0.7.1.dev35.dist-info/METADATA
Metadata-Version: 2.1
Name: pyhf
Version: 0.7.1.dev35
Summary: pure-Python HistFactory implementation with tensors and autodiff
Project-URL: Documentation, https://pyhf.readthedocs.io/
Project-URL: Homepage, https://github.com/scikit-hep/pyhf
Project-URL: Issue Tracker, https://github.com/scikit-hep/pyhf/issues
Project-URL: Release Notes, https://pyhf.readthedocs.io/en/stable/release-notes.html
Project-URL: Source Code, https://github.com/scikit-hep/pyhf
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>
Maintainer-email: The Scikit-HEP admins <scikit-hep-admins@googlegroups.com>
License: Apache-2.0

Authors

Our authors field is

authors = [
    { name = "Lukas Heinrich", email = "lukas.heinrich@cern.ch" },
    { name = "Matthew Feickert", email = "matthew.feickert@cern.ch" },
    { name = "Giordon Stark", email = "gstark@cern.ch" },
]

and pip is recognizing all the metadata as we would expect

$ python -m pip show pyhf
Name: pyhf
Version: 0.7.1.dev43
Summary: pure-Python HistFactory implementation with tensors and autodiff
Home-page: 
Author: 
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>
License: Apache-2.0
Location: /home/feickert/.pyenv/versions/3.10.6/envs/pyhf-dev-CPU/lib/python3.10/site-packages
Requires: click, jsonpatch, jsonschema, numpy, pyyaml, scipy, tqdm
Required-by:

though for our render check upload to TestPyPI we noticed that TestPyPI is displaying only the first author and linking their email

testPyPI

with the generated HTML of

<p><strong>Author:</strong> <a href="mailto:lukas.heinrich@cern.ch">Lukas Heinrich</a></p>

Expectation / Desired Result

Have all of the authors have their name and emails be listed in a comma separated list according to the order they appear in the wheel metadata

$ grep "Author-email" pyhf-0.7.1.dev35.dist-info/METADATA 
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>

with generated html of

<p><strong>Author:</strong> <a href="mailto:lukas.heinrich@cern.ch">Lukas Heinrich</a>, <a href="mailto:matthew.feickert@cern.ch">Matthew Feickert</a>, <a href="mailto:gstark@cern.ch">Giordon Stark</a></p>

Maintainers

Our maintainers field is

maintainers = [ {name = "The Scikit-HEP admins", email = "scikit-hep-admins@googlegroups.com"} ]

and the TestPyPI render is

TestPyPI-maintainer

with the generated HTML of

<p><strong>Maintainer:</strong> <a href="mailto:The Scikit-HEP admins &lt;scikit-hep-admins@googlegroups.com&gt;">The Scikit-HEP admins &lt;scikit-hep-admins@googlegroups.com&gt;</a></p>

Expectation / Desired Result

Have the maintainer name match the metadata of the wheel

$ grep "Maintainer-email" pyhf-0.7.1.dev35.dist-info/METADATA 
Maintainer-email: The Scikit-HEP admins <scikit-hep-admins@googlegroups.com>

and be a hyperlink to the mailto

<p><strong>Maintainer:</strong> <a href="mailto:scikit-hep-admins@googlegroups.com">The Scikit-HEP admins</a></p>

tobinus commented 10 months ago

I encountered this bug today. We define four authors, where we don't have an email address for one of them. Pypi.org decided to only show one of them, specifically the author without an email address, and used the email address of a different author as the mailto:-link 😲

It seems to me like the core metadata specification is incompatible with the degree of freedom that PEP 621 promises.

For instance, how would you separate the following two cases?

```toml
# A PEP 621 project
[project]
# ...
authors = [
    { name = "Alice" },
    { email = "bob@example.com"},
]
```

which would become:

```
Author: Alice
Author-email: bob@example.com
```

and

```python
# A "classic" project
setup(
    # ...
    author="Bob Bobbity",
    author_email="bob@example.com",
)
```

which would become:

```
Author: Bob Bobbity
Author-email: bob@example.com
```

In the first case, you would expect the name in `Author` to be listed separately from the email in `Author-email`, meanwhile you would want the name in the second case to be combined with the email in `Author-email`. But there is no way to tell the two cases apart based on the core metadata alone.

The gap between PEP 621 and the core metadata specification can be closed in two ways:

Some thoughts on how you could add new fields Authors and Maintainers to core metadata to support the data model of PEP 621

EDIT 2: I no longer think this is the best solution.

## Possible solutions

If I were to come up with a "dream" solution, I would try to expand the core metadata specification with new fields, `Authors` and `Maintainers`. Note that they are plural, while the existing fields are singular. They would work exactly like `Author-email` and `Maintainer-email`, _except_ you would be permitted to specify a name with no email address by using the same form as an email address with a name, but with the email address specified as an empty string. For instance: `Alice <>`.

To keep backwards compatibility with tools that don't know about the new fields, I would keep the algorithm described in [PEP 621]. However, tools that _do_ know about the new fields should always disregard the old fields (`Author` and `Author-email`, or `Maintainer` and `Maintainer-email`) if the corresponding new field is present (`Authors`, or `Maintainers`). So the information in the `authors` and `maintainers` fields of `pyproject.toml` would be repeated twice in the core metadata: Once in the new field, and once in one of the old ones.

Here's what the example in PEP 621 would look like

The following definition in `pyproject.toml`:

```toml
[project]
authors = [
    {name = "Pradyun Gedam", email = "pradyun@example.com"},
    {name = "Tzu-Ping Chung", email = "tzu-ping@example.com"},
    {name = "Another person"},
    {email = "different.person@example.com"},
]
maintainers = [
    {name = "Brett Cannon", email = "brett@python.org"}
]
```

would be converted to the following core metadata:

```
Authors: "Pradyun Gedam" <pradyun@example.com>, "Tzu-Ping Chung" <tzu-ping@example.com>, "Another person" <>, different.person@example.com
Maintainers: "Brett Cannon" <brett@python.org>
# For backwards compatibility
Author: Another person
Author-email: "Pradyun Gedam" <pradyun@example.com>, "Tzu-Ping Chung" <tzu-ping@example.com>, different.person@example.com
Maintainer-email: "Brett Cannon" <brett@python.org>
```

I see that `Maintainers` was redundant in this example, since there is no confusion with `Maintainer-email` when everyone has an email address. So it's possible that the new field should only be used when there is at least one author/maintainer without an email address? But there's something to be said about being consistent.

----

The advantage of this approach is that you get the freedom to mix between authors with only a name, only an email, and both a name and an email address, in a way that is straight-forward to parse on the other end.

The downsides of this approach are that the new fields are easy to confuse with the old ones (since there's only a trailing `s` separating the two), and that information is repeated twice in the core metadata.

Alternatively, you _could_ modify the definition of `Author-email` and `Maintainer-email` so that they may accept authors/maintainers without an email address, and use them for every author and maintainer when converting from PEP 621 (leaving out `Author` and `Maintainer`). But it feels a bit silly to put authors and maintainers without an email address inside `Author-email` or `Maintainer-email`. And tools out there may crash or behave weird if they were served `Author-email: Alice <>`?

EDIT: Put the "possible solutions" behind an accordion

EDIT 2: I no longer think the solution above would be the best one, there are simpler solutions.

dstufft commented 10 months ago

I don't have a particular solution other than I think it would be great for someone to write a PEP that made this bit of metadata better :) There was even a recent thread on discuss.python.org where someone else had a related issue.

tobinus commented 10 months ago

I just realised that this GitHub issue should probably be split into multiple ones.

The original issue description from @domdfcoding, and the use case from @matthewfeickert, are about the case where only Author-email (and/or Maintainer-email) is supplied. There is no confusion about what name goes with what email address in that case. According to the issue reporters, Pypi.org does not handle this properly. I would think it is possible to fix this so that all listed authors or maintainers are shown, using their names as the label and falling back to displaying their email address when no name is given. This would only require changes in warehouse.

The case where you are specifying multiple authors and mixing between Author and Author-email would be left unsupported and broken by design – just like today, in other words. If we wish to guide users towards the supported use case, we can add some guidance to the description in PEP 621 so that it recommends either including an email address for every author, or including no email addresses at all.


The issue of supporting a mix between email and non-email authors should be a different issue, I think. It would include the use cases reported by @lwasser and @pradyunsg, and me, (EDIT: and backwards compatibility with the existing usage brought up by @di) and would likely involve changes to the core metadata spec, PEP 621, warehouse and the build module.

I imagine this would take a while, so it makes sense to fix the simpler issue first and handle this more complex issue separately.

tobinus commented 10 months ago

Solving the complex case by always including names in Author-email

Preferably, those thoughts should go in a new issue which is separate from this, per [my previous comment](https://github.com/pypi/warehouse/issues/9400#issuecomment-1739567836). But I'm putting them here for the time being.

Warehouse should support multiple authors, as described in [PEP 621]. This proposed solution involves a small change to the algorithm used to convert `pyproject.toml` into core metadata. Additionally, an algorithm for parsing the core metadata back into multiple authors and maintainers should be added to the [core metadata specification]. Warehouse should be updated to use this new algorithm.

The following discussion of `Author` and `Author-email` also applies to `Maintainer` and `Maintainer-email`.

How do we know whether the author in `Author` is the same person or a different person from the email address in `Author-email`? The idea is to ensure that the core metadata produced by the algorithm in PEP 621 can be consistently detected as such, and handled by using the same algorithm backwards (but with the caveat that the `Author` field is unstructured).

The change made to PEP 621 is to _always_ include a name for email addresses in `Author-email`. If no real name was provided, the email address should be repeated as the name. So `authors = [{email = "hi@example.com"}]` should result in `Author-email: "hi@example.com" <hi@example.com>`.

The PEP 621 example

The following `pyproject.toml`:

```toml
[project]
authors = [
    {name = "Pradyun Gedam", email = "pradyun@example.com"},
    {name = "Tzu-Ping Chung", email = "tzu-ping@example.com"},
    {name = "Another person"},
    {email = "different.person@example.com"},
]
maintainers = [
    {name = "Brett Cannon", email = "brett@python.org"}
]
```

would produce the following core metadata:

```
Author: Another person
Author-email: "Pradyun Gedam" <pradyun@example.com>, "Tzu-Ping Chung" <tzu-ping@example.com>, "different.person@example.com" <different.person@example.com>
Maintainer-email: "Brett Cannon" <brett@python.org>
```

----

Consumers of core metadata, such as Warehouse, should distinguish between two cases:

1. **The combined case:** When both `Author` and `Author-email` are provided, and there is only one email address in `Author-email`, and it has no name. This should be handled like today, with `Author` being used as the label and `Author-email` being used as the `mailto:` target.
2. **The separated case:** When there are multiple email addresses, or there is only one and it has a name, or only one of `Author` and `Author-email` is provided.
   * The value of the `Author` field, if present, should be assumed to contain information about authors that don't have any email address, but tools should not make any assumptions about its internal structure. So it should be displayed as it is written, but without linking anywhere.
   * The value of the `Author-email` field, if present, should be parsed into an additional list of authors where every author has an email address and a label/name. If no name was provided, the email address should be used as the label.
   * If both fields are present, their display should be joined with a comma and a space (`, `)

Examples of the combined case

Example of an author with an email address:

```
Author: John Doe
Author-email: john.doe@example.com
```

should be displayed as:

> Author: [John Doe](mailto:john.doe@example.com)

Example from the core metadata specification, but with an email address added:

```
Author: C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>
Author-email: cschultz@peanuts.example.com
```

should be displayed as:

> Author: [C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>](mailto:cschultz@peanuts.example.com)

----

Examples of the separated case

Example with two authors, where one has only a name and another has only an email address:

```
Author: John Doe
Author-email: jane.doe@example.com
```

should be displayed as:

> Author: John Doe, [jane.doe@example.com](mailto:jane.doe@example.com)

Example of `Author` field from the core metadata specification:

```
Author: C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>
```

should be displayed as:

> Author: C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>
>
> Note: GitHub adds an automatic link above. The email address embedded in `Author` doesn't necessarily have to be a link.

Example from PEP 621 (as converted to core metadata in the example above):

```
Author: Another person
Author-email: "Pradyun Gedam" <pradyun@example.com>, "Tzu-Ping Chung" <tzu-ping@example.com>, "different.person@example.com" <different.person@example.com>
Maintainer-email: "Brett Cannon" <brett@python.org>
```

should be displayed as:

> Author: Another person, [Pradyun Gedam](mailto:pradyun@example.com), [Tzu-Ping Chung](mailto:tzu-ping@example.com), [different.person@example.com](mailto:different.person@example.com)
>
> Maintainer: [Brett Cannon](mailto:brett@python.org)

----

**Advantages** to this approach include:

* Compatibility with packages written using the existing rules and Warehouse's current behaviour
* Compatibility with multiple authors specified using the format in [PEP 621]
* No changes to the fields of [PEP 621] or the [core metadata specification]. The changes are limited to clarifications of how the existing fields and their capabilities can be used to ensure unambiguous parsing by tools such as Warehouse
* Users of `setup.py` may also follow the same rules to get the same effect as users of [PEP 621]

**Limitations** of this approach include:

* Authors may not be listed using line breaks or bullet points, since we are unable to make any assumptions about the internal format of the `Author` field. Do the commas separate different authors, or are they used to separate the author name from their street address or organisation?
  * We could perhaps say that commas in the `Author` field should be assumed to separate different authors in the separated case only?
* Changes must be made to the `build` module, and users must update it to get consistently working results
  * That said, many cases will work out of the box. The only case that doesn't work with the current implementation is the one where you combine authors with only a name and a single author with only an email address.
* The display logic in Warehouse must be expanded. But this is inevitable if multiple authors should be supported
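
The distinction between the combined and the separated case could be detected roughly as follows. This is only a sketch of the rule as described in this comment, using the standard library; `classify` is a hypothetical name and nothing here is existing Warehouse code.

```python
from email.utils import getaddresses


def classify(author, author_email):
    """Hypothetical check: 'combined' or 'separated', per the rules above."""
    # Parse Author-email into (display name, address) pairs, dropping
    # entries for which no address could be extracted.
    addresses = [(n, a) for n, a in getaddresses([author_email or ""]) if a]
    # Combined: both fields are present, and Author-email holds exactly one
    # address that carries no display name.
    if author and len(addresses) == 1 and not addresses[0][0]:
        return "combined"
    # Everything else: several addresses, a named address, or only one of
    # the two fields being present.
    return "separated"


print(classify("John Doe", "john.doe@example.com"))
print(classify(
    "Another person",
    '"Pradyun Gedam" <pradyun@example.com>, '
    '"different.person@example.com" <different.person@example.com>',
))
```

Under the proposed change a lone email-only author is always written with the address repeated as its name, which is what keeps it out of the combined branch.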