python / cpython

The Python programming language
https://www.python.org/

Merge Doc/ACKS.txt names into Misc/ACKS #59642

Closed cjerdonek closed 11 years ago

cjerdonek commented 11 years ago
BPO 15437
Nosy @loewis, @birkenfeld, @jcea, @pitrou, @ezio-melotti, @merwok, @bitdancer, @asvetlov, @cjerdonek
Files
  • merge-acks.py
  • issue-15437-sample-output.patch
  • issue-15437-script-output-2.patch
  • merge-acks-2.py
  • merge-acks-3.py

    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = 'https://github.com/ezio-melotti'
    closed_at =
    created_at =
    labels = ['type-feature', 'docs']
    title = 'Merge Doc/ACKS.txt names into Misc/ACKS'
    updated_at =
    user = 'https://github.com/cjerdonek'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'chris.jerdonek'
    assignee = 'ezio.melotti'
    closed = True
    closed_date =
    closer = 'ezio.melotti'
    components = ['Documentation']
    creation =
    creator = 'chris.jerdonek'
    dependencies = []
    files = ['26493', '26494', '26711', '26712', '27157']
    hgrepos = []
    issue_num = 15437
    keywords = ['patch']
    message_count = 23.0
    messages = ['166247', '166248', '166249', '166251', '166260', '166261', '166281', '166291', '166294', '166295', '166296', '166298', '166321', '166328', '166411', '166420', '167588', '167589', '170134', '170425', '170462', '170465', '170466']
    nosy_count = 11.0
    nosy_names = ['loewis', 'georg.brandl', 'jcea', 'pitrou', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'asvetlov', 'chris.jerdonek', 'docs@python', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue15437'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']
    ```

    cjerdonek commented 11 years ago

    This issue is to merge the Doc/ACKS and Misc/ACKS files as discussed here:

    http://mail.python.org/pipermail/python-dev/2012-July/121096.html

    cjerdonek commented 11 years ago

    I would be happy to prepare a patch. I can upload a script to this issue that the committer can then run on the latest Misc/ACKS and Doc/ACKS.txt.

    The script would preserve the ordering of Misc/ACKS. It would iterate through the names in Doc/ACKS.txt and insert them in Misc/ACKS at the appropriate location. Duplicates would not be inserted.
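The merge described above can be sketched roughly as follows. This is not the attached merge-acks.py; `last_name_key` and `merge_names` are hypothetical names, and the key (last word of the name, lowercased) is an assumption made for illustration.

```python
def last_name_key(name):
    """Hypothetical sort key: the last word of the name, lowercased."""
    return name.split()[-1].lower()


def merge_names(existing, new_names, key=last_name_key):
    """Return existing with new_names inserted; existing order is untouched."""
    merged = list(existing)
    seen = set(existing)
    for name in new_names:
        if name in seen:
            continue  # duplicates are not inserted
        # Insert before the first entry that sorts after the new name,
        # or at the end if there is none.
        index = next(
            (i for i, other in enumerate(merged) if key(other) > key(name)),
            len(merged),
        )
        merged.insert(index, name)
        seen.add(name)
    return merged
```

Because existing entries are never moved, a diff of the result against the original list would consist only of insertions.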

    pitrou commented 11 years ago

    Georg, do you think this is ok for all 3 branches?

    merwok commented 11 years ago

    This was indeed proposed once or twice before; I can’t search my archive right now but I think I remember Georg saying that he was OK as long as the docs displayed Misc/ACKS. This means checking the rst syntax of Misc/ACKS and using the right include directive.

    cjerdonek commented 11 years ago

    Attached is a script that seems to do the job, except for the rst formatting, which can be added later. I left the formatting out so that the diff shows only what has changed.

    In the process of doing this, I found that Jeff McNeil is far out of order in Misc/ACKS, and possibly also Hugo Lopes Tavares and Xavier de Gaye, depending on what alphabetization rules should be used.

    The script contains logic to collect the non-ascii characters that appear in people's names, so that non-ascii characters can be approximated by ascii characters for ordering purposes (which seems to be how it is done now in some cases).

    In a subsequent comment, I will attach a diff that results from running the script, so you can see what effect it has on Misc/ACKS.

    cjerdonek commented 11 years ago

    Attaching sample output of running the script.

    cjerdonek commented 11 years ago

    I created a new issue, bpo-15439, for including the combined Misc/ACKS in the documentation (as Éric mentioned), because the nature of that discussion is different and because the changes will be easier to observe and understand if committed separately.

    bitdancer commented 11 years ago

    I'm not clear if your script is trying to do this, but there is no way to automatically alphabetize the file. That's why it says "rough" alphabetic order. The issue is that different languages alphabetize different letters in different places. We try to respect the alphabetization of the source language as much as practical...which means there is no algorithm that can do the sorting, since the names in the file do not carry explicit language information.

    pitrou commented 11 years ago

    Well, the script output looks good (apart from a few duplicates which can be resolved by hand, e.g. "Terry Reedy" vs. "Terry J. Reedy").

    cjerdonek commented 11 years ago

    I did think through those issues and made a special effort to address them in the script.

    For starters, the script does not change the order of any names in Misc/ACKS. This is to preserve the existing rough alphabetical ordering, and to ensure that the diff consists only of insertions (for easier manual checking, if desired).

    As for inserting new names in rough alphabetical order, I dealt with different language characters as follows. The script has a translation table to map non-ascii characters to ascii characters for sorting purposes. Currently, that table is as follows (I'm not sure if all of these characters will render on the page):

    NON_ASCII = "ÅÉØáäåæçéëíñóôöùúüćęŁńŽКМСабгекнорш“”"
    ASCII_SUB = 'AEOaaaaceeinooouuuceLnZKMCabrekhopw""'

    This mapping can easily be modified if my initial choices are not the best. As an early step, the script collects all non-ascii characters that appear in all names to make sure the translation table is up to date (exiting with a message otherwise).
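As a rough illustration of how such a table could be applied and checked, here is a minimal sketch built on `str.maketrans`. The two strings are copied from the comment; the function names `unmapped_characters` and `ascii_approximation` are hypothetical and not taken from the attached script.

```python
NON_ASCII = "ÅÉØáäåæçéëíñóôöùúüćęŁńŽКМСабгекнорш“”"
ASCII_SUB = 'AEOaaaaceeinooouuuceLnZKMCabrekhopw""'

# str.maketrans requires both strings to have the same length;
# character i of NON_ASCII maps to character i of ASCII_SUB.
TRANSLATION = str.maketrans(NON_ASCII, ASCII_SUB)


def unmapped_characters(names):
    """Return the non-ASCII characters in names that the table does not cover."""
    found = {ch for name in names for ch in name if ord(ch) > 127}
    return sorted(found - set(NON_ASCII))


def ascii_approximation(name):
    """Approximate a name in ASCII, for sorting purposes only."""
    return name.translate(TRANSLATION)
```

A caller could stop with a message whenever `unmapped_characters` returns a non-empty list, which mirrors the up-to-date check described above.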

    When I said "Jeff McNeil" is out of order, that was because the name appears after "Jeff Epler" but before "Tom Epperly". The script maintains a list of "out of order" names like this to skip when inserting, to prevent insertions from being out of rough alphabetical order.

    If different languages use a different ordering on the word level, the script will not handle that, however. It only orders lexicographically by last name, and then first name(s).

    Much of this information is spelled out in the script's docstring.
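The last-name ordering and the skip list described in this comment might look something like the sketch below. It is an assumption-laden illustration, not the script itself: `OUT_OF_ORDER`, `sort_key`, and `insertion_index` are hypothetical names, and treating the last whitespace-separated word as the last name is a simplification.

```python
# Hypothetical hand-maintained list of names known to be misplaced.
OUT_OF_ORDER = {"Jeff McNeil"}


def sort_key(name):
    """Order by last name, then first name(s), ignoring case."""
    *first, last = name.split()
    return (last.lower(), " ".join(first).lower())


def insertion_index(existing, name):
    """Return the index of the first in-order entry that sorts after name."""
    for i, other in enumerate(existing):
        if other in OUT_OF_ORDER:
            continue  # never use a misplaced name as a reference point
        if sort_key(other) > sort_key(name):
            return i
    return len(existing)
```

Without the skip list, a new name sorting between "Epler" and "Epperly" would land in front of "Jeff McNeil". Names with multi-word surnames, such as "Xavier de Gaye", would be keyed on the final word only under this simplification.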

    cjerdonek commented 11 years ago

    That is correct, Antoine. Duplicates need to be removed by hand.

    To assist in this process, the script currently prints "possible duplicates" to stdout after running. However, the script could easily be modified to display an in-line indicator before possible duplicates to make this manual step easier, e.g.:

     John Redford
     Terry Reedy
    +>>> Terry J. Reedy
     Gareth Rees

    Currently, possible duplicates are determined based on whether the last name matches an already existing last name.
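A minimal sketch of that last-name check, together with the in-line marker from the sample above, could read as follows. The function names are hypothetical, and the last name is assumed to be the final word of each entry.

```python
def possible_duplicates(existing, new_names):
    """Return new names whose last name already appears in existing."""
    known_last_names = {name.split()[-1].lower() for name in existing}
    return [
        name
        for name in new_names
        if name not in existing
        and name.split()[-1].lower() in known_last_names
    ]


def annotate(name, flagged):
    """Prefix flagged names with '>>> ' so they stand out in the diff."""
    return ">>> " + name if name in flagged else name
```

The check only flags candidates; deciding whether "Terry J. Reedy" and "Terry Reedy" are the same person remains a manual step.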

    pitrou commented 11 years ago

    To assist in this process, the script currently prints "possible duplicates" to stdout after running. However, the script could easily be modified to display an in-line indicator before possible duplicates to make this manual step easier, e.g.:

     John Redford
     Terry Reedy
    +>>> Terry J. Reedy
     Gareth Rees

    Well, no need to be perfectionist IMO. The merging will only be done once (thrice if we count all branches :-)).

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 11 years ago

    Also, if you want to do phonetic translation of non-ASCII, then абгекнор really matches abgeknor, and ш is transliterated to "sh" in English (IIUC) (to "sch" in German).

    But I agree that this is best done manually. What matters is what the script produces; the script certainly won't make it into Python's source code. I'm sure Chris had fun writing it.
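For reference, a phonetic mapping like the one suggested in this comment needs a one-to-many table, which `str.maketrans` supports when given a dict. The sketch below covers only the letters named in the comment; `PHONETIC` and `transliterate` are hypothetical names.

```python
# Hypothetical phonetic table: a dict lets one character map to several,
# which a pair of equal-length strings cannot express.
PHONETIC = str.maketrans({
    "а": "a", "б": "b", "г": "g", "е": "e", "к": "k",
    "н": "n", "о": "o", "р": "r", "ш": "sh",
})


def transliterate(name):
    """Transliterate the covered Cyrillic letters; leave everything else as is."""
    return name.translate(PHONETIC)
```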

    cjerdonek commented 11 years ago

    Yes, I did. Even though it is throw-away.

    By the way, I'm taking Antoine's advice to avoid perfectionism on this. Otherwise I'd include your suggestion re: the special characters. :)

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 11 years ago

    I don't think the docs should display Misc/ACKS. Instead, I propose the following wording

    "Many people have contributed to the Python language, the Python standard library, and the Python documentation. See Misc/ACKS in the Python source distribution for a partial list of contributors"

    It might be useful to link "Misc/ACKS" to http://hg.python.org/cpython/file/default/Misc/ACKS (http://hg.python.org/cpython/raw-file/default/Misc/ACKS would be better if hgweb didn't serve it as application/octet-stream).

    merwok commented 11 years ago

    We can just use :source:`Misc/ACKS` and it will create a link to hgweb (the colored HTML page, not the raw file).

    cjerdonek commented 11 years ago

    Is this issue awaiting feedback from anyone else before it can proceed further? (I mean just this issue, not bpo-15439, which covers the adjustments to the docs.)

    I am attaching an updated diff after generating the script output again against the tip (modified to prefix matching last names with '>>> ').

    cjerdonek commented 11 years ago

    For completeness, I am attaching the modified version of the script that was used to generate the latest output.

    cjerdonek commented 11 years ago

    I was reminded of this issue by the following e-mail today:

    http://mail.python.org/pipermail/python-dev/2012-September/121639.html

    I updated the script I attached earlier to ensure that it can also be run against the names in 2.7 (attaching now as script #3). I also checked that this latest script can still be run against 3.2 and default with the names that have been added since the last time I checked.

    Let me know if you would like any assistance in how to run the script and what to check for, etc.

    cjerdonek commented 11 years ago

    Just an FYI that Ezio asked Georg about this issue on IRC yesterday or the day before, and Georg said +1.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 11 years ago

    New changeset 48185b0f7b8a by Ezio Melotti in branch '3.2':
    bpo-15437, bpo-15439: merge Doc/ACKS.txt with Misc/ACKS and modify Doc/about.rst accordingly.
    http://hg.python.org/cpython/rev/48185b0f7b8a

    New changeset 2b4a89f82485 by Ezio Melotti in branch 'default':
    bpo-15437, bpo-15439: merge with 3.2.
    http://hg.python.org/cpython/rev/2b4a89f82485

    New changeset 76dd082d332e by Ezio Melotti in branch '2.7':
    bpo-15437, bpo-15439: merge Doc/ACKS.txt with Misc/ACKS and modify Doc/about.rst accordingly.
    http://hg.python.org/cpython/rev/76dd082d332e

    ezio-melotti commented 11 years ago

    Fixed, thanks for the script!

    cjerdonek commented 11 years ago

    Thanks for committing, Ezio!