python / cpython

Add SHA-3 and SHAKE (Keccak) support #60317

Closed tiran closed 8 years ago

tiran commented 12 years ago
BPO 16113
Nosy @tim-one, @loewis, @rhettinger, @gpshead, @jcea, @pitrou, @vstinner, @larryhastings, @tiran, @ezio-melotti, @asvetlov, @mgorny, @dstufft
Files
  • 521e85a613bf.diff
  • remove_sha3.patch
  • SHA3-and-SHAKE-support-for-Python.patch
  • SHA3-and-SHAKE-support-for-Python-2.patch
  • SHA3-and-SHAKE-support-for-Python-3.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['extension-modules', 'type-feature']
    title = 'Add SHA-3 and SHAKE (Keccak) support'
    updated_at =
    user = 'https://github.com/tiran'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'mgorny'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'christian.heimes'
    components = ['Extension Modules']
    creation =
    creator = 'christian.heimes'
    dependencies = []
    files = ['27441', '33298', '42764', '43107', '44176']
    hgrepos = []
    issue_num = 16113
    keywords = ['patch']
    message_count = 80.0
    messages = ['171848', '171882', '171898', '171913', '171929', '171963', '171964', '171968', '171971', '171983', '171995', '172070', '172100', '172144', '172152', '172157', '172158', '172313', '172314', '172316', '172319', '172324', '183129', '190303', '191931', '191940', '191971', '201078', '201079', '201080', '201081', '201082', '201083', '201084', '201085', '201086', '201092', '201096', '201928', '207169', '207170', '207171', '207184', '207187', '207188', '207189', '207190', '207191', '207192', '207225', '207226', '207228', '207229', '231838', '253023', '253025', '253028', '253029', '253174', '264029', '265033', '265059', '265066', '265088', '265125', '266911', '266974', '273252', '273328', '273330', '273331', '273363', '274786', '274789', '274790', '274791', '274793', '274797', '275000', '288706']
    nosy_count = 21.0
    nosy_names = ['tim.peters', 'loewis', 'rhettinger', 'gregory.p.smith', 'jcea', 'pitrou', 'vstinner', 'larry', 'christian.heimes', 'habnabit', 'ezio.melotti', 'spatz', 'Arfrever', 'asvetlov', 'mgorny', 'python-dev', 'sbt', 'bjornedstrom', 'dstufft', 'markk', 'haakon']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16113'
    versions = ['Python 3.6']
    ```

    tiran commented 12 years ago

    Today the latest crypto hash function was announced by NIST [1]. I suggest that we include the new hash algorithm in 3.4 once it lands in OpenSSL.

    The Keccak site also has a reference implementation in C and Assembler [2]. It may take some effort to integrate the reference implementation as it contains several optimized backends for X86, X86_64, SIMD and various ARM platforms.

    [1] http://www.nist.gov/itl/csd/sha-100212.cfm [2] http://keccak.noekeon.org/

    jcea commented 12 years ago

    We have MD5, SHA1, sha256, sha512 implemented, to use when openssl is not available. Can we do the same with sha-3? I would suggest adopting the reference implementation without extensive optimizations, since we will get them once openssl has them.

    So we might implement SHA-3 now and integrate the OpenSSL implementation later, when available. This matters, for instance, because many users of Python 3.4 will have an out-of-date system OpenSSL library.

    tiran commented 12 years ago

    I've done some experiments with the reference implementation and adapted the code of sha1module.c for sha3: https://bitbucket.org/tiran/pykeccak

    So far the code just compiles (64bit only) but doesn't work properly yet. I may need to move away from the NIST interface and use the sponge interface directly.

    d123ebd5-5978-4f5d-9f26-29b47ad88bcf commented 12 years ago

    For what it's worth, I've built a working C-based sha3-module that is available here: https://github.com/bjornedstrom/python-sha3

    Note that I've only tested this on Python 2, for Python 3 YMMV.

    Best regards Björn

    tiran commented 12 years ago

    Hello Björn,

    thanks for the information. Your package didn't turn up on Google when I started with my experiment. Perhaps it's too new?

    Your code and mine have lots of similarities. I was amused when I saw that you had the same issue with the block size attribute. At first I set it to 200 (1600 / 8) but eventually I didn't implement it.

    My code does everything in C with a separate constructor for each flavor of SHA-3. It's compatible with Python 2.6 to 3.4 and uses the optimized code for 32 and 64bit platforms.

    Oh, and my code is now working properly. Feel free to review the module. I'll upload the test code later.

    tiran commented 12 years ago

    Release 0.1 of pysha3 [1] is out. I've tweaked the C module to make it compatible with Python 2.6 to 3.4. The module and its tests run successfully under Linux and Windows. So far I've tested Linux X86_64 (2.7, 3.2, 3.3, 3.4), Windows X86 (2.6, 2.7, 3.2, 3.3) and Windows X86_64 (2.6, 2.7, 3.2, 3.3).

    Please review Modules/sha3module.c and ignore all version specific #if blocks. For Python 3.4 I'm going to remove all blocks for Python < 3.3.

    [1] http://pypi.python.org/pypi/pysha3/0.1

    pitrou commented 12 years ago

    > Please review Modules/sha3module.c

    Can't you post a patch here?

    tiran commented 12 years ago

    How about a sandbox repo?

    pitrou commented 12 years ago

    Good, you can click the "create patch" button when it's ready :)

    tiran commented 12 years ago

    Antoine pointed out that the code contains C++ comments and exports a lot of functions. The latest patch has all // comments replaced, marks all functions and globals as static and #includes the C files directly.

    tiran commented 12 years ago

    Please review the latest patch.

    I've included Gregory as he is the creator of hashlib.

    tiran commented 12 years ago

    The highlights of the next patch are

    tiran commented 12 years ago

    I've documented the optimization options of Keccak. The block also contains a summary of my modifications to the reference code.

    http://hg.python.org/sandbox/cheimes/file/57948df78dbd/Modules/_sha3/sha3module.c#l22

    tiran commented 12 years ago

    New patch. I've removed the dependency on uint64 types. On platforms without a uint64 type the module is using the 32bit implementation with interleave tables.
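
The 32bit code path mentioned above relies on bit interleaving: each 64-bit Keccak lane is stored as two 32-bit words, one holding the even-numbered bits and one the odd-numbered bits, so that a 64-bit rotation turns into two 32-bit rotations. The following is a minimal sketch of that idea only. It interleaves bit by bit rather than with lookup tables, and the function names are made up for illustration; it is not the reference code.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit integer left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def rol32(value, shift):
    """Rotate a 32-bit integer left by shift bits."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def interleave(lane):
    """Split a 64-bit lane into (even_bits, odd_bits) 32-bit words."""
    even = odd = 0
    for j in range(32):
        even |= ((lane >> (2 * j)) & 1) << j
        odd |= ((lane >> (2 * j + 1)) & 1) << j
    return even, odd


def rol64_interleaved(even, odd, shift):
    """Rotate an interleaved lane left using only 32-bit rotations."""
    shift %= 64
    half = shift // 2
    if shift % 2 == 0:
        # Even shift: both words rotate by half the amount.
        return rol32(even, half), rol32(odd, half)
    # Odd shift: the two words swap roles.
    return rol32(odd, half + 1), rol32(even, half)


# Check the interleaved rotation against the plain 64-bit one.
lane = 0x0123456789ABCDEF
for s in range(64):
    assert interleave(rol64(lane, s)) == rol64_interleaved(*interleave(lane), s)
```

The swap in the odd-shift branch is the reason the representation pays off on 32-bit CPUs: no bit ever has to cross between machine words during a rotation.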

    By the way the SSE / SIMD instructions aren't useful. They are two to four times slower.

    gpshead commented 12 years ago

    don't worry about optimization settings in python itself for now. the canonical optimized version will be in a future openssl version. now that it has been declared the standard it will get a *lot* more attention in the next few years.

    as it is, we _may_ want to replace this reference implementation with one from libtomcrypt in the future when it gets around to implementing it just so that the code for all of our bundled hash functions comes from the same place.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 12 years ago

    New changeset 11c9a894680e by Christian Heimes in branch 'default': Issue bpo-16113: integrade SHA-3 (Keccak) patch from http://hg.python.org/sandbox/cheimes http://hg.python.org/cpython/rev/11c9a894680e

    tiran commented 12 years ago

    The code has landed in default. Let's see how the build bots like my patch and the reference implementation.

    e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

    _sha3 is not being built on Windows, so importing hashlib fails

    >>> import hashlib
    ERROR:root:code for hash sha3_224 was not found.
    Traceback (most recent call last):
      File "C:\Repos\cpython-dirty\lib\hashlib.py", line 109, in __get_openssl_constructor
        f = getattr(_hashlib, 'openssl_' + name)
    AttributeError: 'module' object has no attribute 'openssl_sha3_224'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:\Repos\cpython-dirty\lib\hashlib.py", line 154, in <module>
        globals()[__func_name] = __get_hash(__func_name)
      File "C:\Repos\cpython-dirty\lib\hashlib.py", line 116, in __get_openssl_constructor
        return __get_builtin_constructor(name)
      File "C:\Repos\cpython-dirty\lib\hashlib.py", line 104, in __get_builtin_constructor
        raise ValueError('unsupported hash type ' + name)
    ValueError: unsupported hash type sha3_224
    ...
    tiran commented 12 years ago

    I've pushed a fix about 5 minutes ago. The module wasn't compiled in debug builds due to an error in the project file. Please update your copy and try again.
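
The traceback above shows the lookup order hashlib used at the time: it first asks the OpenSSL wrapper for `openssl_<name>`, then falls back to a bundled constructor, and raises `ValueError` when neither exists. A simplified sketch of that pattern follows; `get_hash_constructor`, `fake_openssl` and `fake_builtins` are hypothetical stand-ins, not the real private helpers in hashlib.py.

```python
import hashlib
from types import SimpleNamespace


def get_hash_constructor(name, openssl_backend, builtin_constructors):
    """Return a hash constructor: OpenSSL first, bundled fallback second."""
    try:
        return getattr(openssl_backend, "openssl_" + name)
    except AttributeError:
        pass  # OpenSSL does not provide this algorithm
    try:
        return builtin_constructors[name]
    except KeyError:
        raise ValueError("unsupported hash type " + name) from None


# Hypothetical backends: "OpenSSL" knows sha256, the fallback knows sha1.
fake_openssl = SimpleNamespace(openssl_sha256=hashlib.sha256)
fake_builtins = {"sha1": hashlib.sha1}

print(get_hash_constructor("sha256", fake_openssl, fake_builtins))
print(get_hash_constructor("sha1", fake_openssl, fake_builtins))
```

With these stand-ins, asking for `sha3_224` raises `ValueError`, which mirrors the failure in the debug build where the `_sha3` extension had not been compiled.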

    tiran commented 12 years ago

    6cf6b8265e57 and 8172cc8bfa6d have fixed the issue on my VM. I didn't notice the issue as I only tested hashlib with the release builds, not the debug builds. Sorry for that.

    e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

    > 6cf6b8265e57 and 8172cc8bfa6d have fixed the issue on my VM. I didn't notice the issue as I only tested hashlib with the release builds, not the debug builds. Sorry for that.

    Ah. I did not even notice there was _sha3.vcxproj.

    Is there any particular reason for not making it part of python3.dll like _sha1, _sha256, _sha512 are? (I thought it was only modules with special link requirements that became separate DLLs.)

    tiran commented 12 years ago

    The module is rather large (about 190 KB) because the optimized SHA-3 implementation isn't optimized for size. For this reason I like to keep the module out of the main binary for now.

    b623f1a5-90b6-4181-92e7-2edacf44327f commented 11 years ago

    Please do not go forward until NIST publishes its SHA-3 specification document. We don't know yet what parameters they will finally choose when making Keccak SHA-3.

    b623f1a5-90b6-4181-92e7-2edacf44327f commented 11 years ago

    NIST has published a tentative schedule for SHA-3 standardization. They expect to publish in the second quarter of 2014.

    See http://csrc.nist.gov/groups/ST/hash/sha-3/timeline_fips.html

    and http://csrc.nist.gov/groups/ST/hash/sha-3/sha-3_standardization.html

    0ba59aa5-2e90-422d-94d1-15d2602a5498 commented 11 years ago

    As long as the reference Keccak code is going to live in the python stdlib anyway, I would /greatly/ appreciate it if the Keccak sponge function was directly exposed instead of just the fixed parameters used for SHA-3.

    A Keccak sponge can have a much wider range of rates/capacities, and after absorption can have any number of bytes squeezed out. The ability to get an unbounded number of bytes out is very useful and I've written some code that uses that behavior. I ended up having to write my own Keccak python library since none of the other SHA-3 libraries exposed this either.
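
The raw sponge with arbitrary rates and capacities is not what hashlib ended up exposing, but the "squeeze out any number of bytes" behavior described here is available through the SHAKE functions (`hashlib.shake_128` and `hashlib.shake_256`, Python 3.6 and later), at fixed capacities. A small illustration, assuming such a Python version:

```python
import hashlib

xof = hashlib.shake_128(b"absorbed input")

# The caller picks the output length at digest time.
short = xof.digest(16)
long = xof.digest(64)

# A shorter output is a prefix of a longer one, because both come
# from the same squeezed byte stream.
assert long[:16] == short
print(len(short), len(long))
```

Each call to `digest()` starts again from the beginning of the stream, so this is not an incremental squeeze API; a longer output has to be requested in one call.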

    tiran commented 11 years ago

    Hi Aaron,

    it's a tempting idea but I have to decline. The API is deliberately limited to the NIST interface. Once OpenSSL gains SHA-3 support we are going to use it in favor of the reference implementation. I don't expect OpenSSL to provide the full sponge API.
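
For readers unfamiliar with the distinction: assuming "NIST interface" here means the fixed-parameter init/update/final style of hashing API, it corresponds to what hashlib exposes for every algorithm. A short sketch using the SHA-3 constructors as they exist in hashlib from Python 3.6:

```python
import hashlib

h = hashlib.sha3_256()        # init: parameters are fixed by the name
h.update(b"hello, ")          # update: feed data incrementally
h.update(b"world")
incremental = h.hexdigest()   # final: fixed-length digest

one_shot = hashlib.sha3_256(b"hello, world").hexdigest()
assert incremental == one_shot
print(h.name, h.digest_size)
```

There is no knob for rate, capacity or output length in this style, which is exactly the limitation being discussed.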

    I also like to keep all options open so I can switch to a different and perhaps smaller implementation in the future. The reference implementation is huge and the binary is more than 400 KB. For comparison the SHA-2 384 + 512 module's binary is just about 60 KB on a 64bit Linux system.

    Once a new API has been introduced it's going to take at least two minor Python releases and about four to five years to remove it.

    But I could add a more flexible interface to Keccak's sponge to my standalone sha3 module https://pypi.python.org/pypi/pysha3 ...

    0ba59aa5-2e90-422d-94d1-15d2602a5498 commented 11 years ago

    https://pypi.python.org/pypi/cykeccak/ is what I've written to do this, for reference.

    Honestly I hope that the Keccak sponge is directly exposed in openssl (or any other SHA-3 implementation) because of its utility beyond SHA-3. If the source of some other implementation is going to be bundled with python anyway, it shouldn't be difficult to expose the sponge bits.

    17d7e64a-0832-484c-a7c2-3ebb0c76eff6 commented 10 years ago

    Please make sure that the currently committed code is not released as part of Python 3.4. SHA-3 is not standardised yet, and NIST has said that they intend to make some changes to the Keccak SHA-3 submission before standardisation as a FIPS.

    The links englabenny posted have a good overview of the SHA-3 timeline and the proposed changes.

    It would be very confusing if hashlib in Python 3.4 came with a "sha3" that was incompatible with the final standard.

    larryhastings commented 10 years ago

    Victor: a "new feature" is not a "release blocker".

    tiran commented 10 years ago

    I'm tracking the SHA-3 progress closely. I'm prepared to pull the plug if there is any doubt about the final version of SHA-3 before beta 2 is released on Jan 5th.

    Larry: I have marked this new feature as release blocker because I may have to remove it and reschedule its addition for 3.5. I'd like to remove it after you have branched off the 3.4 branch.

    tiran commented 10 years ago

    Larry: I have marked this new feature as release blocker because I may have to remove it and reschedule its addition for 3.5. I'd like to remove it after you have branched off the 3.4 branch.

    larryhastings commented 10 years ago

    "release blocker" means "the release cannot go out until this issue is solved". Adding SHA-3, while nice, is simply not something I am going to hold up 3.4 for, full stop.

    Please stop marking this issue as a "release blocker".

    80036ac5-bb84-4d39-8416-02cd8e51707d commented 10 years ago

    Here "release blocker" would mean that if SHA-3 specification is not finished, then "the release cannot go out until SHA-3 is deleted".

    larryhastings commented 10 years ago

    You guys are making me cranky. Please stop adding me to this issue.

    tim-one commented 10 years ago

    @Larry, you seem to be misreading this. They're not saying 3.4 can't be released until this feature is added. It's _already_ been added. They're saying 3.4 possibly can't be released until this feature is _removed_ - but whether it needs to be removed is outside of our control, and is not yet known.

    > "release blocker" means "the release cannot go out until this issue is solved"

    Yes - and this issue has not been solved yet. It should indeed be solved before 3.4 is released, so "release blocker" is spot on.

    larryhastings commented 10 years ago

    *sigh* fine. But the title of the issue is no longer accurate.

    And, Christian, I generate the 3.4 maintenance branch during the release process, not before. So if you have to remove sha3 you're going to have to remove it from trunk.

    pitrou commented 10 years ago

    > I'm prepared to pull the plug if there is any doubt about the final version of SHA-3 before beta 2 is released on Jan 5th.

    Shouldn't it be removed before beta1? The usual rule of feature freeze applies here.

    83d2e70e-e599-4a04-b820-3814bbdb9bef commented 10 years ago

    This strikes me as a rather unusual case. How about discussing it on python-dev, coming to an agreement, and documenting the process for this type of issue somewhere for future reference? Or is that simply OTT?

    tiran commented 10 years ago

    New information on NIST's hash forum strongly suggests that NIST is going to standardize SHA-3 according to the original proposal with c=2n and Sakura padding as well as two SHAKEs with variable length output.

    SHA3-224 with c=448
    SHA3-256 with c=512
    SHA3-384 with c=768
    SHA3-512 with c=1024

    SHAKE128 with c=256
    SHAKE256 with c=512
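
The capacities listed above determine the sponge rate, because rate and capacity always add up to the 1600-bit Keccak state. The sketch below recomputes the rate from c = 2n and compares it with the `block_size` that hashlib reports for the SHA-3 functions; it assumes Python 3.6 or later, where `sha3_*` is available in hashlib. `rate_in_bytes` is a helper name chosen for this example.

```python
import hashlib

STATE_BITS = 1600  # width of the Keccak-f[1600] state


def rate_in_bytes(capacity_bits):
    """Rate = state width minus capacity, expressed in bytes."""
    return (STATE_BITS - capacity_bits) // 8


for n in (224, 256, 384, 512):
    capacity = 2 * n                      # c = 2n for the SHA-3 functions
    h = hashlib.new("sha3_%d" % n)
    assert h.digest_size == n // 8
    assert h.block_size == rate_in_bytes(capacity)
    print("SHA3-%d: c=%d, rate=%d bytes" % (n, capacity, h.block_size))

# The two SHAKEs use the capacities given above.
print("SHAKE128 rate:", rate_in_bytes(256), "bytes")
print("SHAKE256 rate:", rate_in_bytes(512), "bytes")
```

A larger capacity leaves a smaller rate, so SHA3-512 absorbs fewer bytes per permutation call than SHA3-224 and is correspondingly slower per byte.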

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 10 years ago

    I'm going to remove sha3 from the trunk tomorrow unless I hear otherwise. Python shouldn't implement something called "sha3" until SHA-3 actually is a standard. According to the current NIST timeline, the comment period on the draft FIPS should have ended by now, but AFAICT, the draft FIPS that starts the 90 day comment period hasn't even been published yet.

    vstinner commented 10 years ago

    Will it be possible/easy to maintain a sha3 module on PyPI? It would be nice to have it for Python 2.6-3.4.

    @Christian: Are you interested in doing that?

    pitrou commented 10 years ago

    Either that, or we call it something else than "sha3"?

    gpshead commented 10 years ago

    I would not bother pulling this out until the week before RC1 if the standard has not yet been declared final.

    Otherwise, -1 on keeping it under another name. The only hashes we bundle should be standard ones as those are the only ones people will want to use in the long run. We'd be saddled with carrying along a non-standard likely not widely used algorithm implementation forever otherwise.

    even if sha3 isn't declared before 3.4rc1, people building 3.4 against a sufficiently modern version of openssl that includes sha3 (as i'm sure some version will) will still have access to the algorithm.

    otherwise i'm sure someone will package this as a module on pypi for older pythons regardless.

    tiran commented 10 years ago

    I created a backport of sha3 for Python 2.6 to 3.3 about a year ago. It's on PyPI: https://pypi.python.org/pypi/pysha3 . I'm planning to update the code with SHAKE256 and SHAKE512 support soonish, too.

    I have very high confidence that NIST is neither going to change the parameters or padding for SHA3 nor is NIST going to deviate from the original Keccak proposal. In case you still prefer to remove SHA3 I suggest that we stick to GPS' plan and wait until RC1.

    The attached patch removes all code and documentation for SHA3.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 10 years ago

    Ok, so this remains a release blocker. I'm still +1 for removing it, and I'm -0 for removing it just before the release candidate. AFAICT, there is *zero* (.000000001) chance that it actually becomes a NIST standard before the Python release is made. According to the current timeline:

    http://csrc.nist.gov/groups/ST/hash/sha-3/timeline_fips.html

    the *submission to the secretary* (of commerce) was scheduled for Q2. With the current delay, this must become Q3, so the publication as a standard might happen in Q4 (not sure how long the Secretary of Commerce needs to study the specification of a hash algorithm).

    What might happen is that a draft is published by the time the RC is made. I'd then still be -1 on including something in Python that only implements a draft standard. So we could just as well remove it right away.

    pitrou commented 10 years ago

    I agree with Martin that it should be removed right now. It's not really reasonable to call something SHA-3 if it's not SHA-3, even in beta versions.

    vstinner commented 10 years ago

    OpenSSL doesn't implement SHA-3 yet, it's strange to have SHA-3 in Python but not in OpenSSL. If the standard is still a draft, I agree to remove the code right now.

    gpshead commented 10 years ago

    Given the likely delay in the standard Martin cites, I've changed my mind: agreed. Go ahead and remove it for 3.4.

    We'll have an official sha3 in Python 3.5. Early adopters can live with PyPI.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 10 years ago

    I just looked at the hash-forum archives (*)

    http://cio.nist.gov/esd/emaildir/lists/hash-forum/msg02809.html

    which says that they plan to publish the draft "soon after Christmas". They also indicate how the padding open issue might get resolved (append 1111 for SHAKE, 1101 for the SHA-2 drop-ins). Not sure whether this is what Christian has already implemented.
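
To make the padding question concrete: the suffix bits are appended to the message before the sponge's own pad10*1 padding, and they separate the fixed-length hashes from the SHAKEs. In FIPS 202 as eventually published, the SHA-3 functions append the bits 01 and the SHAKEs append 1111; combined with the first padding bit these become the bytes 0x06 and 0x1F. The bit strings quoted above reflect the discussion at the time. Below is a compact pure-Python sponge, written for illustration only and far too slow for real use, checked against hashlib (Python 3.6 or later assumed). The function names are this example's own.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(value, shift):
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1
    for _ in range(24):
        # theta: mix each column parity into the neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied row by row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: add a round constant generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def keccak_sponge(capacity_bits, message, suffix, out_len):
    """Absorb message, pad with the suffix byte, squeeze out_len bytes.

    Only suffix bytes with the top bit clear (such as 0x06 and 0x1F)
    are handled by this sketch.
    """
    rate = (1600 - capacity_bits) // 8
    state = bytearray(200)
    offset = block = 0
    while offset < len(message):                 # absorbing phase
        block = min(len(message) - offset, rate)
        for i in range(block):
            state[i] ^= message[offset + i]
        offset += block
        if block == rate:
            state = keccak_f1600(state)
            block = 0
    state[block] ^= suffix                       # domain bits + first pad bit
    state[rate - 1] ^= 0x80                      # final pad bit
    state = keccak_f1600(state)
    out = bytearray()
    while len(out) < out_len:                    # squeezing phase
        out += state[:rate]
        if len(out) < out_len:
            state = keccak_f1600(state)
    return bytes(out[:out_len])


assert keccak_sponge(512, b"abc", 0x06, 32) == hashlib.sha3_256(b"abc").digest()
assert keccak_sponge(256, b"abc", 0x1F, 64) == hashlib.shake_128(b"abc").digest(64)
```

Changing only the suffix byte yields a completely different digest for the same message and capacity, which is the point of the domain separation.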

    (*) See http://crypto.stackexchange.com/questions/10645/are-nists-changes-to-keccak-sha-3-problematic for the password

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

    New changeset 52350d325b41 by Martin v. Löwis in branch 'default':