saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
14.17k stars 5.48k forks source link

x509.certificate_managed triggers a change when switching form python 2.7 to 3.6 #56556

Closed sjorge closed 3 years ago

sjorge commented 4 years ago

Description of Issue

While testing salt-3000.1 on py36 vs on py27 I noticed all my certificates are regenerated, including my CA! (this is absolutely not what I want, as the CA is also use for things outside of salt)

I restored backups of the 10 effected minions and double checked I was on py27 salt-3000.1 which I was, another global state.apply showed no changes, phew!

Switching to from python 2.7 to 3.6 for salt-3000.1 results in all x509.certificate_managed states to regenerate certificates

Debugging done

Debugging is hard as I could not leave everything in this broken state for long.

Using pudb i stepped through the x509 state and module:

Stripping this down to the value of just cert.get_subject():

The ordering seems to be different between both version of python (but consistant within the version itself)

Setup

Heavily stripped down state due to sensitive info

certificate.authority::directory:
  file.directory:
    - name: {{ certcfg['config']['authority_dir'] }}
    - makedirs: true
    - user: root
    - group: nacl
    - dir_mode: 2770

certificate.authority::private-key:
  x509.private_key_managed:
    - name: {{ ca_key_path }}
    - bits: 8192
    - backup: True
    - require:
      - file: certificate.authority::directory

certificate.authority::certificate:
  x509.certificate_managed:
    - name: {{ ca_crt_path }}
    - signing_private_key: {{ ca_key_path }}
    - CN: Example Root CA
    - O: Example
    - C: BE
    - ST: Antwerp
    - L: Kapellen
    - Email: certadm@example.org
    - basicConstraints: "critical CA:true"
    - keyUsage: "critical cRLSign, keyCertSign"
    - subjectKeyIdentifier: hash
    - authorityKeyIdentifier: keyid,issuer:always
    - days_valid: 3650
    - days_remaining: 0
    - backup: True
    - require:
      - x509: certificate.authority::private-key

Steps to Reproduce Issue

Versions Report

[root@cronos /data]# salt-call --versions-report
Salt Version:
           Salt: 3000.1

Dependency Versions:
           cffi: 1.11.5
       cherrypy: unknown
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.10
        libgit2: Not Installed
       M2Crypto: 0.30.1
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: 2.19
       pycrypto: 3.6.6
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.6.6 (default, Oct 23 2018, 17:13:47)
   python-gnupg: Not Installed
         PyYAML: 3.13
          PyZMQ: 17.1.2
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.2.5

System Versions:
           dist:
         locale: UTF-8
        machine: i86pc
        release: 5.11
         system: SunOS
        version: Not Installed
[root@cronos /data]# salt-call --versions-report
Salt Version:
           Salt: 3000.1

Dependency Versions:
           cffi: 1.11.5
       cherrypy: 17.4.0
       dateutil: 2.8.1
      docker-py: Not Installed
          gitdb: 2.0.6
      gitpython: 2.1.14
         Jinja2: 2.10
        libgit2: Not Installed
       M2Crypto: 0.30.1
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: 2.19
       pycrypto: 3.7.2
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.15 (default, Sep 10 2019, 16:35:50)
   python-gnupg: 0.4.5
         PyYAML: 3.13
          PyZMQ: 17.1.2
          smmap: 0.9.0
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.2.5

System Versions:
           dist:
         locale: UTF-8
        machine: i86pc
        release: 5.11
         system: SunOS
        version: Not Installed
sjorge commented 4 years ago

@Ch3LL ping

Ch3LL commented 4 years ago

was able to replicate this. This behavior also exhibited itself all the way back to 2018.3.4 so this has likely been broken since py3 was added to this state.

sjorge commented 4 years ago

I guess it was never an issue since people would either use a py2 or py3 environment. But I guess soon a lot of people will be switching over and more might hit this issue.

~ sjorge

On 6 Apr 2020, at 17:46, Megan Wilhite notifications@github.com wrote:

 was able to replicate this. This behavior also exhibited itself all the way back to 2018.3.4 so this has likely been broken since py3 was added to this state.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

sjorge commented 4 years ago

One way to fix it is to normalize cert.get_subject()'s data doing something like this:

[hyperon :: sjorge][~]
[■]$ python3
Python 3.7.7 (default, Mar 12 2020, 11:23:07)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> subj3 = '/C=BE/ST=Antwerp/L=Kapellen/O=Example/CN=Example Root CA/emailAddress=certadm@example.org'
>>> subj2 ='/C=BE/CN=Example Root CA/L=Kapellen/ST=Antwerp/O=Example/emailAddress=certadm@example.org'
>>> for s in [subj2, subj3]:
...     '/'.join(sorted(s.split('/')))
...
'/C=BE/CN=Example Root CA/L=Kapellen/O=Example/ST=Antwerp/emailAddress=certadm@example.org'
'/C=BE/CN=Example Root CA/L=Kapellen/O=Example/ST=Antwerp/emailAddress=certadm@example.org'
>>>

It will still function as intended but not trigger unwanted changes when switching between py2 <> py3.

Doing it this way if a value changes, or something gets added or removed... we still trigger a change. But if the fields/values are the same but in a different order, we will not change the certificate.

The issue seems to be that the certificates generate by m2crypto has different subject field orders in the actually PEM data between py2 and py3 with the exact same m2crypto version.

We always generate a new certificate in the state, check if they (minus a few fields) are the same. If yes, we discard the new certificate, if not we report the changes and update the certificate on disk. This probably also explains why this state is so slow. Especially when using remote signing we go back and forth to get it signed each time the state runs 😨

Although I am not immediately sure how we can achieve the same effect without a whole lot of code that will disect the current certificate and do a ton of checks on it. X509 is not my strong side.

sjorge commented 4 years ago

Unless someone stops me, I'll see if I can implement the normalization and see if it actually works in practice... but it will take me some time. If this is an unacceptable fix, let me know and I wont wast my time on this.

This is currently stopping me from switching over to py3 on SmartOS but other projects are keeping me busy too.

OrangeDog commented 4 years ago

@Ch3LL yes, it's always been broken. I think #52026 is the earliest report, and PRs to fix it have been waiting a while.

sjorge commented 4 years ago

Scratch my original idea, it seems returns something totally different on py2 than on py3 ???

sjorge commented 4 years ago

Some more debugging... https://github.com/saltstack/salt/blob/94c8ca527c051015c9ede6657cf0c854b95f2fab/salt/modules/x509.py#L1529 is where it goes wrong between py2 and py3... the order is different (but consistent).

At the very least we should sort those... the problem is that this does not fix the regenerate all certificates (and also the ca) in my case problem. But it should prevent it in the future.

OrangeDog commented 4 years ago

@sjorge I linked one of the duplicates above. There is already a fix but it has not been merged.

sjorge commented 4 years ago

Do you have a diff somewhere against master? Then I can test it locally.

sjorge commented 4 years ago

Closest I could find was https://github.com/saltstack/salt/pull/52456 this still applies on top of 3000.1 but doesn't fix the problem for me.

hatifnatt commented 4 years ago

Faced with this issue too, it's very painful because it's not possible to use Py 2.7 based packages on Deb 10, Salt only provide Py 3+ based packages for Deb 10. So I need to upgrade to Deb 10 because I need fresh CherryPy to get salt-api working but upgrading to Py 3 will cause reissue for all my certificates. Looks like a deadlock to me.

glynnforrest commented 4 years ago

Hi folks, I've been working on x509.certificate_managed recently (see #52935 and #56788). I'm planning to change the way the state does the certificate comparison to fix https://github.com/saltstack/salt/issues/52167, and will make sure to take this issue into account too. Very happy to work with others on this to get it working for all Python versions.

I'm mainly working with Debian 9 and Python 2.7 at the moment. I'll add Python 3.6 to my dev stack too!

sjorge commented 4 years ago

I've tracked the inconsistency down to cert.get_subject() which has the values in different order. But I have not found a fix.

This difference flows down int Subject Hash, and other fields that have subject mixed into it, but no clean way to fix it, it looks like the wrapped-c object returned is different internally.

s0undt3ch commented 4 years ago

Reopening since the PR only partially addressed the problem.

alxwr commented 4 years ago

I added a comment to https://github.com/saltstack/salt/pull/57688#issuecomment-678829263, containing a possible solution.

alxwr commented 4 years ago

I added a comment to #57688 (comment), containing a possible solution.

I turned that into a PR including tests.

sagetherage commented 4 years ago

PR has been merged, closing will be released in Magnesium

joni1993 commented 4 years ago

I still expierence this issue in 3002. Also it is not mentioned in the 3002 changelog - did it not make it into Magnesium?

sjorge commented 4 years ago

Can confirm, doesn’t look fixed

darkpixel commented 4 years ago

Ugh. I have around 130,500 certificates issued (around 900 per salt minion) at this point totaling about 550 MB of space used. About 6 months ago we removed the x509 state from the top file so we wouldn't keep issuing new certs every time we went highstate. This has been dragging on for ~3 years now.

s0undt3ch commented 4 years ago

The required changes to fix this do not belong in a .1 release. Re-targetting to Aluminum.

sagetherage commented 4 years ago

After reviewing this we won't be able make this change in the point release so moving it up in severity and to the next major release cycle Aluminium target date is mid-Feb 2021. Yes, copying what Pedro already put in the comments.

sjorge commented 4 years ago

Ouch, so that about for a year that most people migrating from 2.7 -> 3.x based python have to either: swallow resigning lots of certificates or not upgrade :(

OrangeDog commented 4 years ago

@sjorge well three years, if it's actually fixed by Aluminium.

OrangeDog commented 3 years ago

Apparently this was fixed in 3002.3 after all?