Comment from mharmsen (@mharmsen) at 2016-08-08 20:51:33
NOTE: This ticket may be resolved by the 10.3.5 builds. It needs to be retested with them before we decide its status: move to 10.3.6 or close as fixed.
Comment from edewata (@edewata) at 2016-08-11 18:55:54
The problem cannot be reproduced with PKI 10.3.5.
Comment from mbasti (@MartinBasti) at 2016-08-17 12:55:18
It still does not work with:
pki-base-10.3.5-1.fc24.noarch
pki-base-java-10.3.5-1.fc24.noarch
pki-ca-10.3.5-1.fc24.noarch
pki-kra-10.3.5-1.fc24.noarch
Comment from mbasti (@MartinBasti) at 2016-08-17 12:55:49
attachment debug
Comment from mbasti (@MartinBasti) at 2016-08-17 12:57:05
attachment pki-kra-spawn.20160817075443.log
Comment from mbasti (@MartinBasti) at 2016-08-17 12:57:19
attachment ipaserver-kra-install.log
Comment from edewata (@edewata) at 2016-08-18 00:47:54
Martin,
Could you provide the exact commands to reproduce the problem? Please attach the input files too if any (e.g. PKCS 12 file). Thanks.
Comment from edewata (@edewata) at 2016-08-29 19:39:47
According to Martin, so far this problem only happens in automated tests. No actual user has encountered it yet. Because the problem is rare, the priority is lowered. We still need a reproducer (including input files) to debug the problem and verify the fix later.
Comment from mbasti (@MartinBasti) at 2016-08-30 20:20:30
I haven't been able to find a more minimal reproducer (this may already be the minimal one).
Steps to reproduce:
* [master] ipa-server-install --setup-dns
* [master] ipa-kra-install
* [replica0] ipa-replica-install (against master)
* [replica0] ipa-ca-install
* [replica0] ipa-kra-install
* [replica0] ipa-dns-install
* [replica1] ipa-replica-install --setup-ca (against master)
* [replica1] ipa-kra-install <-----failed here
I was able to reproduce it manually, but I don't know if it is 100% reproducible. Please note that replica0 must be installed too; without it I couldn't reproduce the problem.
Comment from mbasti (@MartinBasti) at 2016-08-30 20:22:50
JFTR: this was reproduced with domain level 1 (default for IPA 4.3+)
Comment from mbasti (@MartinBasti) at 2016-08-30 20:38:52
I can confirm that this happens only when there are at least 3 servers with KRA; with 2 installs it works.
Shorter reproducer:
* [master] ipa-server-install
* [master] ipa-kra-install
* [replica0] ipa-replica-install --setup-ca
* [replica0] ipa-kra-install
* [replica1] ipa-replica-install --setup-ca
* [replica1] ipa-kra-install
Comment from edewata (@edewata) at 2016-08-31 07:34:19
I still cannot reproduce the problem with the above steps. Could you attach the CA and the KRA debug logs from all machines? Thanks.
Comment from mharmsen (@mharmsen) at 2016-09-12 22:35:15
Per CS/DS meeting of 09/12/2016: 10.4 (major)
Comment from mbasti (@MartinBasti) at 2016-09-16 15:36:34
The logs are too big and Trac refuses to save them, so I provided them directly to Endi.
Comment from vakwetu (@vakwetu) at 2016-09-17 08:21:44
OK, so what's going on here is an authorization error caused by replication timing. To explain the problem, and the possible solution, I need to explain a bit about how authorization works during the install process.
When you attempt to clone a Dogtag subsystem, the installer on the replica contacts the security domain CA, provides credentials, and obtains a session_id which it uses as a token. At the same time, a database entry is created on the security domain CA for the session (referenced by the session ID).
Now, during the install, whenever the replica needs something from another Dogtag subsystem, it provides this session_id to that system. That system then contacts the security domain and verifies that the session ID corresponds to an active installation session, and validates details like the user/system of the token bearer etc.
An example of this is as follows: When cloning a KRA, the KRA replica needs some configuration parameters from the master KRA. The replica provides the master KRA the session ID, and the master KRA validates the session ID by contacting the security domain (as configured on the master KRA).
OK -- so now let's understand what is going wrong.
Initially you have one PKI instance with a KRA and a CA (master CA/KRA). In the CS.cfg of each subsystem is a parameter securitydomain.host which points to the master instance.
Now, let's create the first replica CA. The replica CA contacts the security domain on master CA to get a token. When it asks the master CA for some config parameters, it provides the token, which the master CA checks against its own database. At the end of the install, the replica changes its securitydomain.host to point to itself.
Then we create a replica KRA. Once again, we contact the security domain on the master to get a token, and the KRA on the master checks its own db to see if the token is valid. Because it's a KRA and not a CA though, the replica KRA still points to the master when the installation completes.
So, now we have
master CA (SD points to master CA)
master KRA (SD points to master CA)
replica1 CA (SD points to replica1 CA)
replica1 KRA (SD points to master CA)
Now we clone replica1 CA to create replica2 CA. In this case, replica2 contacts replica1 CA for the security domain, and replica1 verifies the token against its own database (as its SD points to replica1 CA). At the end of the install, the SD for replica2 is changed to replica2 CA.
Now, we try to clone replica1 KRA to create replica2 KRA. Replica2 KRA contacts the security domain on replica1 CA and gets a token issued by replica1 CA. When verifying the token, however, replica1 KRA checks its own security domain, which points to master CA.
Now normally this isn't a problem - because the databases for replica1 CA and master CA are replicated. As long as enough time has elapsed, the session record created on replica1 CA will have been replicated to master CA.
But occasionally, it seems that a validation is required before the session record is replicated - causing an authorization failure - as we see in this case.
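The race described above can be illustrated with a small simulation. This is only a sketch: the `SecurityDomainDB` class, the function names, and the `session-123` value are hypothetical stand-ins, not Dogtag code or its real data model.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityDomainDB:
    """Hypothetical stand-in for the database behind one security domain CA."""
    name: str
    sessions: set = field(default_factory=set)
    pending: list = field(default_factory=list)  # records not yet replicated in

    def replicate(self) -> None:
        """Apply all pending records (models replication catching up)."""
        self.sessions.update(self.pending)
        self.pending.clear()


def issue_session(issuer: SecurityDomainDB, peers: list, session_id: str) -> str:
    """The issuer stores the session at once; peers see it only after replicate()."""
    issuer.sessions.add(session_id)
    for peer in peers:
        peer.pending.append(session_id)
    return session_id


def validate_session(validator: SecurityDomainDB, session_id: str) -> bool:
    """A subsystem asks its configured security domain whether the session exists."""
    return session_id in validator.sessions


master_ca = SecurityDomainDB("master CA")
replica1_ca = SecurityDomainDB("replica1 CA")

# replica2 KRA obtains its token from replica1 CA ...
token = issue_session(replica1_ca, [master_ca], "session-123")

# ... but replica1 KRA validates against master CA (its securitydomain.host).
before_replication = validate_session(master_ca, token)
master_ca.replicate()
after_replication = validate_session(master_ca, token)

print(before_replication, after_replication)
```

In this model the first check fails and the second succeeds, which matches the intermittent nature of the failure: the outcome depends only on whether validation happens before or after the session record arrives on master CA.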
So, how do we fix this?
The simple solution is to set the security domain of the KRA to the CA on the same host at the end of the install. So, if we do that --
replica1 CA (SD points to replica1 CA)
replica1 KRA (SD points to master CA)
becomes:
replica1 CA (SD points to replica1 CA)
replica1 KRA (SD points to replica1 CA)
The end result is that the token is issued and verified from the same instance - and the same db instance. So we no longer need to worry about the vagaries of replication.
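A quick way to see the effect of that change is to compare the CA that issues the token with the CA that the validating subsystem consults. The sketch below is illustrative only: the dictionaries mirror the securitydomain.host settings listed in this comment, and `needs_replication` is a hypothetical helper, not part of IPA or Dogtag.

```python
def needs_replication(sd_host: dict, issuer: str, validating_subsystem: str) -> bool:
    """Return True if the token is validated by a different CA than the one that issued it."""
    return sd_host[validating_subsystem] != issuer


# securitydomain.host per subsystem, as described above (before the fix).
before_fix = {
    "master CA": "master CA",
    "master KRA": "master CA",
    "replica1 CA": "replica1 CA",
    "replica1 KRA": "master CA",
}

# After the fix, the KRA points to the CA on the same host.
after_fix = {**before_fix, "replica1 KRA": "replica1 CA"}

# Cloning replica1 KRA: the token is issued by replica1 CA and checked by replica1 KRA.
print(needs_replication(before_fix, "replica1 CA", "replica1 KRA"))
print(needs_replication(after_fix, "replica1 CA", "replica1 KRA"))
```

When the helper returns False, issuing and validation hit the same database, so the check no longer depends on replication timing.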
This of course takes advantage of the unique way IPA has set up Dogtag: whenever there is a KRA, there is necessarily a CA on the same host too, and all KRAs and CAs are clones. We can't assume this in general, which is why this fix needs to happen in IPA and not in Dogtag.
A more general solution to this probably means revamping how we use tokens -- maybe using signed tokens for instance so that no validation is required. But this is slated for 10.4.
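To make the signed-token idea concrete, here is a minimal sketch using an HMAC from the Python standard library. Everything in it is an assumption for illustration: the shared key, the `session_id.mac` token layout, and the function names are hypothetical, and this is not how Dogtag works today. A real design would more likely use asymmetric signatures so that verifiers never hold a signing key.

```python
import hashlib
import hmac

SHARED_KEY = b"example-only-key"  # hypothetical; never hard-code a real key


def sign_token(session_id: str, key: bytes = SHARED_KEY) -> str:
    """Return 'session_id.mac', where mac is an HMAC-SHA256 over the session ID."""
    mac = hmac.new(key, session_id.encode(), hashlib.sha256).hexdigest()
    return f"{session_id}.{mac}"


def verify_token(token: str, key: bytes = SHARED_KEY) -> bool:
    """Check the MAC locally; no lookup in a replicated session database is needed."""
    session_id, _, mac = token.rpartition(".")
    expected = hmac.new(key, session_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac.encode(), expected.encode())


token = sign_token("session-123")
valid = verify_token(token)

# Reusing the MAC with a different session ID must fail verification.
tampered = verify_token("session-999." + token.rpartition(".")[2])

print(valid, tampered)
```

Because verification only needs the key, any subsystem could accept the token without waiting for a session record to replicate. The sketch leaves out expiry and revocation, which a real scheme would have to handle.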
So, the take away is - to fix the problem:
Comment from mbasti (@MartinBasti) at 2016-09-19 14:25:54
Thanks for info:
Comment from mbasti (@MartinBasti) at 2017-02-27 14:08:39
Metadata Update from @MartinBasti:
This issue was migrated from Pagure Issue #2434. Originally filed by mbasti (@MartinBasti) on 2016-08-08 12:39:45:
Please see FreeIPA ticket: https://fedorahosted.org/freeipa/ticket/6096
This behavior is happening in our test automation. If you need additional info please contact me.