rmbolger / Posh-ACME

PowerShell module and ACME client to create certificates from Let's Encrypt (or other ACME CA)
https://poshac.me/docs/latest/
MIT License

Submit-Renewal -AllOrders fails due to anti-replay nonce #152

Closed thetrav closed 5 years ago

thetrav commented 5 years ago

I have a PowerShell script executing as the SYSTEM user. The first thing it does is execute: Submit-Renewal -AllOrders

That call fails with: PS>TerminatingError(Invoke-WebRequest): "{ "type": "urn:ietf:params:acme:error:badNonce", "detail": "JWS has an invalid anti-replay nonce: \"PFgCraWDcDIiAOjupLPrvQx_y-OF5Ch-Q-mkGE4aJxw\"", "status": 400 }"

The server only has one certificate which was originally provisioned by the system user using: New-PACertificate $certHost -DnsPlugin Azure -PluginArgs $azParams -AcceptTOS -Contact $adminEmail -Verbose

rmbolger commented 5 years ago

Does this happen reliably every time? Or was it a temporary thing?

The occasional stale nonce value can cause this error, but the internal Invoke-ACME function has code to retry the request once with a fresh nonce. So whatever call was running would have had to fail twice in a row to terminate the call. It’s not unheard of if you’re behind a NAT with a lot of active Let’s Encrypt users, but generally pretty rare. So if it’s reliably happening every time, there might be something else going on.

If that’s the case, can you enable debug logging with $DebugPreference = 'Continue' and then rerun the command with the -Verbose flag and post the output here (sanitized if you wish)?

Also, what version of PowerShell are you running?

thetrav commented 5 years ago

It's reliably happening for me; I have yet to successfully renew a certificate using that method. I don't have a count, though it happened enough times in a row for me to get rate limited.

rmbolger commented 5 years ago

Do you recall which rate limit was tripped? It’s going to be hard to troubleshoot this without a debug/verbose enabled log. Can you switch to the staging server and reproduce the problem? It should be as simple as Set-PAServer LE_STAGE and then re-running the original New-PACertificate command.

thetrav commented 5 years ago

Ok, I ran it with the extras as you said (output below).
The last line gives me more of a clue as to what's going on, inspecting the order json at: C:\windows\System32\config\systemprofile\AppData\Local\Posh-ACME\acme-v02.api.letsencrypt.org\58351708\$certHost.json I note the key: "RenewAfter": "2019-07-31T23:52:40Z"

Which means it won't renew until August; the JWS message was a red herring.

**********************
Windows PowerShell transcript start
Start time: 20190619161327
Username: GROWDATA\SYSTEM
RunAs User: GROWDATA\SYSTEM
Configuration Name: 
Machine: test (Microsoft Windows NT 10.0.17763.0)
Host Application: C:\windows\System32\WindowsPowerShell\v1.0\powershell.EXE -NonInteractive -Command .\renew_certs.ps1 | Out-File -FilePath C:\Growdata\renew_certs.log -Append -Encoding ASCII
Process ID: 10124
PSVersion: 5.1.17763.503
PSEdition: Desktop
PSCompatibleVersions: 1.0, 2.0, 3.0, 4.0, 5.0, 5.1.17763.503
BuildVersion: 10.0.17763.503
CLRVersion: 4.0.30319.42000
WSManStackVersion: 3.0
PSRemotingProtocolVersion: 2.3
SerializationVersion: 1.1.0.1
**********************
VERBOSE: POST https://acme-v02.api.letsencrypt.org/acme/order/58351708/495700325 with -1-byte payload
>> TerminatingError(Invoke-WebRequest): "{
  "type": "urn:ietf:params:acme:error:badNonce",
  "detail": "JWS has an invalid anti-replay nonce: \"PFgCraWDcDIiAOjupLPrvQx_y-OF5Ch-Q-mkGE4aJxw\"",
  "status": 400
}"
VERBOSE: POST https://acme-v02.api.letsencrypt.org/acme/order/58351708/495700325 with -1-byte payload
VERBOSE: received 365-byte response of content type application/json
VERBOSE: No renewable orders found for account 58351708.
**********************
Windows PowerShell transcript end
End time: 20190619161331
**********************
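
The RenewAfter check described above the transcript can be sketched in a few lines. This is only an illustration: Test-RenewalDue is a hypothetical helper, not a Posh-ACME function, and the sample JSON contains nothing but the one key quoted in the comment. The comparison date is an arbitrary day in June 2019.

```powershell
# Hypothetical helper (not part of Posh-ACME): reports whether an order's
# RenewAfter timestamp has already passed.
function Test-RenewalDue {
    param(
        [Parameter(Mandatory)] $RenewAfter,
        [datetime] $Now = [datetime]::UtcNow
    )
    # Windows PowerShell 5.1 leaves ISO 8601 values as strings after
    # ConvertFrom-Json; newer versions may already hand back a DateTime.
    if ($RenewAfter -is [datetime]) {
        $renewUtc = $RenewAfter.ToUniversalTime()
    } else {
        $renewUtc = [DateTimeOffset]::Parse(
            [string]$RenewAfter,
            [Globalization.CultureInfo]::InvariantCulture
        ).UtcDateTime
    }
    return $Now.ToUniversalTime() -ge $renewUtc
}

# Sample data shaped like the single key quoted from the order file.
$order = '{ "RenewAfter": "2019-07-31T23:52:40Z" }' | ConvertFrom-Json

# Assumed comparison date: an arbitrary day before the renewal window opens.
$june = [datetime]::new(2019, 6, 19, 0, 0, 0, [DateTimeKind]::Utc)
$dueInJune = Test-RenewalDue -RenewAfter $order.RenewAfter -Now $june
```

With a June date the helper reports that renewal is not yet due, which matches the "No renewable orders found" line in the transcript.
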
rmbolger commented 5 years ago

Oh, that’s interesting. I didn’t realize transcripts displayed Exceptions I’m catching and not re-throwing. It’s definitely possible I’m doing something wrong there though and it shouldn’t be showing in the transcript.

In any case, yes. Submit-Renewal will skip any orders that haven’t reached the standard renewal window yet (roughly 30 days before expiration) unless you specify the -Force parameter.

If you’ve got a custom script that is doing post-processing tasks after a renewal, the best thing to do is assign the output of the renew command to a variable. The variable will contain the cert details of anything that was actually renewed (same object output as Get-PACertificate). If it’s empty and there were no errors, nothing needed renewing and you can skip any additional post-processing. In your case with only one cert, you probably don’t need to use -AllOrders unless you’re planning to add more later. But if you do, the logic might look something like this:

$renewedCerts = @(Submit-Renewal -AllOrders -Verbose)
foreach ($cert in $renewedCerts) {
    # do stuff with new cert details
}
thetrav commented 5 years ago

Ok, thanks for the help, I do indeed have quite a bit of junk that I need to do when the certificate actually gets renewed (add it to the computer's cert store and configure about 5 different services to use it).

I guess you can close the issue. If I have issues in a month's time when renewal is due, I'll pipe up again.

MetUys commented 4 years ago

Hi @rmbolger ,

I know you have closed this, however I too am getting the "invalid anti-replay nonce" response. I am using Transcript and that is when I see the error. It doesn't actually stop the certificate from being generated, since your service does the retry on its own. So I feel you might be right about the Transcript peering into your inner workings.

I only noticed this when setting up a brand new server using the latest version (as of now v3.15.1). It doesn't just happen on renewals but also on brand new certificates (reproducible each and every time).

Is it possible to check back into this and at worst suppress this alert? My concern is that it's doing something odd, which it recovers from, that adds a slight delay to the process + extra processing not needed + the panic of a log watcher when they see that (haha).

PS: Sadly I went and updated all my deployments to the latest version before noticing this, so I can't really confirm which version this started happening in.

rmbolger commented 4 years ago

Unfortunately, I don't think there's anything I can do about it. It seems to be a "feature" of the transcript functionality that it captures and outputs terminating errors even if they're properly caught and dealt with in the calling code using try/catch. In other words, there's nothing in my code that is writing that information to the output stream. Here's a reddit post with someone complaining about the same thing.

https://www.reddit.com/r/PowerShell/comments/ctvl14/how_do_i_stop_try_statement_writing_the_error_to/

Bad nonce errors are actually a normal part of the ACME protocol and nothing to be concerned about though. The ACME RFC even provides guidance that clients should expect to get them and retry the request with the fresh nonce provided in the error response.

https://tools.ietf.org/html/rfc8555#section-6.5
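
The retry behaviour described above (catch a badNonce error, retry once with the fresh nonce from the error response) can be sketched as follows. This is a minimal illustration, not the code inside Invoke-ACME: Invoke-WithNonceRetry, its parameters, and the simulated request are all hypothetical, and a real client would read the fresh nonce from the Replay-Nonce header of the error response.

```powershell
# Hypothetical wrapper (not Posh-ACME's Invoke-ACME): retries a request
# once when the server rejects the nonce.
function Invoke-WithNonceRetry {
    param(
        # Performs the request using the nonce passed as its first argument.
        [Parameter(Mandatory)] [scriptblock] $Request,
        [Parameter(Mandatory)] [string] $Nonce,
        # Extracts a fresh nonce from the caught error record.
        [Parameter(Mandatory)] [scriptblock] $GetFreshNonce
    )
    try {
        return & $Request $Nonce
    } catch {
        # Anything other than badNonce is rethrown unchanged.
        if ($_.Exception.Message -notmatch 'badNonce') { throw }
        $fresh = & $GetFreshNonce $_
        # A second failure here propagates to the caller.
        return & $Request $fresh
    }
}

# Simulated server: rejects the nonce 'stale' and accepts anything else.
$script:attempts = 0
$fakeRequest = {
    param($nonce)
    $script:attempts++
    if ($nonce -eq 'stale') { throw 'urn:ietf:params:acme:error:badNonce' }
    "accepted with nonce '$nonce'"
}

$result = Invoke-WithNonceRetry -Request $fakeRequest -Nonce 'stale' -GetFreshNonce { 'fresh' }
```

The first attempt throws, the error is caught and handled, and the second attempt succeeds. The caller only ever sees the successful result, which is why the error surfaces in a transcript but not in the command's output.
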

thetrav commented 4 years ago

I don't want to be a jerk here, your library is excellent and I benefit from you having made it, so you have my gratitude already.

I'm a bit of a lazy slob though and haven't really read up on the ACME protocol, I just know I can point things at your library and get certificates (hooray!)

If it's possible to detect that the terminating error occurred and therefore know that it has (or will) output some concerning logging, maybe it's possible to output some additional explanatory logging to keep slowpokes like me from getting our knickers in a knot.

As it stands you've already done heaps though, so thanks already :D

rmbolger commented 4 years ago

Ironically, it already does log that it's retrying the request. But it's currently output using Write-Debug which would only show in the transcript if you had set $DebugPreference='Continue' before running the Posh-ACME command. Your transcript would also be full of a ton of other low level debug messaging which you likely don't care about.

I wouldn't be opposed to changing it to Write-Verbose instead. But you'd have to make sure to include -Verbose on your various Posh-ACME commands for it to show up in the transcript.
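
The difference between the two streams can be seen with a small stand-in function. Invoke-Demo below is hypothetical and has nothing to do with Posh-ACME's internals; it just emits one debug message and one verbose message so the effect of -Verbose and $DebugPreference is visible.

```powershell
# Hypothetical stand-in that writes one message to each stream.
function Invoke-Demo {
    [CmdletBinding()]
    param()
    Write-Debug 'retrying with fresh nonce (debug stream)'
    Write-Verbose 'retrying with fresh nonce (verbose stream)'
}

# Start from the default preferences so the demo is repeatable.
$DebugPreference = 'SilentlyContinue'
$VerbosePreference = 'SilentlyContinue'

# Defaults: neither message is emitted.
$quiet = @(Invoke-Demo 4>&1 5>&1)

# -Verbose alone surfaces only the verbose message.
$verboseOnly = @(Invoke-Demo -Verbose 4>&1 5>&1)

# Setting $DebugPreference surfaces the debug message as well.
$DebugPreference = 'Continue'
$both = @(Invoke-Demo -Verbose 4>&1 5>&1)
$DebugPreference = 'SilentlyContinue'
```

The 4>&1 and 5>&1 redirections merge the verbose and debug streams into the output so the records can be counted; in a transcript they would simply appear as VERBOSE: and DEBUG: lines.
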