rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

TLS certificate with validity of decades results in an exception when querying `/api/health/checks/certificate-expiration/1/months` #12464

Open hvt opened 1 month ago

hvt commented 1 month ago

Describe the bug

We are using RabbitMQ 3.12.12.

For reasons (tm), we have a TLS certificate with an extremely long validity period, signed by a CA with the same validity period:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            2d:b6:9b:eb:04:7d:ad:32:d8:ae:3b:4d:58:02:5c:af:fe:59:cb:3c
        Signature Algorithm: sha512WithRSAEncryption
        Issuer: CN = RabbitMQ Example CA
        Validity
            Not Before: Jan  1 00:00:00 1970 GMT
            Not After : Dec 31 23:59:59 2099 GMT
        Subject: CN = rabbitmq.example.org
        ...

Querying the certificate expiration health check API now returns an HTTP 500 response with an empty body. The RabbitMQ log shows this crash report / stack trace:

2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>   crasher:
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     initial call: cowboy_stream_h:request_process/3
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     pid: <0.18355.14>
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     registered_name: []
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     exception error: an error occurred when evaluating an arithmetic expression
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>       in operator  div/2
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>          called as {error,"Certificate is not yet valid"} div 86400
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>       in call from calendar:gregorian_seconds_to_datetime/1 (calendar.erl, line 192)
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>       in call from rabbit_mgmt_wm_health_check_certificate_expiration:seconds_to_bin/1 (rabbit_mgmt_wm_health_check_certificate_expiration.erl, line 176)
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>       in call from rabbit_mgmt_wm_health_check_certificate_expiration:'-expires_on_list/1-lc$^0/1-0-'/1 (rabbit_mgmt_wm_health_check_certificate_expiration.erl, line 123)
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>       in call from rabbit_mgmt_wm_health_check_certificate_expiration:listener_expiring_within/2 (rabbit_mgmt_wm_health_check_certificate_expiration.erl, line 115)
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>       in call from rabbit_mgmt_wm_health_check_certificate_expiration:'-to_json/2-fun-1-'/3 (rabbit_mgmt_wm_health_check_certificate_expiration.erl, line 46)
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>       in call from lists:foldl_1/3 (lists.erl, line 1355)
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>       in call from rabbit_mgmt_wm_health_check_certificate_expiration:to_json/2 (rabbit_mgmt_wm_health_check_certificate_expiration.erl, line 45)
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     ancestors: [<0.18354.14>,<0.608.0>,<0.603.0>,<0.602.0>,<0.600.0>,
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>                   rabbit_web_dispatch_sup,<0.553.0>]
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     message_queue_len: 0
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     messages: []
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     links: [<0.18354.14>]
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     dictionary: []
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     trap_exit: false
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     status: running
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     heap_size: 6772
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     stack_size: 28
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>     reductions: 12142
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14>   neighbours:
2024-10-06 05:33:53.592394+00:00 [error] <0.18355.14> 
2024-10-06 05:33:53.594071+00:00 [error] <0.18354.14> Ranch listener {acceptor,{0,0,0,0,0,0,0,0},5673}, connection process <0.18354.14>, stream 1 had its request process <0.18355.14> exit with reason badarith and stacktrace [{erlang,'div',[{error,"Certificate is not yet valid"},86400],[{error_info,#{module => erl_erts_errors}}]},{calendar,gregorian_seconds_to_datetime,1,[{file,"calendar.erl"},{line,192}]},{rabbit_mgmt_wm_health_check_certificate_expiration,seconds_to_bin,1,[{file,"rabbit_mgmt_wm_health_check_certificate_expiration.erl"},{line,176}]},{rabbit_mgmt_wm_health_check_certificate_expiration,'-expires_on_list/1-lc$^0/1-0-',1,[{file,"rabbit_mgmt_wm_health_check_certificate_expiration.erl"},{line,123}]},{rabbit_mgmt_wm_health_check_certificate_expiration,listener_expiring_within,2,[{file,"rabbit_mgmt_wm_health_check_certificate_expiration.erl"},{line,115}]},{rabbit_mgmt_wm_health_check_certificate_expiration,'-to_json/2-fun-1-',3,[{file,"rabbit_mgmt_wm_health_check_certificate_expiration.erl"},{line,46}]},{lists,foldl_1,3,[{file,"lists.erl"},{line,1355}]},{rabbit_mgmt_wm_health_check_certificate_expiration,to_json,2,[{file,"rabbit_mgmt_wm_health_check_certificate_expiration.erl"},{line,45}]}]
2024-10-06 05:33:53.594071+00:00 [error] <0.18354.14> 
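The key line in the trace is `{error,"Certificate is not yet valid"} div 86400`: an error tuple reached `calendar:gregorian_seconds_to_datetime/1` (via `seconds_to_bin/1`) where an integer number of seconds was expected, and the conversion divides its argument by 86400 (seconds per day). The Python sketch below mirrors that conversion to show why any non-integer input fails at exactly that step. It is an approximation written for illustration, not the OTP source, and the error tuple is modelled as a Python tuple.

```python
from datetime import date, datetime, timedelta

SECONDS_PER_DAY = 86400
# Erlang's calendar counts days from 0000-01-01 (day 0) in the proleptic
# Gregorian calendar; Python ordinals start at 0001-01-01 (ordinal 1).
# Year 0 is a leap year, so the two counts differ by 366 - 1 = 365.
ERLANG_DAY_OFFSET = 365


def gregorian_seconds_to_datetime(secs):
    """Sketch of calendar:gregorian_seconds_to_datetime/1 (not the OTP source)."""
    # Same arithmetic as in the trace: Secs div 86400, plus the remainder.
    # A non-integer such as an error tuple fails right here.
    days, rest = divmod(secs, SECONDS_PER_DAY)
    d = date.fromordinal(days - ERLANG_DAY_OFFSET)
    return datetime(d.year, d.month, d.day) + timedelta(seconds=rest)


print(gregorian_seconds_to_datetime(62_167_219_200))  # 1970-01-01 00:00:00

# The value that reached seconds_to_bin/1 in the trace, modelled as a tuple:
not_yet_valid = ("error", "Certificate is not yet valid")
try:
    gregorian_seconds_to_datetime(not_yet_valid)
except TypeError as exc:
    print("crash:", exc)
```

So the crash itself is a missing error check in the caller; the separate question is why a certificate valid since 1970 is reported as not yet valid in the first place.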

Reproduction steps

I am not entirely sure whether this is caused by the validity period of the CA or of the server certificate. I have, however, generated an example CA certificate and an example server certificate + key:

  1. I am referencing these three files in rabbitmq.conf like this:
    ssl_options.cacertfile = ca.crt
    ssl_options.certfile   = server.crt
    ssl_options.keyfile    = server.key
  2. Querying the certificate expiration health check endpoint then returns:
    $ curl --verbose --user admin:password http://localhost:5673/api/health/checks/certificate-expiration/1/months
    *   Trying 127.0.0.1:5673...
    * Connected to localhost (127.0.0.1) port 5673 (#0)
    * Server auth using Basic with user 'admin'
    > GET /api/health/checks/certificate-expiration/1/months HTTP/1.1
    > Host: localhost:5673
    > Authorization: Basic ...
    > User-Agent: curl/7.81.0
    > Accept: */*
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 500 Internal Server Error
    < content-length: 0
    < 
    * Connection #0 to host localhost left intact
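The same request can be scripted, for example from a monitoring job. This Python sketch uses only the standard library; the function name and its parameters are illustrative and not part of any RabbitMQ client. It returns the HTTP status and body instead of raising on error responses, so an empty 500 like the one above can be told apart from a regular check result.

```python
import base64
import urllib.error
import urllib.request


def certificate_expiration_check(base_url, user, password, within=1, unit="months", timeout=10.0):
    """Query the certificate expiration health check; return (status, body).

    Error responses are returned rather than raised, so callers can tell
    an empty 500 (endpoint crash) apart from a regular check result.
    """
    url = f"{base_url.rstrip('/')}/api/health/checks/certificate-expiration/{within}/{unit}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    # Talk to the node directly, ignoring any HTTP(S)_PROXY environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the error object still carries status and body.
        return err.code, err.read()
```

With the setup from step 1, `certificate_expiration_check("http://localhost:5673", "admin", "password")` would return `(500, b"")` while the bug is present.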

Expected behavior

The endpoint should not return an HTTP 500, and it should not report the certificate as about to expire.
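For reference, a minimal Python sketch of that expected behavior for a single certificate, assuming Not Before and Not After have already been parsed into UTC datetimes. The names `check_certificate` and `ExpirationCheck` are hypothetical, and "1 month" is approximated as 30 days; the real endpoint's unit handling may differ. The point is that a not-yet-valid certificate is reported as a value instead of being passed on as a number of seconds, and that with correctly parsed dates the 1970 Not Before lies in the past, so that branch is never taken for this certificate.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ExpirationCheck:
    expiring: bool               # True if Not After falls inside the window (or has passed)
    error: Optional[str] = None  # set instead of crashing, e.g. for a not-yet-valid certificate


def check_certificate(not_before: datetime, not_after: datetime,
                      now: datetime, within: timedelta) -> ExpirationCheck:
    """Hypothetical per-certificate check; not the RabbitMQ implementation."""
    if now < not_before:
        # Handle the error case here rather than passing it on to date formatting.
        return ExpirationCheck(expiring=False, error="Certificate is not yet valid")
    return ExpirationCheck(expiring=not_after - now <= within)


utc = timezone.utc
now = datetime(2024, 10, 6, 5, 33, 53, tzinfo=utc)  # timestamp from the log above
one_month = timedelta(days=30)  # assumption: "1/months" approximated as 30 days

result = check_certificate(datetime(1970, 1, 1, tzinfo=utc),
                           datetime(2099, 12, 31, 23, 59, 59, tzinfo=utc),
                           now, one_month)
print(result)  # ExpirationCheck(expiring=False, error=None)
```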

Additional context

No response

michaelklishin commented 1 month ago

3.12 has been out of support for more than 6 months.

So we will try to reproduce this against 4.0.2 instead; 4.0.x is the only community-supported release series.

tls-gen uses a default validity period of 10 years. In theory, overriding it to something like 50 years should be enough to reproduce this.

lukebakken commented 1 month ago

Here is where the "Certificate is not yet valid" error message is generated:

https://github.com/rabbitmq/rabbitmq-server/blob/main/deps/rabbitmq_management/src/rabbit_mgmt_wm_health_check_certificate_expiration.erl#L137-L161

@hvt you've probably found an edge case that the code misses. I'll investigate this when I can find time.

What happens if your certs have a slightly later start time, like epoch time plus 1 second?

hvt commented 1 month ago

> What happens if your certs have a slightly later start time, like epoch time plus 1 second?

I had already tried that: at first I suspected a division by zero, so I created a certificate (and CA) with a Not Before of Jan 1 00:00:01 1970 GMT. It failed with the same error.
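The division-by-zero idea assumes the 1970 start maps to zero. That holds for Unix time, but the stack trace goes through Erlang's `calendar` module, which counts seconds from year 0 ("Gregorian seconds"). The Python sketch below mirrors `calendar:datetime_to_gregorian_seconds/1` (an approximation written for illustration, not the OTP source) and prints both representations of the two Not Before values tried.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400
# Python ordinal 1 is 0001-01-01; Erlang's Gregorian day 0 is 0000-01-01,
# and year 0 is a leap year, so Erlang days = Python ordinal + 365.
ERLANG_DAY_OFFSET = 365


def to_gregorian_seconds(dt: datetime) -> int:
    """Sketch of calendar:datetime_to_gregorian_seconds/1 for a UTC datetime."""
    days = dt.toordinal() + ERLANG_DAY_OFFSET
    return days * SECONDS_PER_DAY + dt.hour * 3600 + dt.minute * 60 + dt.second


for label, not_before in [
    ("original", datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)),
    ("epoch + 1 s", datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)),
]:
    unix = int(not_before.timestamp())
    print(f"{label}: Unix seconds {unix}, Gregorian seconds {to_gregorian_seconds(not_before)}")
```

In Gregorian seconds both values are around 62.17 billion, nowhere near zero, so a division by zero in the calendar conversion seems unlikely. That fits the observation that shifting the start by one second changes nothing, and the trace itself shows a `badarith` on an error tuple rather than on zero.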

michaelklishin commented 1 month ago

The code computes the difference between dates in seconds, so one of the calendar modules may not be accounting for overflow or wraparound.

In some, if not all, cases we could use minutes or hours instead: this health check is meant to be run, e.g., once a day, not every hour or minute.
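To put the overflow/wraparound theory and the minutes-or-hours suggestion into numbers, this Python sketch measures the span between the log timestamp and the certificate's Not After in different units and compares each with the largest signed 32-bit integer. Erlang integers are arbitrary-precision, so the comparison only matters if some fixed-width value is involved along the way; it illustrates magnitudes and is not a diagnosis.

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer

# Both timestamps are taken from the report above.
report_time = datetime(2024, 10, 6, 5, 33, 53, tzinfo=timezone.utc)  # log timestamp
not_after = datetime(2099, 12, 31, 23, 59, 59, tzinfo=timezone.utc)  # certificate Not After


def remaining_in_units(now: datetime, expires: datetime) -> dict:
    """Whole seconds, minutes and hours from now until expires (floored)."""
    remaining = expires - now
    return {
        "seconds": remaining // timedelta(seconds=1),
        "minutes": remaining // timedelta(minutes=1),
        "hours": remaining // timedelta(hours=1),
    }


for unit, value in remaining_in_units(report_time, not_after).items():
    print(f"{unit:>7}: {value:>13,}  fits in a signed 32-bit integer: {value <= INT32_MAX}")

unix_not_after = int(not_after.timestamp())
print(f"Not After as Unix seconds: {unix_not_after:,}  fits: {unix_not_after <= INT32_MAX}")
```

In seconds the remaining span (about 2.37 billion) already exceeds 2^31 - 1, and so does the Not After itself as a Unix timestamp (the "year 2038" boundary), while in hours the same span is only about 660,000.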