passbolt / passbolt_api

Passbolt Community Edition (CE) API. The JSON API for the open source password manager for teams!
https://passbolt.com
GNU Affero General Public License v3.0

Healthcheck sometimes fails, on a retry it passes #439

Closed TheReptile closed 5 months ago

TheReptile commented 2 years ago

What you did

I created a cron job to capture the health check output for monitoring purposes. Basically this command: ./bin/cake passbolt healthcheck > /data/flusso/passbolt/output/passbolt_healthcheck.txt

What happened

Every now and then, there are errors in the output of the health-check. The errors only occur temporarily and when I retry, the errors are gone. These are the 2 errors shown:

 [FAIL] The private key cannot be used to decrypt and verify a message
 [FAIL] The public key cannot be used to verify a signature.

Our Passbolt installation is working fine, so I assume the health-check is sometimes wrong.

What you expected to happen

I would expect the health-check to give consistent results.

stripthis commented 2 years ago

Hi @TheReptile, these checks rely on functionality provided by php-gnupg. This could mean you have some issues with GnuPG on your system. It could come from either a clock issue (can you check the server time?) or an entropy issue (in virtualized environments you can use haveged or rng-tools).

TheReptile commented 2 years ago

@stripthis That's strange, all the VMs we use have ntp and haveged installed.

# ps wauxxx | grep -e ntp -e haveged
root         396  0.0  0.2   8296  4772 ?        Ss   Jul15   0:16 /usr/sbin/haveged --Foreground --verbose=1 -w 1024
ntp          532  0.0  0.2  74632  4044 ?        Ssl  Jul15   0:37 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 110:115

Also, this problem almost seems to be a race condition: once it fails, if I retry immediately, the test passes.

stripthis commented 2 years ago

Can you check the entropy pool size when it fails? Using /proc/sys/kernel/random/entropy_avail I think.

I'm not sure what the issue could be, but I would be very grateful if you could help us narrow it down. Can you check if there is any additional information on the GnuPG side (https://www.gnupg.org/documentation/manuals/gpgme/Debugging.html)? Do you have any particular setup filesystem-wise? Something that would prevent GnuPG from reading or writing on the file system, like concurrent access or latency issues (network disk?).

Thanks for your help

TheReptile commented 2 years ago

I managed to quickly reproduce this:

# echo `date +'%Y%m%d %H:%M:%S'`;/data/scripts/passbolt/passbolt_healthcheck.sh; grep FAIL passbolt_healthcheck.txt; echo -n "Entropy: "; cat  /proc/sys/kernel/random/entropy_avail 
20220719 15:29:39
 [FAIL] The private key cannot be used to decrypt and verify a message
 [FAIL] The public key cannot be used to verify a signature.
 [FAIL] 2 error(s) found. Hang in there!
Entropy: 2711
# echo `date +'%Y%m%d %H:%M:%S'`;/data/scripts/passbolt/passbolt_healthcheck.sh; grep FAIL passbolt_healthcheck.txt; echo -n "Entropy: "; cat  /proc/sys/kernel/random/entropy_avail 
20220719 15:29:42
Entropy: 2722

# echo `date +'%Y%m%d %H:%M:%S'`;/data/scripts/passbolt/passbolt_healthcheck.sh; grep FAIL passbolt_healthcheck.txt; echo -n "Entropy: ";cat /proc/sys/kernel/random/entropy_avail 
20220719 15:35:22
 [FAIL] The public key cannot be used to verify a signature.
 [FAIL] 1 error(s) found. Hang in there!
Entropy: 2916

This is a pretty default 20.04 VPS from Hetzner. It's using local storage.
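
To get a rough idea of how often the check fails, the manual reproduction above could be wrapped in a loop. This is only a sketch: count_failures and ENTROPY_FILE are names made up for illustration, and it assumes the check command prints its report (including any [FAIL] lines) to standard output.

```shell
#!/bin/sh
# Sketch: repeat a check command and record entropy for every failing run.
# count_failures and ENTROPY_FILE are illustrative names, not part of Passbolt.

ENTROPY_FILE="${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}"

count_failures() {
    # $1 = number of runs, remaining arguments = command to run
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        i=$((i + 1))
        # Exit status is ignored on purpose; only the report text is inspected.
        output=$("$@" 2>&1) || true
        if printf '%s\n' "$output" | grep -qF '[FAIL]'; then
            failures=$((failures + 1))
            entropy=$(cat "$ENTROPY_FILE" 2>/dev/null || echo unknown)
            echo "$(date +'%Y%m%d %H:%M:%S') run $i failed, entropy: $entropy"
            printf '%s\n' "$output" | grep -F '[FAIL]'
        fi
    done
    echo "$failures of $runs runs failed"
}
```

For example, `count_failures 100 ./bin/cake passbolt healthcheck` would run the check 100 times and print one timestamped line plus the [FAIL] messages for each failing run, followed by a summary.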

stripthis commented 2 years ago

Can you try to set

GPGME_DEBUG=9:/home/user/mygpgme.log

And see if any information shows when the operation is failing?
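
Because the failure is intermittent, it may help to enable the debug log for every run and keep it only when the check reports a failure. The following is a sketch under assumptions: run_with_gpgme_debug and LOG_DIR are hypothetical names, and it relies on the GPGME_DEBUG variable reaching the PHP process that loads php-gnupg.

```shell
#!/bin/sh
# Sketch: run a command with GPGME debug logging enabled and keep the log
# only when the output contains a [FAIL] line.
# LOG_DIR and run_with_gpgme_debug are assumptions for illustration.

LOG_DIR="${LOG_DIR:-/tmp/gpgme-debug}"

run_with_gpgme_debug() {
    # "$@" is the health check command to run.
    mkdir -p "$LOG_DIR" || return 2
    stamp=$(date +'%Y%m%d-%H%M%S')
    log="$LOG_DIR/gpgme-$stamp-$$.log"
    out="$LOG_DIR/healthcheck-$stamp-$$.txt"

    # GPGME_DEBUG=<level>:<file> is set for this one command only.
    # Exit status is ignored; only the report text is inspected.
    GPGME_DEBUG="9:$log" "$@" > "$out" 2>&1 || true

    if grep -qF '[FAIL]' "$out"; then
        echo "failure captured: $out $log"
        return 1
    fi

    # Passing run: discard both files so the directory does not fill up.
    rm -f "$log" "$out"
    return 0
}
```

It could be called as `run_with_gpgme_debug ./bin/cake passbolt healthcheck`. A failing run leaves the health check output and the matching GPGME log side by side in LOG_DIR.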