openstreetmap / operations

OSMF Operations Working Group issue tracking
https://operations.osmfoundation.org/

Failed PSU in karm #944

Closed tomhughes closed 1 year ago

tomhughes commented 1 year ago

There is a PSU failure being reported by karm - from ipmitool-sensors:

4627 | PS1 Status      | Power Supply      | N/A        | N/A   | 'Presence detected'
4694 | PS2 Status      | Power Supply      | N/A        | N/A   | 'Presence detected' 'Power Supply Failure detected'
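
A check like the one above could be automated. The following is a minimal sketch, assuming the report keeps the pipe-separated layout shown here (ID, sensor name, sensor type, two value columns, then quoted states); the function name `failed_power_supplies` and the `SAMPLE` constant are hypothetical and not part of any existing tooling.

```python
import re


def failed_power_supplies(report: str) -> list[str]:
    """Return the names of Power Supply sensors reporting a failure state."""
    failed = []
    for line in report.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Expect six columns; skip anything that is not a Power Supply sensor.
        if len(fields) < 6 or fields[2] != "Power Supply":
            continue
        # The last column holds one or more single-quoted state strings.
        states = re.findall(r"'([^']*)'", fields[5])
        if any("Failure detected" in state for state in states):
            failed.append(fields[1])
    return failed


# The two lines quoted in this issue, used as sample input.
SAMPLE = """\
4627 | PS1 Status      | Power Supply      | N/A        | N/A   | 'Presence detected'
4694 | PS2 Status      | Power Supply      | N/A        | N/A   | 'Presence detected' 'Power Supply Failure detected'
"""

print(failed_power_supplies(SAMPLE))
```

On the sample lines this reports only the PS2 sensor. Matching on the substring "Failure detected" is a simplification; other failure states would need their own patterns.
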

tomhughes commented 1 year ago

PSUs are 1600W Supermicro PWS-1K62A-1R

pnorman commented 1 year ago

The failure happened when the power briefly went out, so the supply was probably on the edge of failing for some time. A reboot didn't clear the failure.

The plan is to replace them with 2x 1kW Supermicro supplies for better efficiency; it should be worth it over the remainder of the machine's lifetime.
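
Whether the swap is worth it can be estimated roughly. The sketch below assumes a constant DC load and two fixed efficiency figures; all of the numbers in it are hypothetical placeholders, not measurements from karm or datasheet values for either PSU model, and `annual_savings_kwh` is an illustrative name.

```python
def annual_savings_kwh(
    load_w: float,
    eff_old: float,
    eff_new: float,
    hours: float = 24 * 365,
) -> float:
    """Estimate wall-side energy saved per year at a constant DC load.

    Efficiencies are fractions in (0, 1], e.g. 0.90 for 90%.
    """
    if not (0 < eff_old <= 1 and 0 < eff_new <= 1):
        raise ValueError("efficiencies must be in (0, 1]")
    input_old_w = load_w / eff_old  # power drawn from the wall, old PSUs
    input_new_w = load_w / eff_new  # power drawn from the wall, new PSUs
    return (input_old_w - input_new_w) * hours / 1000.0


# Hypothetical inputs: 300 W load, 90% vs 93% efficiency at that load.
savings = annual_savings_kwh(load_w=300.0, eff_old=0.90, eff_new=0.93)
print(f"{savings:.1f} kWh/year")
```

Multiplying the result by the electricity price and the remaining service life gives a figure to compare against the cost of the new supplies.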

Firefishy commented 1 year ago

2x 1000W replacement PSUs ordered.

Firefishy commented 1 year ago

Replacement PSUs have arrived in Catford.

pnorman commented 1 year ago

The power supplies should be mailed so that they arrive on Nov 8th. The 9th is the planned date for the replacement.

Firefishy commented 1 year ago

@pnorman Please can you book a site visit via the Equinix portal? I will then link the inbound shipment to your visit ticket.

Firefishy commented 1 year ago

The PSUs have been posted. The server should be powered down when installing the new PSUs. Both PSUs should be replaced at the same time.

pnorman commented 1 year ago

Replaced. Sending broken PSU to ewaste. Good PSU on shelf.