Firefishy closed this 2 months ago
I have ordered 2x replacement DIMMs. They should arrive in Catford shortly.
I don't want to jinx it, but it looks like the memory errors have stopped for now. Note to reader: these were corrected ECC errors, not uncorrected ECC errors.
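For anyone wanting to check the corrected vs uncorrected distinction on a host themselves, here is a minimal sketch. It assumes a Linux machine with an EDAC driver loaded, which exposes per-memory-controller `ce_count` (corrected) and `ue_count` (uncorrected) counters under `/sys/devices/system/edac/mc`. The function name `read_edac_counts` is made up for illustration, and this is not necessarily how the errors on snap-01 were monitored.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters.

    Assumption: the kernel EDAC driver is loaded, so each memory
    controller directory (mc0, mc1, ...) has ce_count and ue_count files.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A rising corrected count with the uncorrected count staying at zero matches the situation described here: ECC is still fixing the errors, but the DIMM is suspect.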
We scheduled a 1-hour maintenance window today, during which I performed the following:
All options above were first tested on the twin snap-02.
We discussed the RAM replacement at the 11 July 2024 Ops call. We will aim to replace the memory in the server within the next 3 months. The server is no longer throwing errors, so this is not an urgent priority.
If the RAM starts throwing errors again, we will treat it as urgent.
2x DIMMs are in-stock @ Catford.
Unfortunately it is not possible to tell which revision is installed in snap-01. The stock is 2 different revisions.
From photos, I've been able to identify the full RAM model + revision: 36ASF4G72PZ-2G6E1QG
Unfortunately neither of the DIMMs I ordered is an exact match.
Exact match: https://www.ebay.nl/itm/155164317853
I have ordered the exact memory module. It will arrive in Catford in a few days.
Matching memory module has arrived in Catford stock.
Memory ready and maintenance window scheduled for today: https://community.openstreetmap.org/t/openstreetmap-maintenance-26-september-2024/118989
Memory replaced.
Snap-01 has a failing DIMM throwing ECC Correction errors.
CPU_SrcID#1_MC#1_Chan#0_DIMM#0
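The label above packs four indices into one string. As a rough sketch, it can be split into its parts like this. The field meanings (CPU socket, memory controller, channel, DIMM slot) are my reading of the label, and `parse_dimm_label` is a made-up helper name, not part of any tool.

```python
import re

# Pattern for labels shaped like "CPU_SrcID#1_MC#1_Chan#0_DIMM#0".
LABEL_RE = re.compile(r"^CPU_SrcID#(\d+)_MC#(\d+)_Chan#(\d+)_DIMM#(\d+)$")


def parse_dimm_label(label):
    """Split a DIMM label into socket, memory controller, channel and slot."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised DIMM label: {label!r}")
    socket, mc, channel, dimm = (int(g) for g in match.groups())
    return {"socket": socket, "mc": mc, "channel": channel, "dimm": dimm}


print(parse_dimm_label("CPU_SrcID#1_MC#1_Chan#0_DIMM#0"))
# {'socket': 1, 'mc': 1, 'channel': 0, 'dimm': 0}
```

The indices still need to be mapped to a physical slot using the vendor's hardware documentation.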
I think this DIMM is the one marked in this hardware linkage table:
DMI lists the memory as: