insanecoderr closed 6 years ago
This command just retrieves logs for our software. It does not retrieve logs from the DIMMs
@juston-li So the log will not contain the error logs from the AEPs?
No, just errors from our software
Thanks!
By the way, can ixpdimm-cli retrieve logs from DIMMs?
No, just events.
Thanks
@juston-li Sorry to bother you again, but is there any API that I can use to get the error information of the DIMM?
The UEFI CLI provided by Intel can show media or thermal errors in AEP, so I think it would not be difficult for the OS to do this.
You can use "ndctl list -DH" to get a JSON listing of all the DIMMs and their health state. Here is example output from our unit test implementation:
[
  {
    "dev":"nmem3",
    "id":"cdab-0a-07e0-feffffff",
    "handle":1,
    "phys_id":1,
    "health":{
      "health_state":"non-critical",
      "temperature_celsius":23.0,
      "spares_percentage":75,
      "alarm_temperature":true,
      "alarm_spares":true,
      "temperature_threshold":80.125,
      "spares_threshold":128,
      "life_used_percentage":5,
      "shutdown_state":"clean"
    }
  },
  {
    "dev":"nmem5",
    "id":"cdab-0a-07e0-fefeffff",
    "handle":257,
    "phys_id":3,
    "health":{
      "health_state":"non-critical",
      "temperature_celsius":23.0,
      "spares_percentage":75,
      "alarm_temperature":true,
      "alarm_spares":true,
      "temperature_threshold":80.125,
      "spares_threshold":128,
      "life_used_percentage":5,
      "shutdown_state":"clean"
    }
  },
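For anyone who wants to act on this listing from a script, here is a minimal sketch in Python of how the JSON could be filtered for DIMMs that need attention. The function name `dimms_needing_attention` and the `SAMPLE` string are made up for illustration, and treating "ok" as the healthy value of "health_state" is an assumption; the field names are taken from the example output above.

```python
import json

# One record trimmed from the example output above (hypothetical sample input).
SAMPLE = """
[
  {
    "dev": "nmem3",
    "health": {
      "health_state": "non-critical",
      "temperature_celsius": 23.0,
      "alarm_temperature": true,
      "alarm_spares": true
    }
  }
]
"""


def dimms_needing_attention(ndctl_json: str) -> list:
    """Return one entry per DIMM whose health record looks abnormal.

    Assumption: a healthy DIMM reports health_state "ok"; anything else,
    or any raised alarm flag, is treated as worth reporting.
    """
    flagged = []
    for dimm in json.loads(ndctl_json):
        health = dimm.get("health", {})
        reasons = []
        state = health.get("health_state", "ok")
        if state != "ok":
            reasons.append(f"health_state={state}")
        if health.get("alarm_temperature"):
            reasons.append("temperature alarm")
        if health.get("alarm_spares"):
            reasons.append("spares alarm")
        if reasons:
            flagged.append({"dev": dimm.get("dev"), "reasons": reasons})
    return flagged
```

Calling `dimms_needing_attention(SAMPLE)` would flag nmem3 for its state and both alarms. The check only looks at generic fields, so it should not depend on the DIMM vendor.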
@juston-li can point to the equivalent ixpdimm-cli command.
@djbw In ixpdimm-cli it is "ixpdimm-cli show -sensor", but the information is limited. I want to know the error log within the DIMM. Do you know of any API to capture that?
As far as Linux is concerned it only cares about media errors and SMART health.
Media errors can be listed with "ndctl list --regions --media-errors", and SMART health can be listed with "ndctl list --dimms --health". That should be all you need.
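If it helps, the two commands can be wrapped in a small Python helper that collects both listings in one place. This is only a sketch: `collect_nvdimm_status`, `run_command` and `NDCTL_QUERIES` are hypothetical names, and it assumes ndctl prints either JSON or nothing at all on stdout.

```python
import json
import subprocess

# The two ndctl invocations mentioned above, keyed by a label of our choosing.
NDCTL_QUERIES = {
    "media_errors": ["ndctl", "list", "--regions", "--media-errors"],
    "health": ["ndctl", "list", "--dimms", "--health"],
}


def run_command(argv):
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def collect_nvdimm_status(run=run_command):
    """Run each ndctl query and return the parsed JSON keyed by label.

    `run` is injectable so the function can be exercised without ndctl
    installed. Empty output is treated as an empty list (an assumption
    about how ndctl behaves when nothing matches).
    """
    report = {}
    for label, argv in NDCTL_QUERIES.items():
        output = run(argv).strip()
        report[label] = json.loads(output) if output else []
    return report
```

On a machine with ndctl installed, `collect_nvdimm_status()` would return a dict with a "media_errors" and a "health" entry. ndctl may need root privileges to read health data, so the call might have to run under sudo.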
Can confirm there's currently no equivalent way to retrieve the error logs with ixpdimm-cli. As djbw mentioned, media errors and SMART health should be enough.
For more detailed status/errors, you can run "ixpdimm-cli start -diagnostic Quick". Any errors should be captured as events and shown with "ixpdimm-cli show -event". I think you have to enable ixpdimm-monitor, though, to periodically check for errors. I'll double check on that.
@juston-li Is there any guide for ixpdimm-monitor?
No, not currently.
If you start/enable ixpdimm-monitor.service with systemctl, I'm told it should start logging thermal/smart events.
@juston-li Hmm, I see. When will ixpdimm-cli support showing AEP error logs?
@insanecoderr Hopefully never. Media errors and SMART health are the recommended interface.
@djbw @juston-li Okay. I wonder why we can't retrieve logs from the AEP directly, so that we can get detailed error information such as the error type (uncorrectable, fatal), or even more detail: DDR-T link error, Uncorrectable on AIT Read, and so on.
Then we could take different OS actions according to the error type.
OS policy tooling needs to be nvdimm generic, it can't constantly chase vendor and model specific details. This is why ndctl merges the three health reporting mechanisms from Intel, Microsoft, and HPE into a common JSON record.
Oh, thanks. I got it.
Could you please list the DSM command corresponding to "ixpdimm-cli show -log"? Thanks.
@juston-li