intel / ixpdimm_sw

BSD 3-Clause "New" or "Revised" License
What is the corresponding DSM Command of "ixpdimm-cli show -log"? #13

Closed: insanecoderr closed this issue 6 years ago

insanecoderr commented 6 years ago

Could you please list the DSM command that corresponds to "ixpdimm-cli show -log"? Thanks.

@juston-li

juston-li commented 6 years ago

This command just retrieves logs for our software. It does not retrieve logs from the DIMMs.

insanecoderr commented 6 years ago

@juston-li So the log will not contain the error logs from the AEPs?

juston-li commented 6 years ago

No, just errors from our software

insanecoderr commented 6 years ago

Thanks!

insanecoderr commented 6 years ago

By the way, can ixpdimm-cli retrieve logs from the DIMMs?

juston-li commented 6 years ago

No, just events.

insanecoderr commented 6 years ago

Thanks

insanecoderr commented 6 years ago

@juston-li Sorry to bother you again, but is there any API that I can use to get the error information of the DIMM?

The UEFI CLI provided by Intel can show media or thermal errors in AEP, so I think it should not be difficult for the OS to do the same.

djbw commented 6 years ago

You can use "ndctl list -DH" to get a JSON listing of all the DIMMs and their health state. Here is example output from our unit test implementation:

[
  {
    "dev":"nmem3",
    "id":"cdab-0a-07e0-feffffff",
    "handle":1,
    "phys_id":1,
    "health":{
      "health_state":"non-critical",
      "temperature_celsius":23.0,
      "spares_percentage":75,
      "alarm_temperature":true,
      "alarm_spares":true,
      "temperature_threshold":80.125,
      "spares_threshold":128,
      "life_used_percentage":5,
      "shutdown_state":"clean"
    }
  },
  {
    "dev":"nmem5",
    "id":"cdab-0a-07e0-fefeffff",
    "handle":257,
    "phys_id":3,
    "health":{
      "health_state":"non-critical",
      "temperature_celsius":23.0,
      "spares_percentage":75,
      "alarm_temperature":true,
      "alarm_spares":true,
      "temperature_threshold":80.125,
      "spares_threshold":128,
      "life_used_percentage":5,
      "shutdown_state":"clean"
    }
  },

@juston-li can point to the equivalent ixpdimm-cli command.
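
For anyone scripting against that listing, here is a minimal Python sketch that flags DIMMs whose health state is not healthy. It assumes that "ok" is the healthy value of "health_state" and that a combined listing may wrap the DIMMs in an object under a "dimms" key; neither is stated in this thread, so check both against your ndctl version. The helper name unhealthy_dimms is made up for illustration.

```python
import json

# Two records abridged from the output quoted above. The quoted output is cut
# off, so the closing bracket here is added only to make the sample parse.
SAMPLE = """
[
  {"dev": "nmem3", "health": {"health_state": "non-critical", "alarm_spares": true}},
  {"dev": "nmem5", "health": {"health_state": "non-critical", "alarm_spares": true}}
]
"""


def unhealthy_dimms(listing_json, healthy_state="ok"):
    """Return (dev, health_state) pairs for DIMMs that are not healthy.

    Assumption: "ok" is the healthy value; it is not shown in the thread.
    """
    parsed = json.loads(listing_json)
    # Assumption: a combined listing may be an object with a "dimms" array.
    if isinstance(parsed, dict):
        parsed = parsed.get("dimms", [])
    flagged = []
    for dimm in parsed:
        # A record without a "health" object is reported as "unknown".
        state = dimm.get("health", {}).get("health_state", "unknown")
        if state != healthy_state:
            flagged.append((dimm.get("dev"), state))
    return flagged


if __name__ == "__main__":
    for dev, state in unhealthy_dimms(SAMPLE):
        print(f"{dev}: {state}")
```

In practice the string would come from the output of "ndctl list -DH" rather than from an embedded sample.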

insanecoderr commented 6 years ago

@djbw In ixpdimm-cli it is "ixpdimm-cli show -sensor", but the information is limited. I want to see the error log within the DIMM. Do you know of any API to capture that?

djbw commented 6 years ago

As far as Linux is concerned it only cares about media errors and SMART health.

Media errors can be listed with "ndctl list --regions --media-errors", and SMART health can be listed with "ndctl list --dimms --health". That should be all you need.
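
A small sketch of collecting both listings from a script, assuming ndctl is on the PATH and prints JSON for these two commands. The function names ndctl_list and collect_status and the injectable "run" parameter are illustrative only and not part of ndctl.

```python
import json
import subprocess


def ndctl_list(*flags, run=subprocess.run):
    """Run `ndctl list` with the given flags and return the parsed JSON.

    `run` is injectable so the function can be exercised without ndctl.
    """
    result = run(["ndctl", "list", *flags],
                 capture_output=True, text=True, check=True)
    out = result.stdout.strip()
    # Treat empty output as an empty listing (an assumption, not ndctl docs).
    return json.loads(out) if out else []


def collect_status(run=subprocess.run):
    """Gather the two listings named in the comment above."""
    return {
        "media_errors": ndctl_list("--regions", "--media-errors", run=run),
        "health": ndctl_list("--dimms", "--health", run=run),
    }


if __name__ == "__main__":
    # Requires ndctl and NVDIMM hardware (or an emulated test setup).
    print(json.dumps(collect_status(), indent=2))
```

Keeping the command invocation in one place makes it easy to swap in canned output when testing policy code on a machine without NVDIMMs.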

juston-li commented 6 years ago

Can confirm there's currently no equivalent way to retrieve the error logs with ixpdimm-cli. As djbw mentioned, media/SMART errors should be enough.

For more detailed status/errors, you can run "ixpdimm-cli start -diagnostic Quick". Any errors should be captured as events and shown with "ixpdimm-cli show -event". I think you have to enable ixpdimm-monitor to periodically check for errors, though. I'll double check on that.

insanecoderr commented 6 years ago

@juston-li Is there any guide for ixpdimm-monitor?

juston-li commented 6 years ago

No, not currently.

If you start/enable ixpdimm-monitor.service with systemctl, I'm told it should start logging thermal/SMART events.
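
The steps mentioned so far (enable and start the monitor service, run the quick diagnostic, show events) could be scripted roughly as below. This is only a sketch: the command strings are the ones quoted in this thread, the helper names are made up, and running them for real usually needs root. It defaults to a dry run that only returns the command lines.

```python
import subprocess

MONITOR_UNIT = "ixpdimm-monitor.service"


def monitoring_commands():
    """Commands quoted in this thread, in the order they would be run."""
    return [
        ["systemctl", "enable", MONITOR_UNIT],
        ["systemctl", "start", MONITOR_UNIT],
        ["ixpdimm-cli", "start", "-diagnostic", "Quick"],
        ["ixpdimm-cli", "show", "-event"],
    ]


def run_all(commands, run=subprocess.run, dry_run=True):
    """Run each command, or with dry_run=True just return the command lines."""
    outputs = []
    for cmd in commands:
        if dry_run:
            outputs.append(" ".join(cmd))
            continue
        result = run(cmd, capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return outputs


if __name__ == "__main__":
    for line in run_all(monitoring_commands()):
        print(line)
```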

insanecoderr commented 6 years ago

@juston-li Hmm, I see. When will ixpdimm-cli support showing AEP error logs?

djbw commented 6 years ago

@insanecoderr Hopefully never. Media errors and SMART health are the recommended interface.

insanecoderr commented 6 years ago

@djbw @juston-li Okay. I wonder why the logs are not retrieved from the AEP directly, so that we can get detailed error information such as the error type (uncorrectable, fatal), or even finer detail: DDR-T link error, Uncorrectable on AIT Read, and so on.

Then we could take different OS actions depending on the error type.

djbw commented 6 years ago

OS policy tooling needs to be NVDIMM-generic; it can't constantly chase vendor- and model-specific details. This is why ndctl merges the three health reporting mechanisms from Intel, Microsoft, and HPE into a common JSON record.
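
To illustrate the point, a vendor-neutral policy can be written against the common "health" record alone. The sketch below uses only field names that appear in the JSON quoted earlier in this thread. The state values "critical" and "fatal" and the action names "escalate", "warn", and "none" are assumptions chosen for the example, not something ndctl defines.

```python
def policy_action(health):
    """Map a common health record to a coarse action.

    Assumptions: "critical" and "fatal" are possible health_state values,
    and the returned action names are hypothetical labels for this sketch.
    """
    state = health.get("health_state", "unknown")
    if state in ("critical", "fatal"):
        return "escalate"
    # Any raised alarm or a degraded state warrants a warning.
    alarms = health.get("alarm_temperature") or health.get("alarm_spares")
    if state == "non-critical" or alarms:
        return "warn"
    return "none"


if __name__ == "__main__":
    # Health record copied from the unit test output quoted in this thread.
    sample = {
        "health_state": "non-critical",
        "temperature_celsius": 23.0,
        "spares_percentage": 75,
        "alarm_temperature": True,
        "alarm_spares": True,
        "temperature_threshold": 80.125,
        "spares_threshold": 128,
        "life_used_percentage": 5,
        "shutdown_state": "clean",
    }
    print(policy_action(sample))
```

Nothing in the function refers to a vendor or a DIMM model, so the same rule applies to any device whose health is reported in this record format.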

insanecoderr commented 6 years ago

Oh, thanks. I got it.