Closed lxwinspur closed 1 year ago
@Emy-inspur FYI
@mzipse Please let your host team take a look at this issue. thanks!
@dhruvibm, can you comment on how IPS might debug the dump failures? Perhaps by logging in via the Service Account, and then what to look for?
To debug the PHYP hang, is the IPS team familiar with using isteps? I believe istep mode is similar to P9. You could stop at the istep just before the hang and then look at what HDAT data is being passed to PHYP.
Sorry, didn't mean to close this issue.
@dhruvibm
The value of hdatSystemVendorName printed is the combined value of F5 and F6 before entering PHYP.
@mzipse The problem now is that we cannot build a firmware that successfully boots PHYP, even without the SMS-related modifications. So I hope you can provide a detailed explanation of how to build a firmware from the open source code that can successfully boot PHYP.
@mzipse @dhruvibm From the discussion here, IPS knows how to debug using isteps, but the system seems to hang after control is handed over to PHYP. Could you help confirm whether the value is correct if F5 and F6 are spliced together for hdatSystemVendorName?
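For anyone checking the spliced value, here is a minimal sketch of what "F5 and F6 combined" could look like in a fixed-width field. Everything in it is an assumption for illustration: the helper names, the F5-then-F6 order, the NUL padding, and the field width are hypothetical and not taken from hostboot.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from hostboot: concatenates two keyword
// values and fits the result into a fixed-width, NUL-padded field.
inline std::string splice_keywords(const std::string& f5,
                                   const std::string& f6,
                                   std::size_t field_width)
{
    std::string out = f5 + f6;      // F5 first, then F6 (assumed order)
    out.resize(field_width, '\0');  // truncate if too long, NUL-pad if short
    return out;
}

// Returns the text up to the first NUL, i.e. what a "%s"-style trace shows.
inline std::string printable_prefix(const std::string& field)
{
    return field.substr(0, field.find('\0'));
}
```

One thing this makes easy to check: if the F5 value itself carries trailing NUL padding, the printable part of the spliced field ends at F5 and F6 is never shown, so it may be worth dumping the raw bytes rather than relying on a string trace.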
@edwin-wang @mzipse @dhruvibm Let's synchronize the information.
I don't think the SMS modification necessarily caused the problem; more likely the way we built hostfw is incorrect.
A true "hang" is rare so I suspect there is a TI or checkstop happening. Can we get a BMC dump? Or failing that at least the peltool output of all visible logs. There should be a log that includes the TI SRC and/or the checkstop reason.
We noticed A7004714 in the output.
Explanation: Platform LIC has detected a new VPD card.
Response:
This could be preventing PHYP standby. You will need to apply the appropriate license keys on your system.
@dcrowell77 Thank you for your answer. The event logs and BMC dump we obtained are here, please take a look: https://github.com/Emy-inspur/SMS-Logs.git Also, how can I obtain or generate the appropriate license keys?
Email sent to Xujin on the procedure for clearing license keys and using IPS activation codes.
Also, per feedback from Uma, you should consider setting the time to aid in future debugging using dumps. And lastly, we noticed some resources have been guarded out. You should consider clearing guard (guard -r).
An A7004714 does NOT necessarily require ANY action. It only means when phyp came up, there was no COD information (activations) found to be stored in the server yet -- at the very WORST, we'd come up with 1 processor and some memory available -- the 4714 is NOT an IPL-blocker.
I'm sure there will be more discussion at the meeting, but likely something else is not satisfied, thus the IPL cannot go from C7004091 to "Standby/Runtime". Absence of COD activations alone will NOT block an IPL from completing.
There was a request from Travis from the PHYP team for one HDAT change to enable the System Security Settings flag (bit 2 = 1: Platform security overrides allowed).
Below is the change to be applied for the same:
*** hdatiplparms.C
    // by a service processor
    this->iv_hdatIPLParams->iv_sysParms.hdatSysSecuritySetting = 0;

---> New two lines to be added
    // Set the Bit 2 for Platform security overrides
    this->iv_hdatIPLParams->iv_sysParms.hdatSysSecuritySetting != 0x20000;
Do you mean to add this line? this->iv_hdatIPLParams->iv_sysParms.hdatSysSecuritySetting |= 0x2000;
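To make the difference between the two operators concrete, here is a small standalone sketch. It assumes a 16-bit field and IBM-style bit numbering (bit 0 is the most significant bit); neither is confirmed in this thread, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the field is 16 bits wide and bits are numbered IBM-style,
// with bit 0 as the most significant bit. Neither is confirmed here.
inline std::uint16_t msb0_mask16(unsigned bit)
{
    assert(bit < 16);  // precondition: bit index must fit in the field
    return static_cast<std::uint16_t>(0x8000u >> bit);
}

// "!=" only compares; the result is discarded and the field is unchanged.
inline std::uint16_t apply_not_equal(std::uint16_t setting, std::uint16_t mask)
{
    (void)(setting != mask);
    return setting;
}

// "|=" sets the masked bit(s) and leaves the other bits alone.
inline std::uint16_t apply_or_assign(std::uint16_t setting, std::uint16_t mask)
{
    setting |= mask;
    return setting;
}
```

Under those assumptions, bit 2 maps to 0x2000, which matches the value in the question above, and 0x20000 would not fit in a 16-bit field at all. If the real field is wider or numbered differently, the mask needs to be recomputed.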
If I understand correctly, it seems to have no effect. Same as the previous tests: the host console stops at C7004091, and there is no output on the Hypervisor console.
I believe IBM team figured out some other issue with the HBRT lids. So doing the above HDAT change makes no sense now. Please ignore my HDAT fix suggestion.
Yes, I got the email. Thank you for your reply anyway.
@lxwinspur , I think we can close this issue now, correct? With an updated step dealing with the LIDs in the Host firmware build process, I think this was resolved.
Problem Description
Using Hostfw compiled by IPS, the machine cannot boot to PHYP, and the host console stays at C7004091.
Hostboot
https://github.com/open-power/hostboot
Branch
master-p10
CommitID
559907d6676d180c693afaa9248e94e83abc7553
Host Console:
Also, an error is displayed when collecting a system dump and a BMC dump: