Open Chancel-xFusion opened 1 year ago
The code in efi_get_memory_map got complicated when one of our engineers added a workaround for a system that doesn't return a value in DescriptorSize when returning EFI_BUFFER_TOO_SMALL. Previously the loop was much simpler; see the older code at https://github.com/vmware/esx-boot/blob/52bdb5059a46c6c35af5fd8c042ae91db0fa6699/uefi/efiutils/memory.c
I can't say I fully understand why an infinite loop occurs with your system. It looks like the problem basically is that the allocation and freeing that our code is doing, while trying to get a buffer that is sufficiently larger than the memory map for our purposes, can sometimes create a pattern where the memory map bounces back and forth between two sizes that differ by more than a factor of 2. I am not totally sure whether even the older, simpler code would necessarily be free from the danger of this happening.
I suspect there is a way to rewrite our code that would both make it simpler and make it robust against this issue. Not sure when I would personally find time to work on it, though. If you have the ability to file SRs or DCPN cases, that's a better way of reporting this issue than filing a bug on GitHub against the open source release of esx-boot. Then it can go through our regular process for product code and get someone assigned to work on it.
Here are a few ideas I have that may help improve our code:
(1) Remember the largest MemoryMapSize that has been returned so far and never request less than that, regardless of whether a retry returned a smaller size.
(2) To better work around implementations that don't set DescriptorSize: (a) Assume this value doesn't change dynamically (it shouldn't, since it depends only on DescriptorVersion), so if GetMemoryMap has ever succeeded, assume DescriptorSize continues to be the size that was returned then. (b) If GetMemoryMap fails the first time, use a reasonable guess for DescriptorSize in the subsequent computation of how much bigger a buffer to allocate -- namely, sizeof(EFI_MEMORY_DESCRIPTOR) for version 1. This will usually (in practice always, because there has only been one version so far) result in allocating a big enough buffer on the next try.
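Ideas (1) and (2) could be sketched roughly as follows. This is a minimal, self-contained illustration and not the esx-boot code: `status_t`, `mem_desc_v1_t`, `fake_fw_t`, `fake_get_memory_map`, `get_map`, and `SLACK_DESCRIPTORS` are all hypothetical names, and the fake firmware simply alternates between two required sizes and leaves the descriptor size untouched on failure. It does not attempt to reproduce the reported hang; it only shows a retry loop whose request never shrinks and which is bounded by a retry limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for EFI_STATUS values; real code uses the UEFI headers. */
typedef enum { ST_SUCCESS, ST_BUFFER_TOO_SMALL, ST_OUT_OF_RESOURCES } status_t;

/* Assumed layout mirroring a version 1 EFI_MEMORY_DESCRIPTOR (illustration only). */
typedef struct {
    uint32_t type;
    uint64_t physical_start;
    uint64_t virtual_start;
    uint64_t number_of_pages;
    uint64_t attribute;
} mem_desc_v1_t;

/* Extra descriptors of headroom per retry (arbitrary value for this sketch). */
#define SLACK_DESCRIPTORS 2

/* Fake firmware: the required map size alternates between two values. */
typedef struct {
    size_t sizes[2];
    unsigned calls;
} fake_fw_t;

status_t fake_get_memory_map(fake_fw_t *fw, size_t *map_size, void *buf,
                             size_t *desc_size)
{
    size_t need = fw->sizes[fw->calls++ % 2];

    if (buf == NULL || *map_size < need) {
        *map_size = need;   /* desc_size is deliberately NOT set on failure */
        return ST_BUFFER_TOO_SMALL;
    }
    *map_size = need;
    *desc_size = sizeof(mem_desc_v1_t);
    return ST_SUCCESS;
}

status_t get_map(fake_fw_t *fw, void **map, size_t *map_size,
                 size_t *desc_size, unsigned max_tries)
{
    size_t high_water = 0;                 /* idea (1): largest request so far */
    size_t dsize = sizeof(mem_desc_v1_t);  /* idea (2b): guess the v1 size */
    void *buf = NULL;

    assert(fw != NULL && map != NULL && map_size != NULL && desc_size != NULL);

    for (unsigned i = 0; i < max_tries; i++) {
        size_t size = high_water;
        size_t reported = 0;
        status_t st = fake_get_memory_map(fw, &size, buf, &reported);

        if (reported != 0) {
            dsize = reported;              /* idea (2a): trust a reported size */
        }
        if (st == ST_SUCCESS) {
            *map = buf;
            *map_size = size;
            *desc_size = dsize;
            return ST_SUCCESS;
        }
        if (st != ST_BUFFER_TOO_SMALL) {
            free(buf);
            return st;
        }
        /* Grow only: a smaller size from a retry never lowers the request. */
        if (size + SLACK_DESCRIPTORS * dsize > high_water) {
            high_water = size + SLACK_DESCRIPTORS * dsize;
        }
        free(buf);
        buf = malloc(high_water);
        if (buf == NULL) {
            return ST_OUT_OF_RESOURCES;
        }
    }
    free(buf);
    return ST_BUFFER_TOO_SMALL;            /* bounded: give up, don't spin */
}
```

With a fake firmware that alternates between 400 and 4000 bytes, the loop settles on the larger request and succeeds on the third call. The `max_tries` cap is an extra safeguard added in this sketch, not something proposed above.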
Thank you for your reply. I will open an SR case to request support.
Let me know what the SR number is, in case it gets stuck with support and they don't file a bugzilla ticket with engineering.
Sorry, I only have permission to submit certification requests, so I submitted one. The reply was "As per our engineering team, you need to file this issue as an interop issue, not in the server_CR project."
Were you able to file it somewhere, then? I don't know what they mean by "file this issue as an interop issue". Would that be in DCPN? Or did you take that to mean you can't file it anywhere? Also, I am wondering who said that and who in engineering they are referring to. If you have name(s), I can talk to the person/people from my side.
Describe the bug
There are three calls to GetMemoryMap in efi_get_memory_map. The function that implements this call in EDKII is CoreGetMemoryMap. If the BIOS returns the following value, an endless loop occurs.
At this point, in the efi_get_memory_map function, the loop is re-entered because Status == EFI_BUFFER_TOO_SMALL.
In contrast, Linux calls GetMemoryMap only twice.
This problem may occur because EDKII calls MergeMemoryMap in CoreGetMemoryMap. What are some good solutions to this problem?
Reproduction steps
Upgrade to the latest BIOS firmware on xFusion's newly released 2288H V7 product, then power cycle the system 2 or 3 times.
Expected behavior
ESXi can start properly.
Additional context
No response