xCuri0 / ReBarUEFI

Resizable BAR for (almost) any UEFI system
MIT License

LGA1151 PCI allocation limited to 64GB #163

Open crashr opened 2 months ago

crashr commented 2 months ago

Hi. I have an MSI H310M PRO-VHD mainboard with an i5-8500 and 8GB RAM. It works fine with a single NVIDIA Tesla P40 under Ubuntu 22.04 but doesn't POST with 2 or 3 P40s. There are some forum posts where someone wrote to MSI support, who said that they never tested the P40 on that mainboard. Is there a chance to get more than one GPU running by BIOS patching? Is this actually a BAR issue?

xCuri0 commented 2 months ago

You're going to need to create a new UEFI patch for this. The hardware (anything Haswell or newer) is capable, but for some reason OEMs don't use the full addressing capability of the CPU. It seems they only use 64GB on LGA1151, which is only enough for 1x P40 alongside the internal PCIe devices.

This screenshot might help you. You need to patch the AddMemorySpace call in PciHostBridge to increase its size from 64GB to something like 256GB.

image

xCuri0 commented 2 months ago

I suggest you go with 0x2100000000 for the base and 0x5000000000 for the size.

crashr commented 2 months ago

Okay. At the moment I am still trying to figure out how to get to the code from what I extracted with UEFITool. I used some of these tools a year or so ago when you helped me with a P40 on another board and I still have the files from all the steps but I don't remember how I got there.

xCuri0 commented 2 months ago

@crashr it's the Ghidra disassembler with the efiSeek plugin. You need to use it on the PE32 image extracted from PciHostBridge.

crashr commented 2 months ago

Wow, I didn't know about that software. But I have trouble identifying the function call. To be honest, I am not sure what I am actually looking for. I manually inspected all the functions and labels it found but wasn't able to spot anything related. Is this maybe what I am looking for?

image

At least this function is called very early in the program flow, at the top of the call hierarchy.

image

This is the code closest to that in your screenshot that I could find.

image

xCuri0 commented 2 months ago

@crashr try searching for 0x2000000000 or 0x1000000000
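
The same search can also be done outside Ghidra. The sketch below is only an illustration: it assumes the constants are stored as little-endian 64-bit immediates, and `find_imm64` is a hypothetical helper. The sample bytes stand in for the extracted PE32 body and reuse a sequence that appears in the hex dumps later in this thread.

```python
import struct


def find_imm64(data: bytes, value: int) -> list[int]:
    """Return every offset where `value` is stored as a little-endian 64-bit integer."""
    needle = struct.pack("<Q", value)
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


# Stand-in for the PE32 body; in practice this would be read from the extracted file.
sample = bytes.fromhex("48ba0000000020000000" "b903000000" "49b80000000010000000")

base_hits = find_imm64(sample, 0x2000000000)
size_hits = find_imm64(sample, 0x1000000000)
print("0x2000000000 at offsets:", base_hits)
print("0x1000000000 at offsets:", size_hits)
```

Each reported offset points at the immediate itself, so the instruction that loads it starts two bytes earlier in this sample.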

crashr commented 2 months ago

Facepalm. So obviously I should search for 0x1000000000. OK, now I think I know what to do. Please correct me if I am wrong. There are three occurrences of each. I read the documentation on AddMemorySpace but can't find it anymore. But I remember that the first argument is the memory type and 0x3 is MMIO. The offset in the structure which holds the config is 0x18. Because of all this, I assume I have to change these two values:

image

To be on the safe side, I'll wait for an answer before doing this. In the meantime I'm going to find out how to make a patch out of this.
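
A rough way to double-check that reading of the call site is to decode the bytes by hand. The sketch below handles only the instruction encodings that occur in the unpatched snippet shown in the hex dumps in this thread; `decode_call_site` is a hypothetical helper, not part of Ghidra or efiSeek. Under the Microsoft x64 calling convention that UEFI uses, RCX, RDX and R8 carry the first three arguments.

```python
import struct

# Bytes of the unpatched call site, copied from the hex dump in this thread.
CALL_SITE = bytes.fromhex(
    "48ba0000000020000000"  # mov rdx, imm64
    "b903000000"            # mov ecx, imm32
    "49b80000000010000000"  # mov r8, imm64
    "ff5018"                # call qword ptr [rax + 0x18]
)

# REX.W + opcode pairs for the two 64-bit immediate loads seen above.
MOV_IMM64 = {b"\x48\xba": "rdx", b"\x49\xb8": "r8"}


def decode_call_site(buf: bytes) -> dict:
    """Decode only the few encodings used above (not a general disassembler)."""
    out, pos = {}, 0
    while pos < len(buf):
        prefix = buf[pos:pos + 2]
        if prefix in MOV_IMM64:
            out[MOV_IMM64[prefix]] = struct.unpack_from("<Q", buf, pos + 2)[0]
            pos += 10
        elif buf[pos] == 0xB9:  # mov ecx, imm32
            out["ecx"] = struct.unpack_from("<I", buf, pos + 1)[0]
            pos += 5
        elif prefix == b"\xff\x50":  # call through a table entry at [rax + disp8]
            out["call_offset"] = buf[pos + 2]
            pos += 3
        else:
            raise ValueError(f"unhandled byte at offset {pos}")
    return out


args = decode_call_site(CALL_SITE)
print({name: hex(value) for name, value in args.items()})
```

Read this way, the first argument (ECX) is 3, the second (RDX) is 0x2000000000, the third (R8) is 0x1000000000, and the call goes through the table entry at offset 0x18, which matches the description above.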

xCuri0 commented 2 months ago

@crashr yes that's correct.

Change all 3 instances to the values I gave (0x2100000000 and 0x5000000000).

The first value is the MMIO base and the second value is the size. It should give you 320GB of BAR space.
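
The arithmetic behind those numbers can be checked with a few lines. This is a minimal sketch; `window_gib` is a hypothetical helper, and the "stock" values are the ones found in the module earlier in this thread.

```python
GIB = 1 << 30  # bytes per GiB


def window_gib(base: int, size: int) -> tuple[int, int, int]:
    """Return (start, length, end) of an MMIO window in whole GiB."""
    return base // GIB, size // GIB, (base + size) // GIB


stock = window_gib(0x2000000000, 0x1000000000)
suggested = window_gib(0x2100000000, 0x5000000000)
print("stock     (start, length, end):", stock)
print("suggested (start, length, end):", suggested)
```

With these inputs the stock window is 64 GiB long starting at 128 GiB, and the suggested one is 320 GiB long starting at 132 GiB.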

xCuri0 commented 2 months ago

For the patch, you just need to compare the files in a hex editor after saving the modified PE32 in Ghidra.
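
If a hex editor is not at hand, the comparison can be sketched in a few lines. `changed_ranges` is a hypothetical helper, and the two byte strings are short excerpts modelled on the hex dumps in this thread rather than whole files.

```python
def changed_ranges(original: bytes, patched: bytes) -> list[tuple[int, bytes, bytes]]:
    """List (offset, old_bytes, new_bytes) for each run of differing bytes."""
    if len(original) != len(patched):
        raise ValueError("inputs differ in length; a patch needs same-size files")
    ranges, start = [], None
    for i, (old, new) in enumerate(zip(original, patched)):
        if old != new and start is None:
            start = i  # a run of differences begins here
        elif old == new and start is not None:
            ranges.append((start, original[start:i], patched[start:i]))
            start = None
    if start is not None:  # the last run reaches the end of the data
        ranges.append((start, original[start:], patched[start:]))
    return ranges


before = bytes.fromhex("48ba0000000020000000b90300000049b80000000010000000")
after = bytes.fromhex("48ba0000000021000000b90300000049b80000000050000000")

diffs = changed_ranges(before, after)
for offset, old, new in diffs:
    print(f"{offset:#06x}: {old.hex()} -> {new.hex()}")
```

The surrounding unchanged bytes are what make a find pattern unique, so a patch entry usually includes a few of them on either side of each changed byte.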

crashr commented 2 months ago

Do I actually need the patch, or could I just add the modified EFI module to the UEFI file? The point is that I don't know how to put the patch into a format like the one in patches.txt from this repo. For example:

8D6756B9-E55E-4D6A-A3A5-5E4D72DDF772 10 P:040000004823C1483BC2480F47D04C2BC27411:100000004823C1483BC2480F47D04C2BC26690

I read somewhere how the format works, but how do I find out the string "8D6756B9-E55E-4D6A-A3A5-5E4D72DDF772"?
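
A small sketch of how that line breaks down, assuming the layout is `<file GUID> <section type in hex> <kind>:<find>:<replace>`, which is how the entry above reads. `PatchEntry` and `parse_patch_line` are hypothetical names used only for illustration. The first field appears to be the GUID of the firmware file that holds the module, which UEFITool shows for each file; the PciHostBridge patch posted later in this thread uses the same GUID.

```python
from dataclasses import dataclass


@dataclass
class PatchEntry:
    file_guid: str
    section_type: int  # 0x10 is the PE32 section type in the PI firmware file format
    kind: str          # "P" in the example line; assumed to mean a pattern patch
    find: str
    replace: str


def parse_patch_line(line: str) -> PatchEntry:
    """Split one patches.txt-style line into its fields (assumed layout, see above)."""
    guid, section, patch = line.split()
    kind, find, replace = patch.split(":")
    return PatchEntry(guid, int(section, 16), kind, find, replace)


entry = parse_patch_line(
    "8D6756B9-E55E-4D6A-A3A5-5E4D72DDF772 10 "
    "P:040000004823C1483BC2480F47D04C2BC27411:100000004823C1483BC2480F47D04C2BC26690"
)
print(entry.file_guid, hex(entry.section_type), entry.kind)
print(len(entry.find) // 2, "bytes to find,", len(entry.replace) // 2, "bytes to write")
```

In this example the find and replace strings have the same length, so the patch overwrites bytes in place without changing the size of the module.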

crashr commented 2 months ago

By the way, did Ghidra add some text to the file? It looks wrong to me. Have I maybe done something wrong? Should I remove it from the binary?

$ ls -l E7C09IMS.181.PciHostBridge.efi.unpatched E7C09IMS.181.PciHostBridge.efi.patched | awk '{print $5, $9}'
25382 E7C09IMS.181.PciHostBridge.efi.patched
24544 E7C09IMS.181.PciHostBridge.efi.unpatched
$ diff <(xxd E7C09IMS.181.PciHostBridge.efi.unpatched) <(xxd E7C09IMS.181.PciHostBridge.efi.patched)
298,301c298,301
< 00001290: 0000 2000 0000 b903 0000 0049 b800 0000  .. ........I....
< 000012a0: 0010 0000 00ff 5018 488b 0521 4900 004c  ......P.H..!I..L
< 000012b0: 8bc7 48ba 0000 0000 1000 0000 48b9 0000  ..H.........H...
< 000012c0: 0000 2000 0000 ff50 4048 3bc5 7538 4088  .. ....P@H;.u8@.
---
> 00001290: 0000 2100 0000 b903 0000 0049 b800 0000  ..!........I....
> 000012a0: 0050 0000 00ff 5018 488b 0521 4900 004c  .P....P.H..!I..L
> 000012b0: 8bc7 48ba 0000 0000 5000 0000 48b9 0000  ..H.....P...H...
> 000012c0: 0000 2100 0000 ff50 4048 3bc5 7538 4088  ..!....P@H;.u8@.
303,304c303,304
< 000012e0: 00ff 48b8 0000 0000 2000 0000 4889 0545  ..H..... ...H..E
< 000012f0: 4900 0048 b800 0000 0010 0000 0048 8905  I..H.........H..
---
> 000012e0: 00ff 48b8 0000 0000 2100 0000 4889 0545  ..H.....!...H..E
> 000012f0: 4900 0048 b800 0000 0050 0000 0048 8905  I..H.....P...H..
1383,1384c1383,1384
< 00005660: a1e5 3f3e 36b2 0da9 e42a 0000 0000 0000  ..?>6....*......
< 00005670: ec49 0000 0000 0000 0000 0000 0000 0000  .I..............
---
> 00005660: a1e5 3f3e 36b2 0da9 e42a 0080 0000 0000  ..?>6....*......
> 00005670: ec49 0080 0000 0000 0000 0000 0000 0000  .I..............
1534a1535,1587
> 00005fe0: 7b22 696e 7374 616c 6c20 7072 6f74 6f63  {"install protoc
> 00005ff0: 6f6c 223a 7b22 3234 3335 223a 7b22 6e61  ol":{"2435":{"na
> 00006000: 6d65 223a 2245 4649 5f50 4349 5f48 4f53  me":"EFI_PCI_HOS
> 00006010: 545f 4252 4944 4745 5f52 4553 4f55 5243  T_BRIDGE_RESOURC
> 00006020: 455f 414c 4c4f 4341 5449 4f4e 5f50 524f  E_ALLOCATION_PRO
> 00006030: 544f 434f 4c22 2c22 6775 6964 223a 2263  TOCOL","guid":"c
> 00006040: 6638 3033 3462 652d 3637 3638 2d34 6438  f8034be-6768-4d8
> 00006050: 622d 6237 3339 2d37 6363 6536 3833 6139  b-b739-7cce683a9
> 00006060: 6662 6522 2c22 6675 6e63 7469 6f6e 206e  fbe","function n
> 00006070: 616d 6522 3a22 4655 4e5f 3830 3030 3038  ame":"FUN_800008
> 00006080: 3834 227d 7d2c 2269 6e74 6572 7275 7074  84"}},"interrupt
> 00006090: 7322 3a7b 2273 7753 6d69 223a 7b7d 2c22  s":{"swSmi":{},"
> 000060a0: 6877 536d 6922 3a7b 7d2c 2263 6869 6c64  hwSmi":{},"child
> 000060b0: 223a 7b7d 7d2c 226c 6f63 6174 6520 7072  ":{}},"locate pr
> 000060c0: 6f74 6f63 6f6c 223a 7b22 3736 3222 3a7b  otocol":{"762":{
> 000060d0: 226e 616d 6522 3a22 4546 495f 534d 4d5f  "name":"EFI_SMM_
> 000060e0: 4241 5345 325f 5052 4f54 4f43 4f4c 222c  BASE2_PROTOCOL",
> 000060f0: 2267 7569 6422 3a22 6634 6363 6266 6237  "guid":"f4ccbfb7
> 00006100: 2d66 3665 302d 3437 6664 2d39 6464 342d  -f6e0-47fd-9dd4-
> 00006110: 3130 6138 6631 3530 6331 3931 222c 2266  10a8f150c191","f
> 00006120: 756e 6374 696f 6e20 6e61 6d65 223a 224d  unction name":"M
> 00006130: 6f64 756c 6545 6e74 7279 506f 696e 7422  oduleEntryPoint"
> 00006140: 7d2c 2232 3937 3222 3a7b 226e 616d 6522  },"2972":{"name"
> 00006150: 3a22 4546 495f 4d45 5452 4f4e 4f4d 455f  :"EFI_METRONOME_
> 00006160: 4152 4348 5f50 524f 544f 434f 4c22 2c22  ARCH_PROTOCOL","
> 00006170: 6775 6964 223a 2232 3662 6163 6362 322d  guid":"26baccb2-
> 00006180: 3666 3432 2d31 3164 342d 6263 6537 2d30  6f42-11d4-bce7-0
> 00006190: 3038 3063 3733 6338 3838 3122 2c22 6675  080c73c8881","fu
> 000061a0: 6e63 7469 6f6e 206e 616d 6522 3a22 4655  nction name":"FU
> 000061b0: 4e5f 3830 3030 3038 3834 227d 2c22 3230  N_80000884"},"20
> 000061c0: 3536 223a 7b22 6e61 6d65 223a 2253 415f  56":{"name":"SA_
> 000061d0: 474c 4f42 414c 5f4e 5653 5f41 5245 415f  GLOBAL_NVS_AREA_
> 000061e0: 5052 4f54 4f43 4f4c 222c 2267 7569 6422  PROTOCOL","guid"
> 000061f0: 3a22 3364 6332 3164 3735 2d64 6530 652d  :"3dc21d75-de0e-
> 00006200: 3433 3030 2d61 3061 612d 3139 6334 3163  4300-a0aa-19c41c
> 00006210: 3063 6633 6466 222c 2266 756e 6374 696f  0cf3df","functio
> 00006220: 6e20 6e61 6d65 223a 2246 554e 5f38 3030  n name":"FUN_800
> 00006230: 3030 3765 3822 7d2c 2233 3030 3122 3a7b  007e8"},"3001":{
> 00006240: 226e 616d 6522 3a22 4546 495f 4350 555f  "name":"EFI_CPU_
> 00006250: 494f 325f 5052 4f54 4f43 4f4c 222c 2267  IO2_PROTOCOL","g
> 00006260: 7569 6422 3a22 6164 3631 6631 3931 2d61  uid":"ad61f191-a
> 00006270: 6535 662d 3463 3065 2d62 3966 612d 6538  e5f-4c0e-b9fa-e8
> 00006280: 3639 6432 3838 6336 3466 222c 2266 756e  69d288c64f","fun
> 00006290: 6374 696f 6e20 6e61 6d65 223a 2246 554e  ction name":"FUN
> 000062a0: 5f38 3030 3030 3838 3422 7d2c 2231 3032  _80000884"},"102
> 000062b0: 3334 223a 7b22 6e61 6d65 223a 2275 6e6b  34":{"name":"unk
> 000062c0: 6e6f 776e 5072 6f74 6f63 6f6c 5f34 6663  nownProtocol_4fc
> 000062d0: 3037 3333 6622 2c22 6775 6964 223a 2234  0733f","guid":"4
> 000062e0: 6663 3037 3333 662d 3666 6432 2d34 3931  fc0733f-6fd2-491
> 000062f0: 622d 6138 3930 2d35 3337 3435 3231 6266  b-a890-5374521bf
> 00006300: 3438 6622 2c22 6675 6e63 7469 6f6e 206e  48f","function n
> 00006310: 616d 6522 3a22 4655 4e5f 3830 3030 3237  ame":"FUN_800027
> 00006320: 6434 227d 7d7d                           d4"}}}

xCuri0 commented 2 months ago

@crashr you need to export as PE in Ghidra; the file size should be exactly the same as the original.
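
A quick arithmetic check, using the sizes from the `ls -l` output earlier in the thread, suggests the extra bytes are exactly the JSON block that the diff shows appended at offset 0x5fe0. `appended_region` is a hypothetical helper for illustration.

```python
def appended_region(original_size: int, modified_size: int):
    """Return (start, end) offsets of data past the original length, or None."""
    if modified_size <= original_size:
        return None
    return original_size, modified_size


# Sizes of the .unpatched and .patched files as listed earlier in the thread.
region = appended_region(24544, 25382)
start, end = region
print(f"extra bytes: {end - start}, from {start:#x} to {end:#x}")
```

The start offset equals the original file size, so nothing inside the original length was displaced by the appended text.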

xCuri0 commented 2 months ago

And yes, for now you can just do a PE32 replace in UEFITool.

Once it's tested and working we can make a UEFIPatch entry.

crashr commented 2 months ago

Sounds good. Will try.

crashr commented 2 months ago

Bad luck. It doesn't POST at all. The EFI module looks like this:

$ diff <(xxd -p E7C09IMS.181.PciHostBridge.efi.unpatched) <(xxd -p E7C09IMS.181.PciHostBridge.efi.patched)
159,162c159,162
< 05474900004c8bcf48ba0000000020000000b90300000049b80000000010
< 000000ff5018488b05214900004c8bc748ba000000001000000048b90000
< 000020000000ff5040483bc5753840883d7b490000483bf5740848838e48
< 010000ff48b800000000200000004889054549000048b800000000100000
---
> 05474900004c8bcf48ba0000000021000000b90300000049b80000000050
> 000000ff5018488b05214900004c8bc748ba000000005000000048b90000
> 000021000000ff5040483bc5753840883d7b490000483bf5740848838e48
> 010000ff48b800000000210000004889054549000048b800000000500000
738c738
< b54ba1e53f3e36b20da9e42a000000000000ec4900000000000000000000
---
> b54ba1e53f3e36b20da9e42a008000000000ec4900800000000000000000

The BIOS file seemed a bit unusual to me but I assumed that has to do with something UEFITool does:

$ diff <(xxd -p E7C09IMS.181) <(xxd -p E7C09IMS.181.patched) | wc -l
359902

So it seems I first need to unbrick the board somehow.

xCuri0 commented 2 months ago

Maybe 320GB is too much; you could try a smaller size next time.

Or maybe it's unrelated to the patch and the POST issue is caused by UEFITool messing with pad files.

crashr commented 2 months ago

I was able to flash the original firmware, and the board is up and running again. Since I still have the Raspberry Pi and wires connected for flashrom, I can experiment a little.

crashr commented 2 months ago

Seems to be UEFITool. I extracted PciHostBridge and reinserted it unmodified. The resulting image differs from the original image.

ghost commented 2 months ago

> Or maybe it's unrelated to the patch and the POST issue is caused by UEFITool messing with pad files.

IMHO most likely.

> The resulting image differs from the original image.

Check the padded area. If a compressed file looks different, it may just be a difference in the compression algorithm; as long as it decompresses okay, it should be fine.

@crashr If using UEFITool, try operating just on the volume that contains the file to be modded instead of the whole firmware image, unless there's 'non-empty' padding there too.

crashr commented 2 months ago

I did not see any differences in the padding sections. I also tried fiano, with more or less the same results. I will try again in the next few days when I have time.

crashr commented 2 months ago

What is actually the reason for changing the base to 0x2100000000? Why not leave it at 0x2000000000?

xCuri0 commented 2 months ago

Yeah, it's not needed.

crashr commented 2 months ago

No success so far. I tried 0x2000000000 as the size. The diff looks like this:

$ diff <(xxd E7C09IMS.181.PciHostBridge.efi.unpatched) <(xxd E7C09IMS.181.PciHostBridge.efi.patched)
299,300c299,300
< 000012a0: 0010 0000 00ff 5018 488b 0521 4900 004c  ......P.H..!I..L
< 000012b0: 8bc7 48ba 0000 0000 1000 0000 48b9 0000  ..H.........H...
---
> 000012a0: 0020 0000 00ff 5018 488b 0521 4900 004c  . ....P.H..!I..L
> 000012b0: 8bc7 48ba 0000 0000 2000 0000 48b9 0000  ..H..... ...H...
304c304
< 000012f0: 4900 0048 b800 0000 0010 0000 0048 8905  I..H.........H..
---
> 000012f0: 4900 0048 b800 0000 0020 0000 0048 8905  I..H..... ...H..
1383,1384c1383,1384
< 00005660: a1e5 3f3e 36b2 0da9 e42a 0000 0000 0000  ..?>6....*......
< 00005670: ec49 0000 0000 0000 0000 0000 0000 0000  .I..............
---
> 00005660: a1e5 3f3e 36b2 0da9 e42a 0080 0000 0000  ..?>6....*......
> 00005670: ec49 0080 0000 0000 0000 0000 0000 0000  .I..............

But the PC just doesn't POST with a modified module. It does POST if I replace the module with the unmodified one, so can I assume that what I want is just not possible?

ghost commented 2 months ago

> I did not see any differences in the padding sections. I also tried fiano, with more or less the same results. I will try again in the next few days when I have time.

Strange, if I look at your BIOS with UEFITool I get this.

First time hearing of fiano, so it's some other issue then, I guess. Hope you get it resolved. :crossed_fingers:

crashr commented 2 months ago

$ xxd nonuefi-data
00000000: ead0 ff00 f000 0000 0000 0000 0000 272d  ..............'-
00000010: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000020: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000030: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000040: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000050: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000060: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000070: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000080: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000090: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000000a0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000000b0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000000c0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000000d0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000000e0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000000f0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000100: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000110: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000120: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000130: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000140: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000150: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000160: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000170: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000180: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000190: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000001a0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000001b0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000001c0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000001d0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000001e0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
000001f0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000200: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000210: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000220: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000230: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000240: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000250: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000260: ffff ffff ffff ffff                      ........

image

https://forums.overclockers.ru/viewtopic.php?f=1&t=479847&start=9860

Seems to be this.

dsanke commented 2 months ago

# PciHostBridge
8D6756B9-E55E-4D6A-A3A5-5E4D72DDF772 10 P:48BA0000000020000000:............40...... 
8D6756B9-E55E-4D6A-A3A5-5E4D72DDF772 10 P:48B90000000020000000:............40...... 
8D6756B9-E55E-4D6A-A3A5-5E4D72DDF772 10 P:48B80000000020000000:............40...... 
8D6756B9-E55E-4D6A-A3A5-5E4D72DDF772 10 P:49B80000000010000000:............20...... 
8D6756B9-E55E-4D6A-A3A5-5E4D72DDF772 10 P:48BA0000000010000000:............20...... 
8D6756B9-E55E-4D6A-A3A5-5E4D72DDF772 10 P:48B80000000010000000:............20...... 

I made this patch to support 128G of DRAM on Clevo LGA1151 Z170/Z270/Z370 notebooks. It should work for your situation too. Here is the patched BIOS: E7C09IMS.181.zip
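
To see what these six entries change, the patterns can be expanded by hand. The sketch below assumes that a `.` in the replace string means "keep the original hex digit", which is how the entries read; `apply_pattern` and `imm64` are hypothetical helpers, and the pattern strings are copied from the patch above.

```python
import struct

# (find, replace) pairs copied from the PciHostBridge patch above.
PATCHES = [
    ("48BA0000000020000000", "............40......"),
    ("48B90000000020000000", "............40......"),
    ("48B80000000020000000", "............40......"),
    ("49B80000000010000000", "............20......"),
    ("48BA0000000010000000", "............20......"),
    ("48B80000000010000000", "............20......"),
]


def apply_pattern(find_hex: str, replace_hex: str) -> str:
    """Overlay the replace string on the find string, keeping digits where it has '.'."""
    if len(find_hex) != len(replace_hex):
        raise ValueError("find and replace must have the same length")
    return "".join(f if r == "." else r for f, r in zip(find_hex, replace_hex))


def imm64(instr_hex: str) -> int:
    """Read the 64-bit little-endian immediate that follows the two opcode bytes."""
    return struct.unpack_from("<Q", bytes.fromhex(instr_hex), 2)[0]


results = [(imm64(f), imm64(apply_pattern(f, r))) for f, r in PATCHES]
for old, new in results:
    print(f"{old:#x} ({old >> 30} GiB) -> {new:#x} ({new >> 30} GiB)")
```

Expanded this way, every 0x2000000000 immediate becomes 0x4000000000 and every 0x1000000000 immediate becomes 0x2000000000, so both constants are doubled.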

xCuri0 commented 2 months ago

@dsanke this also increases the size from 64GB to 128GB, am I correct?

Should be good for 3x NVIDIA P40.

dsanke commented 2 months ago

@xCuri0 this patch resolves the "PCI resources error" that appears when 128G of DRAM is installed. Making the BIOS accept 128G of DRAM is a separate patch. In AddMemorySpace, this one sets "base" to 256G instead of 128G and "size" to 128G instead of 64G.

crashr commented 2 months ago

@dsanke I am pretty new to this. Could you please explain why you chose the base and size exactly like this?

dsanke commented 2 months ago

@crashr for the values, I just doubled them. If that isn't enough, I will double them again. I am not eager to find the upper limit. As for how I found these hex patterns: I have a little experience with reverse engineering and disassembly. When @xCuri0 showed me that screenshot, I searched for the immediate value and located it. You can use Ghidra or IDA to do that. Extract the PE32 body from the BIOS with UEFITool, disassemble it, and find the value you need to change. Then alter the value, build the modified BIOS, flash it, and test.

crashr commented 2 months ago

@dsanke I am not sure if I got you right. You modified the AddMemorySpace call, right? Your modification allows the use of 128 GB of system RAM but, according to my tests, still doesn't allow more than 64 GB of address space to be allocated to PCI. Since there are no other occurrences of 0x1000000000 or 0x2000000000, as far as I understand it is not possible to achieve that.

dsanke commented 2 months ago

@crashr so you still can't get more than one P40 working. We need to dive deeper. The patch I posted solves the "PCI resources error" shown during POST with more than 64GB of DRAM. Without this patch, the BIOS shows an alert page and pauses before booting into the OS.

I have no Clevo machine at hand, so I can't determine how the PCI resource addresses change. I will ask a friend to test soon.

I compared against the Clevo machine: after the modification, Windows Device Manager shows a large memory range from 256G to 384G for the PCIe root complex.