tinygrad / open-gpu-kernel-modules

NVIDIA Linux open GPU with P2P support

'mapping of buffer object failed' error #5

Closed ferrybaltimore closed 2 months ago

ferrybaltimore commented 2 months ago

NVIDIA Open GPU Kernel Modules Version

550.54.15

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

Operating System and Version

Ubuntu 23.10

Kernel Release

Linux ai5 6.5.0-28-generic

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

Hardware: GPU

8x RTX 4090

Describe the bug

P2P seems to be detected, but the test fails.

./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 4090, pciBusID: 1, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 4090, pciBusID: 2, pciDeviceID: 0, pciDomainID:0
Device: 2, NVIDIA GeForce RTX 4090, pciBusID: 3, pciDeviceID: 0, pciDomainID:0
Device: 3, NVIDIA GeForce RTX 4090, pciBusID: 2c, pciDeviceID: 0, pciDomainID:0
Device: 4, NVIDIA GeForce RTX 4090, pciBusID: 41, pciDeviceID: 0, pciDomainID:0
Device: 5, NVIDIA GeForce RTX 4090, pciBusID: 42, pciDeviceID: 0, pciDomainID:0
Device: 6, NVIDIA GeForce RTX 4090, pciBusID: 61, pciDeviceID: 0, pciDomainID:0
Device: 7, NVIDIA GeForce RTX 4090, pciBusID: 62, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=0 CAN Access Peer Device=3
Device=0 CAN Access Peer Device=4
Device=0 CAN Access Peer Device=5
Device=0 CAN Access Peer Device=6
Device=0 CAN Access Peer Device=7
Device=1 CAN Access Peer Device=0
Device=1 CAN Access Peer Device=2
Device=1 CAN Access Peer Device=3
Device=1 CAN Access Peer Device=4
Device=1 CAN Access Peer Device=5
Device=1 CAN Access Peer Device=6
Device=1 CAN Access Peer Device=7
Device=2 CAN Access Peer Device=0
Device=2 CAN Access Peer Device=1
Device=2 CAN Access Peer Device=3
Device=2 CAN Access Peer Device=4
Device=2 CAN Access Peer Device=5
Device=2 CAN Access Peer Device=6
Device=2 CAN Access Peer Device=7
Device=3 CAN Access Peer Device=0
Device=3 CAN Access Peer Device=1
Device=3 CAN Access Peer Device=2
Device=3 CAN Access Peer Device=4
Device=3 CAN Access Peer Device=5
Device=3 CAN Access Peer Device=6
Device=3 CAN Access Peer Device=7
Device=4 CAN Access Peer Device=0
Device=4 CAN Access Peer Device=1
Device=4 CAN Access Peer Device=2
Device=4 CAN Access Peer Device=3
Device=4 CAN Access Peer Device=5
Device=4 CAN Access Peer Device=6
Device=4 CAN Access Peer Device=7
Device=5 CAN Access Peer Device=0
Device=5 CAN Access Peer Device=1
Device=5 CAN Access Peer Device=2
Device=5 CAN Access Peer Device=3
Device=5 CAN Access Peer Device=4
Device=5 CAN Access Peer Device=6
Device=5 CAN Access Peer Device=7
Device=6 CAN Access Peer Device=0
Device=6 CAN Access Peer Device=1
Device=6 CAN Access Peer Device=2
Device=6 CAN Access Peer Device=3
Device=6 CAN Access Peer Device=4
Device=6 CAN Access Peer Device=5
Device=6 CAN Access Peer Device=7
Device=7 CAN Access Peer Device=0
Device=7 CAN Access Peer Device=1
Device=7 CAN Access Peer Device=2
Device=7 CAN Access Peer Device=3
Device=7 CAN Access Peer Device=4
Device=7 CAN Access Peer Device=5
Device=7 CAN Access Peer Device=6

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
   D\D      0      1      2      3      4      5      6      7
     0      1      1      1      1      1      1      1      1
     1      1      1      1      1      1      1      1      1
     2      1      1      1      1      1      1      1      1
     3      1      1      1      1      1      1      1      1
     4      1      1      1      1      1      1      1      1
     5      1      1      1      1      1      1      1      1
     6      1      1      1      1      1      1      1      1
     7      1      1      1      1      1      1      1      1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D      0      1      2      3      4      5      6      7
     0 913.74   6.60   6.60  10.94  10.77  10.75  10.73  10.90
     1   6.51 924.56   6.23   6.51   6.51   6.51   6.51   6.51
     2   6.50   6.32 923.87   6.51   6.39   6.48   6.51   6.51
     3  10.87   6.61   6.60 923.60  10.81  10.77  10.58  10.80
     4   9.92   6.52   6.61  10.91 924.56  10.89  10.69  10.89
     5  10.91   6.61   6.60  10.95  10.91 922.92   9.02   9.59
     6  10.93   6.61   6.60  10.92  10.93  10.92 921.98  10.67
     7  10.90   6.61   6.60  10.94  10.92  10.88  10.53 924.01
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
Cuda failure p2pBandwidthLatencyTest.cu:189: 'mapping of buffer object failed'

To Reproduce

After installing the driver, P2P seems to be detected as active, but running ./p2pBandwidthLatencyTest fails with the 'mapping of buffer object failed' error.


Bug Incidence

Always

nvidia-bug-report.log.gz


More Info

No response

ferrybaltimore commented 2 months ago

Solved. It was caused by the BAR support setting in the BIOS.
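
For anyone hitting the same error, here is a rough way to check what the BIOS actually gave each GPU. This is only a sketch under assumptions: the comment above says just "bar support", so reading that as the size of the BAR1 aperture is a guess, and the helper names `parse_bar_sizes` and `nvidia_bar1_sizes` are made up for this example. It reads the standard Linux sysfs `resource` file, where each line holds the start address, end address, and flags of one PCI region in hex.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID for NVIDIA


def parse_bar_sizes(resource_text):
    """Return the size in bytes of each region in a sysfs `resource` file.

    Each line has three hex fields: start, end, flags. Unused regions are
    all zeros and are reported with size 0.
    """
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        sizes.append(end - start + 1 if end > start else 0)
    return sizes


def nvidia_bar1_sizes(sysfs_root="/sys/bus/pci/devices"):
    """Map PCI address -> BAR1 size in bytes for NVIDIA display devices.

    Assumption: BAR1 (the second line of `resource`) is the aperture that
    matters here; the issue itself does not say which BAR was involved.
    """
    result = {}
    for dev in sorted(Path(sysfs_root).glob("*")):
        vendor_file, class_file = dev / "vendor", dev / "class"
        if not vendor_file.exists() or not class_file.exists():
            continue
        if vendor_file.read_text().strip() != NVIDIA_VENDOR_ID:
            continue
        # PCI class 0x03xxxx is a display controller; skips the audio function.
        if not class_file.read_text().strip().startswith("0x03"):
            continue
        sizes = parse_bar_sizes((dev / "resource").read_text())
        if len(sizes) > 1:
            result[dev.name] = sizes[1]
    return result


if __name__ == "__main__":
    for address, size in nvidia_bar1_sizes().items():
        print(f"{address}: BAR1 = {size / 2**20:.0f} MiB")
```

If every GPU reports a BAR1 much smaller than its video memory, that would be consistent with the BIOS option being off, but treat that as a hint rather than a diagnosis.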