thebel1 / thpimon

Native ESXi on Arm hardware status driver for the Raspberry Pi.
1 stars 0 forks source link

Making repeated VC mailbox requests will result in response being empty #3

Closed thebel1 closed 3 years ago

thebel1 commented 3 years ago

This only recovers after a reboot.

i=0; while :; do echo "[${i}]"; /scratch/downloads/pyUtil/pimon_util.py; i=$((${i} + 1)); done;
[...]
[18]
Firmware Revision:      0x5f440c10
Board Model:            0
Board Revision:         0xc03111
Board MAC Address:      d6:51:4:32:a6:dc
Board Serial:           0x000000664b0067
Temp:                   60.0 (deg. C)
[19]
[Errno 16] Device or resource busy
[Errno 22] Invalid argument
Firmware Revision:      0x0
[Errno 22] Invalid argument
[...]

Log:

2020-09-01T14:13:08.380Z cpu2:67963)thpimon: rpiq_mmioIoctlCB:401: bufLen=32 rqRp=0 tag=0x10003 rpLen=6 rqLen=0 endTag=0 padding[0]=0 padding[1]=0
2020-09-01T14:13:08.380Z cpu2:67963)thpimon: rpiq_mboxSend:121: allocating from heap 0x430073c00000, size 32, alignment 16
2020-09-01T14:13:08.380Z cpu2:67963)thpimon: rpiq_mboxSend:146: writing mboxOut 0xe651228 to DMA buffer 0x430073c01220 (MA 0xe651220) on channel 8
2020-09-01T14:13:08.380Z cpu2:67963)WARNING: thpimon: rpiq_mboxSend:203: RPIQ mailbox not full after 4096 retries
2020-09-01T14:13:08.380Z cpu2:67963)WARNING: thpimon: rpiq_mmioIoctlCB:413: unable to send to mailbox: Busy
2020-09-01T14:13:08.380Z cpu2:67963)WARNING: thpimon: pimon_charDevIoctl:506: ioctl cmd 8 with data 0x31b96260fc8 failed: Busy
2020-09-01T14:13:08.423Z cpu2:67963)thpimon: pimon_charDevFileDestroy:737: destroying file 0x4300740012f0
thebel1 commented 3 years ago

The cause is the DMA buffer being allocated above 2GB:

2020-09-01T14:15:35.920Z cpu0:67976)thpimon: rpiq_mboxSend:147: writing mboxOut 0xe651228 to DMA buffer 0x430076601220 (MA 0xe651220) on channel 8
(gdb) p/d 0xe651220 > (1<<31)
$3 = 1
thebel1 commented 3 years ago

Never mind, I just re-used the same allocation 353 times and the VC still stopped sending a response. Might be a firmware bug.

thebel1 commented 3 years ago

Issue was caused by build-up of mailbox buffers and a CPU caching issue. Fixed in this commit:

https://github.com/thebel1/thpimon/commit/a8a44e9c2841ea3e105dcb2a93c93bfe5998dccc

thebel1 commented 3 years ago

Evidence:

Ran the pimon_util.py script 20k times:

i=0; while :; do echo "[${i}]"; /scratch/downloads/pyUtil/pimon_util.py; i=$((${i} + 1)); done | tee /tmp/test.txt

Results:

cat /tmp/test.txt | awk -e 'BEGIN{ errcount=0; totalcount=0; } /timed out/{ errcount += 1; } /\[[0-9]+\]/{ totalcount += 1; } END{ print "err:\t"
, errcount; print "total:\t", totalcount; print "ratio:\t", (errcount/totalcount); }'
err:     66
total:   20015
ratio:   0.00329753

Failure rate of 0.3%. If this becomes a problem, the retry count can be increased further.

vMarkus commented 3 years ago

24hrs testing and no issues - great work!