JsBergbau opened 4 years ago
You haven't quoted the actual kernel dump from the error, nor what the connected device is.
When I try this I get
OSError: [Errno 121] Remote I/O error
which is what I expect, as there isn't a device connected. If I provide the address of an IMX219 camera module I don't see any bad behaviour, but that is an I2C device rather than an SMBus one.
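A script can tell the "nothing at this address" case apart from other failures by checking `errno` on the `OSError`. This is a minimal sketch that assumes a bus object with smbus2's `read_byte_data(i2c_addr, register)` method. The helper `probe_register` and its `None` return for a missing device are illustrative choices, not part of smbus2:

```python
import errno

# Errno 121 ("Remote I/O error") is EREMOTEIO on Linux; fall back to the
# literal value on platforms whose errno module does not define it.
EREMOTEIO = getattr(errno, "EREMOTEIO", 121)


def probe_register(bus, address, register):
    """Read one byte, returning None when the read fails with EREMOTEIO.

    `bus` is any object with a read_byte_data(address, register) method,
    for example an smbus2.SMBus instance (assumption: only that method is used).
    Every other OSError is re-raised unchanged.
    """
    try:
        return bus.read_byte_data(address, register)
    except OSError as exc:
        if exc.errno == EREMOTEIO:
            return None  # the error seen above when no device is connected
        raise


# Usage on real hardware (hypothetical address and register):
# from smbus2 import SMBus
# with SMBus(1) as bus:
#     print(probe_register(bus, 0x44, 0x00))
```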
You haven't quoted the actual kernel dump from the error
Where is it stored? I couldn't find it. I've only found that you should connect a serial console to get the full dump.
The connected device is an OPT3001 sensor (https://www.aliexpress.com/item/33049813978.html). I've written another script with smbus (not smbus2), and with that it works. Still, a userspace application shouldn't cause a kernel panic.
Where is it stored? I couldn't find it. I've only found that you should connect a serial console to get the full dump.
It'll normally be dumped to the screen, or to the SSH session that tripped it up.
Very few kernel crashes take out everything, so having two SSH sessions open, one running dmesg -w and the other running your script, would normally work.
The connected device is an OPT3001 sensor (https://www.aliexpress.com/item/33049813978.html). I've written another script with smbus (not smbus2), and with that it works. Still, a userspace application shouldn't cause a kernel panic.
Agreed, but without any debug output it's nearly impossible to determine what is going wrong.
It'll normally be dumped to the screen, or to the SSH session that tripped it up. Very few kernel crashes take out everything, so having two SSH sessions open with one running dmesg -w and the other running your script would normally work.
Sadly, no. The SSH session just crashes; even the trick with a second SSH session didn't work. I've set up another Raspberry Pi to capture the full kernel dump. BTW: after the crash, even the most recently modified files aren't saved to disk.
raspberrypi login: [ 154.872851] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: i2cdev_ioctl_smbus+0x268/0x268 [i2c_dev]
[ 154.886523] CPU: 1 PID: 1333 Comm: crash.sh Tainted: G C 4.19.93-v7+ #1290
[ 154.896900] Hardware name: BCM2835
[ 154.901413] [<801120c0>] (unwind_backtrace) from [<8010d5f4>] (show_stack+0x20/0x24)
[ 154.911513] [<8010d5f4>] (show_stack) from [<808453e8>] (dump_stack+0xe0/0x124)
[ 154.920155] [<808453e8>] (dump_stack) from [<80120ef8>] (panic+0x104/0x288)
[ 154.928453] [<80120ef8>] (panic) from [<801209f8>] (print_tainted+0x0/0xa8)
[ 154.936788] CPU0: stopping
[ 154.940794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 4.19.93-v7+ #1290
[ 154.951530] Hardware name: BCM2835
[ 154.956302] [<801120c0>] (unwind_backtrace) from [<8010d5f4>] (show_stack+0x20/0x24)
[ 154.966926] [<8010d5f4>] (show_stack) from [<808453e8>] (dump_stack+0xe0/0x124)
[ 154.975805] [<808453e8>] (dump_stack) from [<801104d8>] (handle_IPI+0x388/0x3a8)
[ 154.986153] [<801104d8>] (handle_IPI) from [<801021f4>] (bcm2836_arm_irqchip_handle_irq+0xa0/0xa4)
[ 154.998240] [<801021f4>] (bcm2836_arm_irqchip_handle_irq) from [<801019bc>] (__irq_svc+0x5c/0x7c)
[ 155.010254] Exception stack(0x80d01ee8 to 0x80d01f30)
[ 155.016892] 1ee0: 80109ae4 00000000 40000093 40000093 80d04d70 80d00000
[ 155.028251] 1f00: 80d04db8 00000001 80d8efbe babff9c0 80c64a38 80d01f44 80d0517c 80d01f38
[ 155.039585] 1f20: 00000000 80109ae8 40000013 ffffffff
[ 155.046248] [<801019bc>] (__irq_svc) from [<80109ae8>] (arch_cpu_idle+0x34/0x4c)
[ 155.056851] [<80109ae8>] (arch_cpu_idle) from [<808624d4>] (default_idle_call+0x34/0x48)
[ 155.068246] [<808624d4>] (default_idle_call) from [<80152e80>] (do_idle+0xec/0x16c)
[ 155.079320] [<80152e80>] (do_idle) from [<801531c0>] (cpu_startup_entry+0x28/0x2c)
[ 155.090312] [<801531c0>] (cpu_startup_entry) from [<8085bb80>] (rest_init+0xbc/0xc0)
[ 155.101555] [<8085bb80>] (rest_init) from [<80c00fb0>] (start_kernel+0x484/0x4b4)
[ 155.112536] CPU2: stopping
[ 155.116916] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G C 4.19.93-v7+ #1290
[ 155.128272] Hardware name: BCM2835
[ 155.133304] [<801120c0>] (unwind_backtrace) from [<8010d5f4>] (show_stack+0x20/0x24)
[ 155.144358] [<8010d5f4>] (show_stack) from [<808453e8>] (dump_stack+0xe0/0x124)
[ 155.153419] [<808453e8>] (dump_stack) from [<801104d8>] (handle_IPI+0x388/0x3a8)
[ 155.164104] [<801104d8>] (handle_IPI) from [<801021f4>] (bcm2836_arm_irqchip_handle_irq+0xa0/0xa4)
[ 155.176409] [<801021f4>] (bcm2836_arm_irqchip_handle_irq) from [<801019bc>] (__irq_svc+0x5c/0x7c)
[ 155.188622] Exception stack(0xb9d4ff38 to 0xb9d4ff80)
[ 155.195326] ff20: 80109ae4 00000000
[ 155.206661] ff40: 40000093 40000093 80d04d70 b9d4e000 80d04db8 00000004 80d8efbe 410fd034
[ 155.217929] ff60: 00000000 b9d4ff94 80d0517c b9d4ff88 00000000 80109ae8 40000013 ffffffff
[ 155.229199] [<801019bc>] (__irq_svc) from [<80109ae8>] (arch_cpu_idle+0x34/0x4c)
[ 155.239801] [<80109ae8>] (arch_cpu_idle) from [<808624d4>] (default_idle_call+0x34/0x48)
[ 155.251219] [<808624d4>] (default_idle_call) from [<80152e80>] (do_idle+0xec/0x16c)
[ 155.262262] [<80152e80>] (do_idle) from [<801531c0>] (cpu_startup_entry+0x28/0x2c)
[ 155.273255] [<801531c0>] (cpu_startup_entry) from [<8010fedc>] (secondary_start_kernel+0x134/0x140)
[ 155.285798] [<8010fedc>] (secondary_start_kernel) from [<0010270c>] (0x10270c)
[ 155.294841] CPU3: stopping
[ 155.299242] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G C 4.19.93-v7+ #1290
[ 155.310644] Hardware name: BCM2835
[ 155.315717] [<801120c0>] (unwind_backtrace) from [<8010d5f4>] (show_stack+0x20/0x24)
[ 155.326922] [<8010d5f4>] (show_stack) from [<808453e8>] (dump_stack+0xe0/0x124)
[ 155.336059] [<808453e8>] (dump_stack) from [<801104d8>] (handle_IPI+0x388/0x3a8)
[ 155.346837] [<801104d8>] (handle_IPI) from [<801021f4>] (bcm2836_arm_irqchip_handle_irq+0xa0/0xa4)
[ 155.359224] [<801021f4>] (bcm2836_arm_irqchip_handle_irq) from [<801019bc>] (__irq_svc+0x5c/0x7c)
[ 155.371521] Exception stack(0xb9d51f38 to 0xb9d51f80)
[ 155.378268] 1f20: 80109ae4 00000000
[ 155.389690] 1f40: 40000093 40000093 80d04d70 b9d50000 80d04db8 00000008 80d8efbe 410fd034
[ 155.401041] 1f60: 00000000 b9d51f94 80d0517c b9d51f88 00000000 80109ae8 40000013 ffffffff
[ 155.412396] [<801019bc>] (__irq_svc) from [<80109ae8>] (arch_cpu_idle+0x34/0x4c)
[ 155.422998] [<80109ae8>] (arch_cpu_idle) from [<808624d4>] (default_idle_call+0x34/0x48)
[ 155.434412] [<808624d4>] (default_idle_call) from [<80152e80>] (do_idle+0xec/0x16c)
[ 155.445454] [<80152e80>] (do_idle) from [<801531c0>] (cpu_startup_entry+0x28/0x2c)
[ 155.456445] [<801531c0>] (cpu_startup_entry) from [<8010fedc>] (secondary_start_kernel+0x134/0x140)
[ 155.468987] [<8010fedc>] (secondary_start_kernel) from [<0010270c>] (0x10270c)
[ 155.478038] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: i2cdev_ioctl_smbus+0x268/0x268 [i2c_dev] ]---
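The useful part of a dump like this is on its first and last lines: the panic reason and the faulting symbol in `function+offset/size [module]` form. A small sketch for pulling those fields out of a captured log. The regular expression was written against the lines shown here and is an assumption, not a specification of the kernel's log format:

```python
import re

# Written against the panic lines above; other panic messages may be shaped differently.
PANIC_RE = re.compile(
    r"Kernel panic - not syncing: (?P<reason>.+?) in: "
    r"(?P<func>\w+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>[\w-]+)\])?"
)


def parse_panic(line):
    """Return the reason, function, offset, size and module of a panic line, or None."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return {
        "reason": m.group("reason"),
        "function": m.group("func"),
        "offset": int(m.group("offset"), 16),  # hex offset into the function
        "size": int(m.group("size"), 16),      # total size of the function
        "module": m.group("module"),           # None for built-in code
    }
```

Applied to the first panic line above, this yields function `i2cdev_ioctl_smbus` in module `i2c_dev`, with offset and size both 0x268 (616 bytes).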
I have reasonable I2C knowledge, but I'm a total noob with respect to SMBus. However, from reading the Python and some of the kernel code, it looks like read_block_data expects the device to send a sequence of bytes in which the first byte gives the number of bytes that follow. It also appears that the maximum block length supported by the kernel is 32 bytes:
/*
 * Data for SMBus Messages
 */
#define I2C_SMBUS_BLOCK_MAX	32	/* As specified in SMBus standard */
union i2c_smbus_data {
	__u8 byte;
	__u16 word;
	__u8 block[I2C_SMBUS_BLOCK_MAX + 2]; /* block[0] is used for length */
			       /* and one more for user-space compatibility */
};
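The same layout can be mirrored from Python with ctypes to check the buffer size involved: 32 data bytes, plus `block[0]` for the length, plus one spare byte, for 34 bytes in total. A minimal sketch; the class name `I2cSmbusData` is hypothetical and not taken from any library:

```python
import ctypes

I2C_SMBUS_BLOCK_MAX = 32  # value from the kernel header quoted above


class I2cSmbusData(ctypes.Union):
    """Python mirror of the kernel's union i2c_smbus_data (hypothetical name)."""
    _fields_ = [
        ("byte", ctypes.c_uint8),
        ("word", ctypes.c_uint16),
        # block[0] holds the length, then up to 32 data bytes, then one spare byte
        ("block", ctypes.c_uint8 * (I2C_SMBUS_BLOCK_MAX + 2)),
    ]


print(ctypes.sizeof(I2cSmbusData))  # 34: the largest member sets the union's size

data = I2cSmbusData()
data.block[0] = 5  # all members start at offset 0, so this also sets data.byte
print(data.byte)   # 5
```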
Looking at the OPT3001 datasheet, I don't see any mention of that kind of transfer, just 16-bit reads and writes with the MSB sent first. Is it possible that the device is performing a simple register read of 0x7e, which would return 0x30 0x01, and that the first returned byte (0x30 = 48) is being interpreted as a length? Copying 48 data bytes plus the length byte into the 34-byte block array could easily overflow the stack frame.
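The arithmetic behind that suspicion, as a quick sketch. It assumes the copy runs over the length byte plus that many data bytes, as in the kernel loop discussed later in this thread; the helper name `bytes_past_end` is hypothetical:

```python
I2C_SMBUS_BLOCK_MAX = 32
BLOCK_ARRAY_SIZE = I2C_SMBUS_BLOCK_MAX + 2  # __u8 block[I2C_SMBUS_BLOCK_MAX + 2]


def bytes_past_end(first_byte):
    """Bytes written beyond block[] if first_byte is trusted as the length.

    The copy covers the length byte itself plus first_byte data bytes.
    """
    copied = first_byte + 1
    return max(0, copied - BLOCK_ARRAY_SIZE)


print(bytes_past_end(0x30))                 # 15: 0x30 = 48, so 49 bytes go into a 34-byte array
print(bytes_past_end(I2C_SMBUS_BLOCK_MAX))  # 0: a full 32-byte block still fits
```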
Try read_word_data instead.
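One caveat with read_word_data: an SMBus word read treats the first byte on the wire as the low byte, while the OPT3001 sends the MSB first (as noted above), so the two bytes of the result come back swapped. A minimal sketch; `FakeBus` stands in for an `smbus2.SMBus` object so the example runs without hardware, and the 0x44 address is an assumption (it depends on how the sensor's ADDR pin is wired):

```python
def swap16(value):
    """Swap the two bytes of a 16-bit value, e.g. 0x0130 -> 0x3001."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def read_register_msb_first(bus, address, register):
    """Read a 16-bit register from a device that sends its MSB first.

    bus.read_word_data() assumes SMBus byte order (low byte first), so the
    bytes arrive swapped and are put back in order here.
    """
    return swap16(bus.read_word_data(address, register))


class FakeBus:
    """Stand-in for smbus2.SMBus that receives the bytes 0x30, 0x01 on the wire."""

    def read_word_data(self, address, register):
        return 0x0130  # first byte (0x30) lands in the low byte, SMBus-style


OPT3001_ADDRESS = 0x44  # assumption: depends on the ADDR pin wiring
print(hex(read_register_msb_first(FakeBus(), OPT3001_ADDRESS, 0x7E)))  # 0x3001
```

On a Raspberry Pi, `FakeBus()` would be replaced by `smbus2.SMBus(1)`, bus 1 being the I2C bus on the GPIO header.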
I think your reading of the spec is correct. Random post via Google: https://www.microchip.com/forums/FindPost/1033077
In SMBus there is a thing called block read. It can only be used on registers that support it, for example the battery name and manufacturer name on a standard SMBus battery.
The spec (http://www.smbus.org/specs/SMBus_3_1_20180319.pdf, section 6.5.7, page 42) also says the device has to return the byte count.
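In a block read the device's response starts with that byte count, followed by the data bytes. Below is a sketch of a parser that checks the count before trusting it. `parse_block_read` is a hypothetical helper operating on raw bytes, not an smbus2 function. The 32-byte limit is the kernel's `I2C_SMBUS_BLOCK_MAX` quoted earlier; newer SMBus revisions may allow larger blocks:

```python
I2C_SMBUS_BLOCK_MAX = 32  # the Linux kernel's limit, from the header quoted earlier


def parse_block_read(raw, max_len=I2C_SMBUS_BLOCK_MAX):
    """Split a raw block-read response into its data bytes.

    raw[0] is the byte count sent by the device and raw[1:] is the data.
    Raises ValueError instead of trusting a count that exceeds max_len or
    the number of bytes actually received.
    """
    if not raw:
        raise ValueError("empty response")
    count = raw[0]
    if count > max_len:
        raise ValueError(f"byte count {count} exceeds limit {max_len}")
    if len(raw) - 1 < count:
        raise ValueError(f"byte count {count} but only {len(raw) - 1} data bytes received")
    return bytes(raw[1:1 + count])


print(parse_block_read(bytes([3, 0x41, 0x42, 0x43])))  # b'ABC'
```

Fed the two bytes 0x30 0x01 from the OPT3001 discussion, this parser raises `ValueError` (byte count 48 exceeds limit 32) instead of copying anything.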
I'd agree that the kernel shouldn't allow the transfer to smash the stack. I suspect the fault lies in the loop that copies the data back out at https://elixir.bootlin.com/linux/v5.4.15/source/drivers/i2c/i2c-core-smbus.c#L498, since it relies entirely on the length reported back by the SMBus device. I think a guard along the lines of
for (i = 0; i < msg[1].buf[0] + 1 && i < I2C_SMBUS_BLOCK_MAX + 1; i++)
would prevent it. Alternatively, the check could be based on the write-size test at line 426, returning an error and no data instead of truncating:
if (msg[1].buf[0] > I2C_SMBUS_BLOCK_MAX) {
	dev_err(&adapter->dev, "Invalid block %s size %d returned\n",
		read_write == I2C_SMBUS_READ ? "read" : "write",
		msg[1].buf[0]);
	return -EINVAL;
}
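The two options discussed above, truncating the copy or rejecting the transfer, can be compared in a small Python model of the copy loop. This models only the logic; it is not kernel code, the function names are hypothetical, the 34-byte receive buffer is an assumption of the model, and `-errno.EINVAL` stands in for the kernel's `-EINVAL`:

```python
import errno

I2C_SMBUS_BLOCK_MAX = 32


def copy_truncating(buf, block):
    """Bounded loop: stop after I2C_SMBUS_BLOCK_MAX + 1 bytes even if buf[0] is larger."""
    i = 0
    while i < buf[0] + 1 and i < I2C_SMBUS_BLOCK_MAX + 1:
        block[i] = buf[i]
        i += 1
    return i  # number of bytes copied


def copy_rejecting(buf, block):
    """Size check first: refuse the whole transfer if buf[0] is too large."""
    if buf[0] > I2C_SMBUS_BLOCK_MAX:
        return -errno.EINVAL  # nothing is copied
    for i in range(buf[0] + 1):
        block[i] = buf[i]
    return buf[0] + 1


msgbuf = bytearray(I2C_SMBUS_BLOCK_MAX + 2)  # receive buffer (size assumed for the model)
msgbuf[0], msgbuf[1] = 0x30, 0x01            # the 0x30 0x01 bytes from the OPT3001 discussion
print(copy_truncating(msgbuf, bytearray(I2C_SMBUS_BLOCK_MAX + 2)))  # 33
print(copy_rejecting(msgbuf, bytearray(I2C_SMBUS_BLOCK_MAX + 2)))   # -22
```

In this model the truncating variant still leaves block[0] = 48, so a caller that trusts the length byte would look past the 32 valid data bytes. That is one argument for returning an error instead.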
I guess this ought to be reported upstream, as a userspace program shouldn't be able to crash the kernel.
I'm new to reading data from the I2C bus. However, a userspace script shouldn't be able to cause a kernel panic.
Describe the bug
Trying to read data from SMBus leads to a kernel panic.
To reproduce
pip3 install smbus2
Then execute
Expected behaviour
Any error, but no kernel panic.
Actual behaviour
The whole Pi crashes with "Kernel stack is corrupted in i2cdev_ioctl_smbus+0x268 [i2c_dev]".
System