phobos-storage / phobos

This repository holds the source code for Phobos, a Parallel Heterogeneous Object Store.
GNU Lesser General Public License v2.1

phobosd segfault when formatting tape if drive model not found in config #11

Open martinetd opened 1 month ago

martinetd commented 1 month ago

I think there's a problem with init not failing early when the model is not found?

Trying to format a tape with a bad config segfaults as follows:

(gdb) bt
#0  0x00007fb4a0cbc58b in __strcmp_avx2 () from /usr/lib64/libc.so.6
#1  0x00007fb4a1092f95 in msort_with_tmp () from /usr/lib64/libglib-2.0.so.0
#2  0x00007fb4a1092f23 in msort_with_tmp () from /usr/lib64/libglib-2.0.so.0
#3  0x00007fb4a10957cd in msort_r () from /usr/lib64/libglib-2.0.so.0
#4  0x000055cc541da244 in fair_share_number_of_requests (_devices=<optimized out>, 
    io_sched_hdl=0x55cc54d90920) at io_schedulers/device_dispatch_algorithms.c:775
#5  fair_share_number_of_requests (io_sched_hdl=0x55cc54d90920, _devices=<optimized out>)
    at io_schedulers/device_dispatch_algorithms.c:747
#6  0x000055cc541d81b1 in io_sched_dispatch_devices (devices=<optimized out>, 
    io_sched_hdl=0x55cc54d90920) at /usr/src/debug/phobos-1.95.1-2.el9.x86_64/src/lrs/io_sched.c:109
#7  lrs_sched_thread (sdata=<optimized out>)
    at /usr/src/debug/phobos-1.95.1-2.el9.x86_64/src/lrs/lrs_sched.c:2823
#8  0x00007fb4a0c9f802 in start_thread () from /usr/lib64/libc.so.6
#9  0x00007fb4a0c3f450 in clone3 () from /usr/lib64/libc.so.6

The actual crash happens because ld_technology is NULL in sort_devices_by_technology_cmp(). I think we should fail any such drive early, e.g. by making lrs_dev_technology() fail, but I haven't taken the time to dig into this yet, as it was more of a configuration problem.
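To illustrate the failure mode, here is a minimal sketch of a comparator that tolerates a missing technology instead of passing NULL to strcmp(). This is not the Phobos code: `struct demo_dev`, `demo_technology_cmp()`, `demo_dev_cmp()` and `demo_sort_devices()` are hypothetical names, only the `ld_technology` field comes from the report, and the standard `qsort()` stands in for the GLib sort seen in the backtrace to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the LRS device structure; only the field
 * discussed in this issue is modelled. */
struct demo_dev {
    const char *ld_technology; /* NULL when the drive model is not in the config */
};

/* Compare two technology strings, ordering NULL before any non-NULL value
 * instead of handing NULL to strcmp(), which is undefined behaviour. */
static int demo_technology_cmp(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return (a != NULL) - (b != NULL);
    return strcmp(a, b);
}

/* qsort()-style comparator over an array of device pointers. */
static int demo_dev_cmp(const void *lhs, const void *rhs)
{
    const struct demo_dev *a = *(const struct demo_dev *const *)lhs;
    const struct demo_dev *b = *(const struct demo_dev *const *)rhs;

    return demo_technology_cmp(a->ld_technology, b->ld_technology);
}

/* Sort devices by technology; devices without one end up first. */
static void demo_sort_devices(struct demo_dev **devs, size_t count)
{
    assert(devs != NULL || count == 0);
    if (count > 1)
        qsort(devs, count, sizeof(*devs), demo_dev_cmp);
}
```

A guard like this only avoids the segfault. Whether a drive without a technology should take part in dispatching at all is a separate question, which is why failing early may still be preferable.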

courrierg commented 1 month ago

Hello, thank you for the bug report.

The issue is that ld_technology is specific to the fair share algorithm, and the initialization code doesn't know whether that algorithm is in use. So I made the initialization of ld_technology optional: if it fails, we shouldn't prevent the LRS from working, since we may not even use this field without fair share. But this should definitely be fixed. I'll have to think about a proper way of handling this. Maybe it's OK to fail even if we don't use fair share.
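One possible middle ground, sketched here under assumptions rather than taken from the Phobos sources, is to keep ld_technology optional at device initialization and validate it only when the algorithm that needs it is selected. `struct sketch_dev`, `sketch_check_devices()` and the `fair_share_enabled` flag are hypothetical, and returning a negative errno value is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record: a name for error reporting plus the
 * optional technology string discussed above. */
struct sketch_dev {
    const char *name;
    const char *ld_technology; /* may be NULL if the model is unknown */
};

/* Check that every device is usable with the selected algorithm.
 * Returns 0 on success, or -EINVAL if fair share is enabled and a device
 * has no technology; in that case *bad (if non-NULL) is set to the name
 * of the first offending device. */
static int sketch_check_devices(const struct sketch_dev *devs, size_t count,
                                bool fair_share_enabled, const char **bad)
{
    size_t i;

    assert(devs != NULL || count == 0);

    /* Without fair share the technology is never read, so nothing to check. */
    if (!fair_share_enabled)
        return 0;

    for (i = 0; i < count; i++) {
        if (devs[i].ld_technology == NULL) {
            if (bad != NULL)
                *bad = devs[i].name;
            return -EINVAL;
        }
    }

    return 0;
}
```

With a check of this kind, a misconfigured drive model would be reported by name when the scheduler is set up, and configurations that do not use fair share would keep working as they do today.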