Closed: jkryl closed this issue 3 years ago
I've been able to replicate this unfortunate behaviour. There is a mixture of things at play here, most notably that our default settings are still tuned for development. That is to say, we poll the queues at such a high rate that the error handling hardly makes any progress.
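To make that starvation point concrete, here is a purely illustrative sketch (not mayastor's actual reactor or poller code): a completion poller that never yields monopolises its executor thread, whereas a throttled variant gives other tasks, such as the fault-handling path, a chance to run.

use std::time::Duration;

// Illustrative only: `poll` stands in for draining a completion queue.
async fn poll_hot<F: FnMut() -> usize>(mut poll: F) {
    loop {
        let _completions = poll(); // never awaits, so nothing else on this thread runs
    }
}

// Throttled variant: yielding between polls lets other futures make progress,
// e.g. the error-handling path.
async fn poll_throttled<F: FnMut() -> usize>(mut poll: F, pause: Duration) {
    loop {
        let _completions = poll();
        tokio::time::sleep(pause).await;
    }
}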
In any case, I've written a simple test case that exhibits the problem; the relevant snippet:
create_work(Arc::clone(&queue)).await; // start pumping IO

let mut ticker = tokio::time::interval(Duration::from_secs(1));
for i in 1..20 {
    ticker.tick().await;

    // kill one of the replica containers after 5 ticks
    if i == 5 {
        kill_replica().await;
    }

    // after each tick, display bdev stats
    ms.spawn(async move {
        let bdev = Bdev::bdev_first().unwrap().into_iter();
        for b in bdev {
            let result = b.stats().await.unwrap();
            println!("{}: {:?}", b.name(), result);
        }
    })
    .await;

    // ctrl-c was hit during the test, exit now
    if SIG_RECEIVED.load(Ordering::Relaxed) {
        break;
    }
}
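For completeness, SIG_RECEIVED in the snippet above is just an atomic flag; its wiring is not shown in the test, but a minimal sketch (my assumption, using the signal_hook and once_cell crates) could look like this:

use std::sync::atomic::AtomicBool;
use std::sync::Arc;

use once_cell::sync::Lazy;

// Hypothetical wiring: signal_hook sets the shared AtomicBool when SIGINT or
// SIGTERM arrives, and the test loop above merely loads it each iteration.
static SIG_RECEIVED: Lazy<Arc<AtomicBool>> = Lazy::new(|| {
    let flag = Arc::new(AtomicBool::new(false));
    for sig in [signal_hook::consts::SIGINT, signal_hook::consts::SIGTERM] {
        signal_hook::flag::register(sig, Arc::clone(&flag))
            .expect("failed to register signal handler");
    }
    flag
});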
Several crashes were observed during these tests, but with some initial improvements to the settings and some reordering of the removal code, I get the following output:
10.1.0.3:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 112, num_write_ops: 2, bytes_read: 57344, bytes_written: 34304 }
10.1.0.2:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 131, num_write_ops: 2, bytes_read: 67072, bytes_written: 34304 }
nexus0: BdevStats { num_read_ops: 262, num_write_ops: 0, bytes_read: 134144, bytes_written: 0 }
disk0: BdevStats { num_read_ops: 112, num_write_ops: 0, bytes_read: 57344, bytes_written: 0 }
10.1.0.3:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 7285, num_write_ops: 2, bytes_read: 3729920, bytes_written: 34304 }
10.1.0.2:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 7294, num_write_ops: 2, bytes_read: 3734528, bytes_written: 34304 }
nexus0: BdevStats { num_read_ops: 14633, num_write_ops: 0, bytes_read: 7492096, bytes_written: 0 }
disk0: BdevStats { num_read_ops: 7343, num_write_ops: 0, bytes_read: 3759616, bytes_written: 0 }
10.1.0.3:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 14485, num_write_ops: 2, bytes_read: 7416320, bytes_written: 34304 }
10.1.0.2:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 14502, num_write_ops: 2, bytes_read: 7425024, bytes_written: 34304 }
nexus0: BdevStats { num_read_ops: 29033, num_write_ops: 0, bytes_read: 14864896, bytes_written: 0 }
disk0: BdevStats { num_read_ops: 14590, num_write_ops: 0, bytes_read: 7470080, bytes_written: 0 }
10.1.0.3:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 21674, num_write_ops: 2, bytes_read: 11097088, bytes_written: 34304 }
10.1.0.2:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 21674, num_write_ops: 2, bytes_read: 11097088, bytes_written: 34304 }
nexus0: BdevStats { num_read_ops: 43378, num_write_ops: 0, bytes_read: 22209536, bytes_written: 0 }
disk0: BdevStats { num_read_ops: 21813, num_write_ops: 0, bytes_read: 11168256, bytes_written: 0 }
It is a little hard to read, but the key thing here is that we have nexus0, to which we are pushing IO, and two replicas whose names start with their respective container IPs. After 5 ticks, one container is killed and we observe:
2020-11-13T16:46:40.469970079+01:00 ERROR mayastor::logger: CQ error, abort requests after transport retry counter exceeded
2020-11-13T16:46:40.470041705+01:00 ERROR mayastor::logger: ctrlr 10.1.0.2 in failed state.
2020-11-13T16:46:40.470093964+01:00 ERROR mayastor::logger: readv failed: rc = -6
2020-11-13T16:46:40.470189225+01:00 ERROR mayastor::logger: readv failed: rc = -6
2020-11-13T16:46:40.471346149+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x000020000397e0c0
2020-11-13T16:46:40.471392677+01:00 ERROR mayastor::logger: readv failed: rc = -6
<SNIP>
A few more of these lines follow, but once we start processing the completion callback we can see that the system is reconfiguring:
2020-11-13T16:46:40.475426377+01:00 INFO mayastor::bdev::nexus::nexus_bdev: nexus0: Dynamic reconfiguration event: ChildFault started
2020-11-13T16:46:40.479926742+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x000020000395ecc0
2020-11-13T16:46:40.479965104+01:00 INFO mayastor::bdev::nexus::nexus_channel: nexus0(tid:"none"), refreshing IO channels
2020-11-13T16:46:40.480043283+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x000020000397e0c0
2020-11-13T16:46:40.480092867+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x00002000039650c0
2020-11-13T16:46:40.481241124+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x0000200003990040
2020-11-13T16:46:40.481302110+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x0000200003978940
2020-11-13T16:46:40.481355280+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x000020000398a8c0
2020-11-13T16:46:40.481402029+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x00002000039669c0
2020-11-13T16:46:40.481448427+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x000020000398e740
2020-11-13T16:46:40.481494555+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x00002000039682c0
2020-11-13T16:46:40.481554148+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x0000200003981f40
2020-11-13T16:46:40.481612448+01:00 ERROR mayastor::core::io_driver: job_nexus0: 0x00002000039763c0
2020-11-13T16:46:40.482749925+01:00 INFO mayastor::bdev::nexus::nexus_bdev: nexus0: Dynamic reconfiguration event: ChildFault completed Ok(0)
2020-11-13T16:46:40.482809989+01:00 INFO mayastor::bdev::nexus::nexus_child: Closing child nvmf://10.1.0.2:8420/nqn.2019-05.io.openebs:disk0
2020-11-13T16:46:40.482908626+01:00 INFO bdev_destroy{uri="nvmf://10.1.0.2:8420/nqn.2019-05.io.openebs:disk0"}: mayastor::nexus_uri: new
2020-11-13T16:46:40.482967538+01:00 INFO bdev_destroy{uri="nvmf://10.1.0.2:8420/nqn.2019-05.io.openebs:disk0"}: mayastor::nexus_uri: enter
2020-11-13T16:46:40.483120658+01:00 INFO bdev_destroy{uri="nvmf://10.1.0.2:8420/nqn.2019-05.io.openebs:disk0"}: mayastor::nexus_uri: exit
2020-11-13T16:46:40.483166115+01:00 INFO bdev_destroy{uri="nvmf://10.1.0.2:8420/nqn.2019-05.io.openebs:disk0"}: mayastor::nexus_uri: close time.busy=1
Once this has all passed, we can see we are pushing IO again, but the failed device has been removed:
nexus0: BdevStats { num_read_ops: 57934, num_write_ops: 0, bytes_read: 29662208, bytes_written: 0 }
disk0: BdevStats { num_read_ops: 29180, num_write_ops: 0, bytes_read: 14940160, bytes_written: 0 }
10.1.0.3:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 35934, num_write_ops: 2, bytes_read: 18398208, bytes_written: 34304 }
nexus0: BdevStats { num_read_ops: 64870, num_write_ops: 0, bytes_read: 33213440, bytes_written: 0 }
disk0: BdevStats { num_read_ops: 36171, num_write_ops: 0, bytes_read: 18519552, bytes_written: 0 }
10.1.0.3:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 43126, num_write_ops: 2, bytes_read: 22080512, bytes_written: 34304 }
nexus0: BdevStats { num_read_ops: 72070, num_write_ops: 0, bytes_read: 36899840, bytes_written: 0 }
disk0: BdevStats { num_read_ops: 43402, num_write_ops: 0, bytes_read: 22221824, bytes_written: 0 }
10.1.0.3:8420/nqn.2019-05.io.openebs:disk0n1: BdevStats { num_read_ops: 50326, num_write_ops: 2, bytes_read: 25766912, bytes_written: 34304 }
nexus0: BdevStats { num_read_ops: 79262, num_write_ops: 0, bytes_read: 40582144, bytes_written: 0 }
disk0: BdevStats { num_read_ops: 50657, num_write_ops: 0, bytes_read: 25936384, bytes_written: 0 }
This test was using the API directly, that is, there is no target involved. We should verify that it works correctly with a target and initiator, but usually that's far more forgiving than the API used directly (as the initiator implicitly has reset and retry logic).
I've done some more testing around this particular issue; here is an excerpt of the things I've done (apart from the test added in the previous comment).
With two nvmf targets pre-configured:
$ ./target/debug/mayastor-client nexus create $UUID 500MiB "nvmf://127.0.0.1:4420/replica0 nvmf://127.0.0.1:4420/replica1"
Nexus e46329bb-7e7a-480b-8b97-acc17fb31ba5 created
$ ./target/debug/mayastor-client bdev share nexus-e46329bb-7e7a-480b-8b97-acc17fb31ba5
{
"uri": "nvmf://127.0.0.1:8420/nqn.2019-05.io.openebs:nexus-e46329bb-7e7a-480b-8b97-acc17fb31ba5"
}
This, in effect, is the same sequence of calls that the CSI driver makes. Once the target is shared, we can connect to it and run some IO:
sudo nvme connect -t tcp -a 127.0.0.1 -s 8420 -n nqn.2019-05.io.openebs:nexus-e46329bb-7e7a-480b-8b97-acc17fb31ba5
The device shows up as nvme3, and we then use fio to drive some IO to it. Once that is running, I disable one of the targets, which yields the following in the logs:
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
WARN mayastor::bdev::nexus::nexus_io:nexus_io.rs:329] core 1 thread Some(Mthread(0x55db26432f40)), faulting child nvmf://127.0.0.1:4420/replica0: Faulted(IoError), blk_cnt: 262144, blk_size: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
INFO mayastor::bdev::nexus::nexus_bdev:nexus_bdev.rs:454] nexus-e46329bb-7e7a-480b-8b97-acc17fb31ba5: Dynamic reconfiguration event: ChildFault started
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
ERROR mayastor::bdev::nexus::nexus_io:nexus_io.rs:319] name: 127.0.0.1:4420/replica0n1, driver: nvme, product: NVMe disk, num_blocks: 262144, block_len: 4096
INFO mayastor::bdev::nexus::nexus_channel:nexus_channel.rs:102] nexus-e46329bb-7e7a-480b-8b97-acc17fb31ba5(thread:"mayastor_nvmf_tcp_pg_core_1"), refreshing IO channels
INFO mayastor::bdev::nexus::nexus_channel:nexus_channel.rs:102] nexus-e46329bb-7e7a-480b-8b97-acc17fb31ba5(thread:"mayastor_nvmf_tcp_pg_core_2"), refreshing IO channels
INFO mayastor::bdev::nexus::nexus_channel:nexus_channel.rs:102] nexus-e46329bb-7e7a-480b-8b97-acc17fb31ba5(thread:"mayastor_nvmf_tcp_pg_core_34"), refreshing IO channels
INFO mayastor::bdev::nexus::nexus_channel:nexus_channel.rs:248] nexus-e46329bb-7e7a-480b-8b97-acc17fb31ba5: Reconfigure completed
INFO mayastor::bdev::nexus::nexus_bdev:nexus_bdev.rs:468] nexus-e46329bb-7e7a-480b-8b97-acc17fb31ba5: Dynamic reconfiguration event: ChildFault completed 0
INFO mayastor::nexus_uri:bdev_destroy{uri="nvmf://127.0.0.1:4420/replica0"}:nexus_uri.rs:79] new
INFO mayastor::nexus_uri:bdev_destroy{uri="nvmf://127.0.0.1:4420/replica0"}:nexus_uri.rs:79] enter
INFO mayastor::nexus_uri:bdev_destroy{uri="nvmf://127.0.0.1:4420/replica0"}:nexus_uri.rs:79] exit
INFO mayastor::nexus_uri:bdev_destroy{uri="nvmf://127.0.0.1:4420/replica0"}:nexus_uri.rs:79] close time.busy=143µs time.idle=99.1µs
INFO mayastor::core::bdev:bdev.rs:168] Received remove event for bdev 127.0.0.1:4420/replica0n1
INFO mayastor::bdev::nexus::nexus_child:nexus_child.rs:370] Removing child nvmf://127.0.0.1:4420/replica0
INFO mayastor::bdev::nexus::nexus_child:nexus_child.rs:406] Child nvmf://127.0.0.1:4420/replica0 removed
While this was happening, in a different terminal I observed the NVMe queues as the kernel sees them, using the following bpftrace script:
// completions with a non-zero status (failed IO) for the controller given as $1
tracepoint:nvme:nvme_complete_rq / args->ctrl_id == $1 && args->status != 0 / {
    if (args->qid != 0) {
        printf("<-- device: %s, retries: %d, status: %x, cid=%d, qid=%d\n",
            args->disk, args->retries, args->status, args->cid, args->qid);
    } else {
        printf("<-- controller: %d, retries: %d, status: %x, cid=%d, qid=%d\n",
            args->ctrl_id, args->retries, args->status, args->cid, args->qid);
    }
}

// completions that have been retried at least once
tracepoint:nvme:nvme_complete_rq / args->ctrl_id == $1 && args->retries != 0 / {
    if (args->qid != 0) {
        printf("<-- retried: %s, retries: %d, status: %x, cid=%d, qid=%d\n",
            args->disk, args->retries, args->status, args->cid, args->qid);
    } else {
        printf("<-- controller: %d, retries: %d, status: %x, cid=%d, qid=%d\n",
            args->ctrl_id, args->retries, args->status, args->cid, args->qid);
    }
}
This resulted in the following output:
sudo bpftrace nvme_complete_rq.bt 3
Attaching 3 probes...
<-- device: nvme3n1, retries: 0, status: 6, cid=14, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=54, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=7, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=11, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=6, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=10, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=12, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=13, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=49, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=51, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=52, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=16, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=25, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=33, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=17, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=34, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=36, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=27, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=28, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=30, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=31, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=32, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=55, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=18, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=60, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=35, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=20, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=21, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=57, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=58, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=59, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=61, qid=37
<-- device: nvme3n1, retries: 0, status: 6, cid=62, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=16, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=52, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=51, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=49, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=13, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=12, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=10, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=6, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=32, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=31, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=30, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=28, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=27, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=36, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=34, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=7, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=14, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=54, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=11, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=25, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=33, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=17, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=62, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=61, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=59, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=58, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=57, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=21, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=20, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=35, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=60, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=55, qid=37
<-- retried: nvme3n1, retries: 1, status: 0, cid=18, qid=37
The kernel retries all the IO that failed. This also indicates that if one of the replicas goes away, the error it produces is propagated back to the initiator, which retries the IO. We might want to make the retries implicit, i.e. have the IO retried internally, but for now I think it is fair to close this issue, as it specifically asserts that we should not crash, which AFAIK we don't (and actually never did). This is demonstrated by the test cases and the above example.
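If we did decide to retry internally, the shape of it could be as simple as a bounded retry wrapper around the child IO submission. The sketch below is a generic illustration only; the names and structure are mine, not mayastor's actual IO path, where the resubmission would go to a healthy child after the faulted one has been retired.

use std::future::Future;

// Illustrative only: retry an async operation up to `max_retries` times.
async fn submit_with_retry<F, Fut, T, E>(mut submit: F, max_retries: usize) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    let mut attempt = 0;
    loop {
        match submit().await {
            Ok(v) => return Ok(v),
            // hypothetical policy: simply resubmit until the retry budget is spent
            Err(_) if attempt < max_retries => attempt += 1,
            Err(e) => return Err(e),
        }
    }
}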
Feel free to reopen if it occurs again.
It is a combination of control and data plane issues that results in the unfortunate situation of the volume being unavailable even though just one replica out of N is missing. The test environment is a 3-node cluster with 3 replicas of the same volume. We deliberately terminate one of the nodes that runs the application and one of the replicas (not a nexus; that is a different issue, #508). So what happens then:
The following message appears circa 20x in the mayastor log file (the instance with the nexus):
then
repeated many times. And then a panic at the end:
What does the control plane do in the meantime?
k8s tries to unpublish the volume so that it can move the application to a different node, but the mayastor node fails to reply and moac waits forever (that is what #506 is about). Because this fails, the application cannot be rescheduled and is stuck, and nothing else can use the volume. The control plane side of things will be fixed as part of #506. This ticket exists to track the problem in the data plane (mayastor) and its inadequate reaction to the situation where one of the replicas becomes unreachable (it should not crash).