jeroenmaelbrancke closed 8 years ago
From Arne:
If Chris isn't looking into it already: "invalid input stream" indicates a timeout within the alba proxy client library (the proxy did not respond in time). Furthermore, the error pasted in the ticket is on a SCO write afaics, which happens in the background, so it is pretty unlikely to cause an error on 'ls'. The exact error from 'ls' would also be handy.
I looked at it a bit, and no writes were even near the timeout limit. Here's a grep of 'took' on the proxy's log file:
2016-07-26 14:46:08 751949 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1399 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e59_00",_,_,_) took 1.070420
2016-07-26 14:46:08 771862 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1400 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e5e_00",_,_,_) took 0.762260
2016-07-26 14:46:08 811615 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1401 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e5f_00",_,_,_) took 0.758592
2016-07-26 14:46:17 27523 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1416 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e62_00",_,_,_) took 0.610416
2016-07-26 14:46:17 350898 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1417 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e63_00",_,_,_) took 0.920754
2016-07-26 14:46:17 476658 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1418 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e65_00",_,_,_) took 0.969833
2016-07-26 14:46:17 478684 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1419 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e64_00",_,_,_) took 0.974798
2016-07-26 14:46:17 556561 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1420 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e67_00",_,_,_) took 1.001063
2016-07-26 14:46:17 564769 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1421 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e66_00",_,_,_) took 1.024665
2016-07-26 14:46:17 680706 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1422 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e69_00",_,_,_) took 1.039209
2016-07-26 14:46:17 699628 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1423 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e68_00",_,_,_) took 1.097272
2016-07-26 14:46:17 860559 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1424 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6a_00",_,_,_) took 1.206830
2016-07-26 14:46:17 866063 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1425 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6b_00",_,_,_) took 1.180408
2016-07-26 14:46:17 880296 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1426 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6d_00",_,_,_) took 1.126711
2016-07-26 14:46:17 898704 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1427 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6c_00",_,_,_) took 1.157396
2016-07-26 14:46:17 947299 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1428 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6e_00",_,_,_) took 1.173689
2016-07-26 14:46:17 960452 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1429 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6f_00",_,_,_) took 1.161456
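For anyone repeating this check on a larger log, here is a rough Python sketch of the same idea: pull the duration out of every "took" line and compare the slowest request against a threshold. The regex is only inferred from the lines pasted above, and the threshold value is a placeholder assumption, not the actual alba proxy client timeout (which is not stated in this ticket).

```python
import re

# Pattern inferred from the pasted log lines: "Request <Name> (<args>) took <seconds>"
TOOK_RE = re.compile(r"Request (\w+) \((.*)\) took ([0-9.]+)\s*$")

# Two lines copied from the grep output above, used as sample input.
SAMPLE = [
    '2016-07-26 14:46:08 751949 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1399 - info - '
    'Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e59_00",_,_,_) took 1.070420',
    '2016-07-26 14:46:17 860559 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1424 - info - '
    'Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6a_00",_,_,_) took 1.206830',
]


def parse_took(lines):
    """Return (request_name, seconds) for every line that matches the 'took' pattern."""
    results = []
    for line in lines:
        match = TOOK_RE.search(line)
        if match:
            results.append((match.group(1), float(match.group(3))))
    return results


def slowest(lines, threshold):
    """Return the slowest request and all requests at or above `threshold` seconds.

    `threshold` is a caller-supplied assumption; the real timeout is not known here.
    """
    durations = parse_took(lines)
    worst = max(durations, key=lambda d: d[1], default=None)
    over = [d for d in durations if d[1] >= threshold]
    return worst, over


if __name__ == "__main__":
    # 5.0 seconds is a hypothetical threshold, chosen only for illustration.
    worst, over = slowest(SAMPLE, threshold=5.0)
    print("slowest request:", worst)
    print("requests at or above threshold:", over)
```

To run it against a real log, pass the file's lines instead of `SAMPLE`, for example `slowest(open(path), threshold)` with the path and timeout that apply to your setup.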
@pploegaert please investigate
run fio and do live
After discussion with @pploegaert, assigning to @JeffreyDevloo as he is investigating issues with migration.
The environment on which this occurred has been lost. I have looked into migration issues in the past, but this one failed to catch my attention.
I propose reopening this issue when it occurs again so I can focus solely on this ticket.
Wim and I tried multiple failovers of a VM (in quick succession) without seeing any errors. After these tests we put some data on the VM and failed it over again. When we ran ls on the mountpoint at the same moment, the volumedriver threw some errors. The VM failed over successfully despite these errors.
Can someone investigate why we get these errors?
vdisk: