openvstorage / volumedriver

The Open vStorage VolumeDriver is the core of the Open vStorage solution: a high performance distributed block layer. It converts block storage into objects (Storage Container Objects).
Other
37 stars 18 forks source link

write object: caught exception: invalid inputstream when failover vm to other datacenter #75

Closed jeroenmaelbrancke closed 8 years ago

jeroenmaelbrancke commented 8 years ago

Wim and i tried multiple failovers from a vm (fast after each other) without seeing error. After these tests we put some data on the vm and failover the vm again. When we ls the mountpoint at the same moment the volumedriver throws some errors. The vm was successfully failovered with these errors.

Can this be investigated why we have these errors?

2016-07-26 15:14:56 555113 +0200 - perf-roub-01 - 56816/0x00007f87a0b2d700 - volumedriverfs/DataStoreNG - 0000000000752331 - info - updateCurrentSCO_: created new write SCO 00_00003228_00 for Volume 45eed2e0-8912-41ab-90a9-ac7d02334edc
2016-07-26 15:14:56 555324 +0200 - perf-roub-01 - 56816/0x00007f89bec15700 - volumedriverfs/AlbaProxyClient - 0000000000752332 - info - logMessage: TCPProxy_client(188.165.13.24, 26204)
2016-07-26 15:14:56 555607 +0200 - perf-roub-01 - 56816/0x00007f89bec15700 - volumedriverfs/BackendConnectionInterfaceLogger - 0000000000752333 - info - Logger: Entering write 45eed2e0-8912-41ab-90a9-ac7d02334edc 00_0000321a_00
2016-07-26 15:14:56 590056 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/BackendConnectionInterfaceLogger - 0000000000752334 - info - ~Logger: Exiting write for 45eed2e0-8912-41ab-90a9-ac7d02334edc 00_00003216_00
2016-07-26 15:14:56 590148 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/BackendConnectionInterfaceLogger - 0000000000752335 - info - ~Logger: Exiting write for 45eed2e0-8912-41ab-90a9-ac7d02334edc
2016-07-26 15:14:56 590304 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/BackendConnectionInterfaceLogger - 0000000000752336 - info - Logger: Entering write 45eed2e0-8912-41ab-90a9-ac7d02334edc 00_0000321b_00
2016-07-26 15:14:56 590462 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/AlbaConnection - 0000000000752337 - error - convert_exceptions_: write object: caught exception: invalid inputstream
2016-07-26 15:14:56 590531 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/BackendConnectionInterfaceLogger - 0000000000752338 - error - ~Logger: Exiting write for 45eed2e0-8912-41ab-90a9-ac7d02334edc 00_0000321b_00 with 
exception
2016-07-26 15:14:56 590557 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/BackendConnectionInterfaceLogger - 0000000000752339 - error - ~Logger: Exiting write for 45eed2e0-8912-41ab-90a9-ac7d02334edc
2016-07-26 15:14:56 590582 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/BackendInterface - 000000000075233a - error - do_wrap_: Problem with connection 0x7f86f8003f10: invalid inputstream
2016-07-26 15:14:56 590682 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/AlbaProxyClient - 000000000075233b - info - logMessage: TCPProxy_client(188.165.13.24, 26204)
2016-07-26 15:14:56 590924 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/BackendInterface - 000000000075233c - warning - do_wrap_: Retrying with new connection (retry: 1, sleep before retry: 0 milliseconds)
2016-07-26 15:14:56 590997 +0200 - perf-roub-01 - 56816/0x00007f89bb40e700 - volumedriverfs/BackendConnectionInterfaceLogger - 000000000075233d - info - Logger: Entering write 45eed2e0-8912-41ab-90a9-ac7d02334edc 00_0000321b_00
2016-07-26 15:14:56 603477 +0200 - perf-roub-01 - 56816/0x00007f87d3fff700 - volumedriverfs/DataStoreNG - 000000000075233e - info - updateCurrentSCO_: created new write SCO 00_00003229_00 for Volume 45eed2e0-8912-41ab-90a9-ac7d02334edc
2016-07-26 15:14:56 603664 +0200 - perf-roub-01 - 56816/0x00007f89bd412700 - volumedriverfs/AlbaProxyClient - 000000000075233f - info - logMessage: TCPProxy_client(188.165.13.24, 26204)
2016-07-26 15:14:56 603890 +0200 - perf-roub-01 - 56816/0x00007f89bd412700 - volumedriverfs/BackendConnectionInterfaceLogger - 0000000000752340 - info - Logger: Entering write 45eed2e0-8912-41ab-90a9-ac7d02334edc 00_0000321c_00
2016-07-26 15:14:56 644571 +0200 - perf-roub-01 - 56816/0x00007f8763fff700 - volumedriverfs/DataStoreNG - 0000000000752341 - info - updateCurrentSCO_: created new write SCO 00_0000322a_00 for Volume 45eed2e0-8912-41ab-90a9-ac7d02334edc
2016-07-26 15:14:56 644768 +0200 - perf-roub-01 - 56816/0x00007f89bac0d700 - volumedriverfs/AlbaProxyClient - 0000000000752342 - info - logMessage: TCPProxy_client(188.165.13.24, 26204)

vdisk:

In [2]: disk = VDiskList.get_vdisk_by_volume_id('45eed2e0-8912-41ab-90a9-ac7d02334edc')

In [3]: disk.name
Out[3]: u'KVM10-fio-rou-d1'

In [4]: disk.vpool.name
Out[4]: u'geo-accel-alba'
wimpers commented 8 years ago

From Arne:

If Chris isn't looking into it already: "invalid input stream" indicates a timeout within the alba proxy client library (the proxy did not respond in time). Furthermore, the error pasted to the ticket is on SCO write afaics which happens in the background. So that causing an error on 'ls' is pretty unlikely. The exact error from 'ls' would also be handy.

toolslive commented 8 years ago

I looked at it a bit, and there were no writes were even near the timeout limit. Here's a grep of 'took' on the proxy's log file

2016-07-26 14:46:08 751949 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1399 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e59_00",_,_,_) took 1.070420
2016-07-26 14:46:08 771862 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1400 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e5e_00",_,_,_) took 0.762260
2016-07-26 14:46:08 811615 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1401 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e5f_00",_,_,_) took 0.758592
2016-07-26 14:46:17 27523 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1416 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e62_00",_,_,_) took 0.610416
2016-07-26 14:46:17 350898 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1417 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e63_00",_,_,_) took 0.920754
2016-07-26 14:46:17 476658 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1418 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e65_00",_,_,_) took 0.969833
2016-07-26 14:46:17 478684 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1419 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e64_00",_,_,_) took 0.974798
2016-07-26 14:46:17 556561 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1420 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e67_00",_,_,_) took 1.001063
2016-07-26 14:46:17 564769 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1421 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e66_00",_,_,_) took 1.024665
2016-07-26 14:46:17 680706 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1422 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e69_00",_,_,_) took 1.039209
2016-07-26 14:46:17 699628 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1423 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e68_00",_,_,_) took 1.097272
2016-07-26 14:46:17 860559 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1424 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6a_00",_,_,_) took 1.206830
2016-07-26 14:46:17 866063 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1425 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6b_00",_,_,_) took 1.180408
2016-07-26 14:46:17 880296 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1426 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6d_00",_,_,_) took 1.126711
2016-07-26 14:46:17 898704 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1427 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6c_00",_,_,_) took 1.157396
2016-07-26 14:46:17 947299 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1428 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6e_00",_,_,_) took 1.173689
2016-07-26 14:46:17 960452 +0200 - perf-roub-01 - 79091/0 - alba/proxy - 1429 - info - Request WriteObjectFs ("45eed2e0-8912-41ab-90a9-ac7d02334edc","00_00000e6f_00",_,_,_) took 1.161456
wimpers commented 8 years ago

@pploegaert please investiagte

run fio and do live

wimpers commented 8 years ago

After discussion with @pploegaert assigning to @JeffreyDevloo as he is investigating issues with the migrate.

JeffreyDevloo commented 8 years ago

The environment on which this occurred has been lost. I have checked into migration issues in the past but this was one that failed to catch my attention.

My proposal is reopening this issue when it occurs again so I can properly focus solely on this ticket.