sysrqio opened this issue 6 years ago
Hi, thanks for bringing this up.
It's quite interesting, as the segfault originates from Protobuf generated code. Seeing you're using the Vivaldi feature, have you tried the same scenario without Vivaldi? Have you followed the user guide in setting up Vivaldi?
Cheers Robert
Hi Robert, I think I am using the Vivaldi feature as designed. The share has vivaldi set as its replica selection policy, and mount.xtreemfs is invoked with --vivaldi-enable; I also tried combinations with --vivaldi-enable-dir-updates. In my opinion the Vivaldi feature fits my setup best: it uses multiple OSDs in different locations, interconnected by a WAN link. One side is a datacenter, the other is reached over a fiber link.
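For reference, the setup is along these lines (hostnames and paths are placeholders, and the xtfsutil flag is how I recall setting the policy, so treat it as a sketch):
# set the replica selection policy of the share to vivaldi
xtfsutil --set-rsp vivaldi /mnt/xtreemfs/video
# mount with Vivaldi enabled on the client (grid SSL certificate options omitted here)
mount.xtreemfs --vivaldi-enable --vivaldi-enable-dir-updates pbrpcg://dir.example.com:32638/video /mnt/xtreemfs/video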
Using it without Vivaldi in both the client and the share (by setting the share's policy back to default), I do not see it core dumping. After some time the rsync stalls, and when I send a SIGABRT, some file transfers report the error:
rsync: mkstemp "/path/filename" failed: Transport endpoint is not connected (107)
This is visible both with and without Vivaldi.
My config is mostly the default config. The main difference is that I am using grid_ssl with IPv6 only. I have seen a similar error on the stable version, which is one of the reasons why I tried the unstable version.
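For completeness, the SSL-related part of my service configs looks roughly like this (property names as far as I can tell from the shipped config templates; paths and passwords are placeholders):
ssl.enabled = true
ssl.grid_ssl = true
ssl.service_creds = /etc/xos/xtreemfs/truststore/certs/osd.p12
ssl.service_creds.pw = xxxxxx
ssl.service_creds.container = pkcs12
ssl.trusted_certs = /etc/xos/xtreemfs/truststore/certs/trusted.jks
ssl.trusted_certs.pw = xxxxxx
ssl.trusted_certs.container = jks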
Hi,
I recall we've had some issues with the transport endpoint not connected error in the past. It was usually due to some exception during read/write, which caused the fuse client to hang and eventually die. We tried mitigating this via rigorous exception handling, but it seems there are still problems. Have you enabled the debug option in the mounting client? You can also increase the logging level at the MRC and OSDs and attach the logs to this ticket, along with the time when the issue appeared, as debug logs can grow quite large pretty quickly. Thanks for helping us debug the issue!
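Concretely, something like the following should do (property and flag names are from memory, so please double-check them against your config templates and mount.xtreemfs --help):
# in mrcconfig.properties and osdconfig.properties: raise the log level to DEBUG
debug.level = 7
# on the client: mount with debug logging written to a file
mount.xtreemfs -d DEBUG --log-file-path /var/log/xtreemfs/client.log pbrpcg://dir.example.com:32638/video /mnt/xtreemfs/video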
Cheers Robert
Hi,
I found these errors around that timestamp when not using Vivaldi:
[ D | MRCRequestDispatcher | PBRPCSrv@32636 | 39 | Feb 22 00:59:41 ] enqueueing request: org.xtreemfs.foundation.pbrpc.server.RPCServerRequest: callid=1955778, type=RPC_REQUEST,proc=15,interf=20001
[ D | ProcessingStage | ProcSt | 43 | Feb 22 00:59:41 ] org.xtreemfs.mrc.UserException: No open voucher for global fileId c7ee8a65-ae8c-424a-bbcf-0512cb513e18:568879 (errno=POSIX_ERROR_EINVAL)
... org.xtreemfs.mrc.quota.VoucherManager.addRenewedTimestamp(VoucherManager.java:512)
... org.xtreemfs.mrc.operations.RenewCapabilityAndVoucherOperation.startRequest(RenewCapabilityAndVoucherOperation.java:87)
... org.xtreemfs.mrc.stages.ProcessingStage.execute(ProcessingStage.java:310)
... org.xtreemfs.mrc.stages.ProcessingStage.parseAndExecute(ProcessingStage.java:273)
... org.xtreemfs.mrc.stages.ProcessingStage.processMethod(ProcessingStage.java:176)
... org.xtreemfs.mrc.stages.MRCStage.run(MRCStage.java:129)
[ D | MRCRequestDispatcher | ProcSt | 43 | Feb 22 00:59:41 ] org.xtreemfs.mrc.UserException: No open voucher for global fileId c7ee8a65-ae8c-424a-bbcf-0512cb513e18:568879 (errno=POSIX_ERROR_EINVAL)
... org.xtreemfs.mrc.quota.VoucherManager.addRenewedTimestamp(VoucherManager.java:512)
... org.xtreemfs.mrc.operations.RenewCapabilityAndVoucherOperation.startRequest(RenewCapabilityAndVoucherOperation.java:87)
... org.xtreemfs.mrc.stages.ProcessingStage.execute(ProcessingStage.java:310)
... org.xtreemfs.mrc.stages.ProcessingStage.parseAndExecute(ProcessingStage.java:273)
... org.xtreemfs.mrc.stages.ProcessingStage.processMethod(ProcessingStage.java:176)
... org.xtreemfs.mrc.stages.MRCStage.run(MRCStage.java:129)
[ D | RPCNIOSocketServer | ProcSt | 43 | Feb 22 00:59:41 ] response sent
and
[ D | RPCNIOSocketServer | PBRPCSrv@32636 | 39 | Feb 22 00:59:41 ] sent response for org.xtreemfs.foundation.pbrpc.server.RPCServerResponse: callid=1955778
[ D | RPCNIOSocketServer | PBRPCSrv@32636 | 39 | Feb 22 00:59:41 ] java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(java.base@9-internal/Native Method)
at sun.nio.ch.SocketDispatcher.read(java.base@9-internal/SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(java.base@9-internal/IOUtil.java:223)
at sun.nio.ch.IOUtil.read(java.base@9-internal/IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(java.base@9-internal/SocketChannelImpl.java:383)
at org.xtreemfs.foundation.pbrpc.channels.SSLHandshakeOnlyChannelIO.read(SSLHandshakeOnlyChannelIO.java:155)
at org.xtreemfs.foundation.pbrpc.server.RPCNIOSocketServer.readData(RPCNIOSocketServer.java:655)
at org.xtreemfs.foundation.pbrpc.server.RPCNIOSocketServer.readConnection(RPCNIOSocketServer.java:387)
at org.xtreemfs.foundation.pbrpc.server.RPCNIOSocketServer.run(RPCNIOSocketServer.java:306)
[ D | RPCNIOSocketServer | PBRPCSrv@32636 | 39 | Feb 22 00:59:41 ] closing connection to /xxxx:xxx:xxxx:xxxx:0:0:fe7d:xxxx:56314
[ D | OSDStatusManager | OSDStatusMan... | 41 | Feb 22 00:59:46 ] sending request for OSD list to DIR.
After these errors, the MRC log contains only:
[ D | RPCNIOSocketServer | PBRPCSrv@32636 | 39 | Feb 22 00:59:41 ] closing connection to /xxxx:xxxx:xxxx:xxxx:0:0:fe7d:xxxx:56314
[ D | OSDStatusManager | OSDStatusMan... | 41 | Feb 22 00:59:46 ] sending request for OSD list to DIR...
[ D | RPCNIOSocketClient | OSDStatusMan... | 41 | Feb 22 00:59:46 ] sending request org.xtreemfs.foundation.pbrpc.client.RPCClientRequest@7a0e1188 no 790498
[ D | RPCClientRequest | MRCRequestDi... | 38 | Feb 22 00:59:46 ] sending record marker: 40/2/0
[ D | RPCClientRequest | MRCRequestDi... | 38 | Feb 22 00:59:46 ] send buffer #1: ReusableBuffer( capacity=8192 limit=42 position=0)
[ D | RPCNIOSocketClient | MRCRequestDi... | 38 | Feb 22 00:59:46 ] sent request 790498 to xxxxx/xxxx:xxxx:xxxx:xxxx:0:0:0:xxxx:32638
[ D | RPCResponse | MRCRequestDi... | 38 | Feb 22 00:59:46 ] response received
[ D | OSDStatusManager | OSDStatusMan... | 41 | Feb 22 00:59:46 ] ... received OSD list from DIR
[ D | OSDStatusManager | OSDStatusMan... | 41 | Feb 22 00:59:46 ] response...
[ D | OSDStatusManager | OSDStatusMan... | 41 | Feb 22 00:59:46 ] registered OSDs
[ D | OSDStatusManager | OSDStatusMan... | 41 | Feb 22 00:59:46 ] a88d1832-484e-48f2-81c6-430fc71401d4
[ D | OSDStatusManager | OSDStatusMan... | 41 | Feb 22 00:59:46 ] dbc52792-e3d7-416f-a21b-1740f44a6e9c
[ D | OSDStatusManager | OSDStatusMan... | 41 | Feb 22 00:59:46 ] e114ae07-8249-47d9-aec7-745e302e08ac
[ D | RPCNIOSocketClient | MRC Heartbea... | 48 | Feb 22 00:59:52 ] sending request org.xtreemfs.foundation.pbrpc.client.RPCClientRequest@7a87e34d no 790499
[ D | RPCClientRequest | MRCRequestDi... | 38 | Feb 22 00:59:52 ] sending record marker: 55/38/0
[ D | RPCClientRequest | MRCRequestDi... | 38 | Feb 22 00:59:52 ] send buffer #1: ReusableBuffer( capacity=8192 limit=93 position=0)
[ D | RPCNIOSocketClient | MRCRequestDi... | 38 | Feb 22 00:59:52 ] sent request 790499 to xxxxxxx/xxxx:xxxx:xxxx:5a57:0:0:xxxx:xxxx:32638
[ D | RPCResponse | MRCRequestDi... | 38 | Feb 22 00:59:52 ] response received
I still need to clean sensitive data out of the logs and enable debug on the client.
The data copy stopped at 00:59:56:
2018/02/22 00:59:56 c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/14/1515934930842_1515934932842_1515489024983_445935180.mp4 1453020
c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/14/1515934932797_1515934934797_1515489024983_445937180.mp4
Hi,
thanks for the extra information. It seems the problem is indeed in the client, judging from the logs you posted. Which SSL versions are you using? Just as a test, have you tried using the full SSL mode? I'm hoping to get some more insight from the client debug logs. Thanks for bearing with us and helping isolate the issue!
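For the full SSL test, the switch would roughly look like this (assuming the ssl.grid_ssl property and the pbrpcs:// URL scheme; the exact names are in the user guide, so treat this as a sketch):
# on DIR/MRC/OSD: disable the grid SSL mode so the data channel is encrypted as well
ssl.enabled = true
ssl.grid_ssl = false
# on the client: mount with the pbrpcs:// scheme instead of pbrpcg:// (certificate path is a placeholder)
mount.xtreemfs --pkcs12-file-path /etc/xos/xtreemfs/client.p12 pbrpcs://dir.example.com:32638/video /mnt/xtreemfs/video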
Cheers Robert
Hi Robert,
thank you for your patience. It takes some time to reproduce this issue. Attached is the client log:
[ D | 2/23 02:18:10.661 | 7f73b17fa700 ] xtreemfs_fuse_write /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 s: 129408 o:1310720
[ D | 2/23 02:18:10.665 | 7f73b17fa700 ] xtreemfs_fuse_write finished /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 s:129408 o:1310720 w:129408
[ D | 2/23 02:18:10.666 | 7f73b1ffb700 ] xtreemfs_fuse_write /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 s: 46917 o:1440128
[ D | 2/23 02:18:10.671 | 7f73b1ffb700 ] xtreemfs_fuse_write finished /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 s:46917 o:1440128 w:46917
[ D | 2/23 02:18:10.671 | 7f73b27fc700 ] xtreemfs_fuse_flush /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3
[ D | 2/23 02:18:10.671 | 7f73b27fc700 ] WriteBackFileSize: file_id: 606791 # bytes: 1487045 close file? 0
[ D | 2/23 02:18:10.673 | 7f73b27fc700 ] xtreemfs_fuse_lock on path /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 command: set lock type: unlock start: 0 length: 0 pid: 0
[ D | 2/23 02:18:10.673 | 7f73b27fc700 ] FileHandleImplementation::ReleaseLock: Skipping unlock request as there is no lock known for the PID: 0 (Lock description: 0, 0, 0)
[ D | 2/23 02:18:10.673 | 7f73b3fff700 ] xtreemfs_fuse_release /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3
[ D | 2/23 02:18:10.673 | 7f73b3fff700 ] VoucherManager will finalize and clear all voucher information for file: 606791
[ D | 2/23 02:18:10.673 | 7f73b3fff700 ] Sending finalizeVouchersRequest to 1 OSD(s) containing 0 + 1 XCap(s)
[ D | 2/23 02:18:10.673 | 7f73b2ffd700 ] getattr on path /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3
[ D | 2/23 02:18:10.673 | 7f73b2ffd700 ] getattr: serving from stat-cache /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 0
[ D | 2/23 02:18:10.674 | 7f73b2ffd700 ] getattr: merged infos from osd_write_response, size: 1487045
[ D | 2/23 02:18:10.674 | 7f738ffff700 ] xtreemfs_fuse_getxattr /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 security.capability 0
[ D | 2/23 02:18:10.674 | 7f73b37fe700 ] getattr on path /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3
[ D | 2/23 02:18:10.674 | 7f73b37fe700 ] getattr: serving from stat-cache /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 0
[ D | 2/23 02:18:10.674 | 7f73b37fe700 ] getattr: merged infos from osd_write_response, size: 1487045
[ D | 2/23 02:18:10.674 | 7f73c251f700 ] Got all expected responses!
[ D | 2/23 02:18:10.674 | 7f73b3fff700 ] Skipped checkResponseConsistency: only one OSD involved.
[ D | 2/23 02:18:10.674 | 7f73b3fff700 ] Got consistent responses on try 1.
[ D | 2/23 02:18:10.675 | 7f73b3fff700 ] Sending clearVoucherRequest to the MRC.
[ D | 2/23 02:18:10.675 | 7f73b17fa700 ] xtreemfs_fuse_chown on path /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3
[ D | 2/23 02:18:10.677 | 7f73c151d700 ] START open_file_table: Periodic filesize update for 1 open files.
[ D | 2/23 02:18:10.677 | 7f73c151d700 ] END open_file_table: Periodic filesize update for 1 open files.
[ D | 2/23 02:18:10.677 | 7f73c1d1e700 ] START open_file_table: Periodic XCap renewal for 1 open files.
[ D | 2/23 02:18:10.677 | 7f73c1d1e700 ] Renew XCap for file_id: 606791 Expiration in: 600
[ D | 2/23 02:18:10.677 | 7f73c1d1e700 ] END open_file_table: Periodic XCap renewal for 1 open files.
[ D | 2/23 02:18:10.679 | 7f73b17fa700 ] MetadataCache UpdateStatAttributes: /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 to_set: 70
[ D | 2/23 02:18:10.679 | 7f73b17fa700 ] getattr on path /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3
[ D | 2/23 02:18:10.679 | 7f73b17fa700 ] getattr: serving from stat-cache /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 0
[ D | 2/23 02:18:10.679 | 7f73b17fa700 ] getattr: merged infos from osd_write_response, size: 1487045
[ D | 2/23 02:18:10.680 | 7f73b1ffb700 ] xtreemfs_fuse_utimens on path /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3
[ E | 2/23 02:18:10.682 | 7f73c251f700 ] Renewing XCap of file: 606791 failed. Error: error_type: ERRNO
posix_errno: POSIX_ERROR_EINVAL
error_message: "No open voucher for global fileId c7ee8a65-ae8c-424a-bbcf-0512cb513e18:606791 (errno=POSIX_ERROR_EINVAL)"
debug_info: ""
[ D | 2/23 02:18:10.682 | 7f73b1ffb700 ] MetadataCache UpdateStatAttributes: /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 to_set: 48
[ D | 2/23 02:18:10.682 | 7f73b1ffb700 ] getattr on path /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3
[ D | 2/23 02:18:10.683 | 7f73b1ffb700 ] getattr: serving from stat-cache /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/.1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 1487045
[ D | 2/23 02:18:10.683 | 7f73b27fc700 ] getattr on path /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/1516326588832_1516326590832_1515489024983_837607680.mp4
[ D | 2/23 02:18:10.683 | 7f73b27fc700 ] MetadataCache GetStat miss: /video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/1516326588832_1516326590832_1515489024983_837607680.mp4 [100000]
[ D | 2/23 02:18:10.685 | 7f73b27fc700 ] The server xt.sysrq.io:32636 (a756c159-ee01-4543-887f-bddd803ce538) denied the requested operation. Error Value: POSIX_ERROR_ENOENT Error message: file or directory 'unifi/video/c9d1f803-8bcd-3075-8e27-0a516ab6cfd8/2018/01/19/1516326588832_1516326590832_1515489024983_837607680.mp4' does not exist (errno=POSIX_ERROR_ENOENT)
We have seen the voucher error in the MRC log, too. I have not tried the full SSL mode yet, but I can enable it for a try. I am using OpenSSL 1.0.2g-1ubuntu4.10, which is the default in Ubuntu 16.04.
Hi, thanks for your help.
Let me try to get your use case straight: you're copying a video file into XtreemFS, and after the copy is done you rename .1516326588832_1516326590832_1515489024983_837607680.mp4.grXPU3 to 1516326588832_1516326590832_1515489024983_837607680.mp4? It appears some operation is being attempted on the file before it has been successfully created.
The voucher should be renewed; I don't think that error has anything to do with the transport endpoint is not connected error. Or are you observing it each time the client crashes? Are you using the quota feature of XtreemFS?
Cheers Robert
Yes, the files are copied onto the share using rsync's default behaviour (i.e. not using --inplace). Rsync does a stat on the file to compare timestamps and file sizes, copies the content into a temporary file, and afterwards renames it onto the target. From the rsync manpage:
-c, --checksum This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a "quick check" that (by default) checks if each file's size and time of last modification match between the sender and receiver. [..]
I am using the default rsync behaviour.
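To illustrate, a simplified shell equivalent of what rsync does per file here (matching the mkstemp, chown, utimens and rename calls visible in the client log; paths are examples):
TMP=$(mktemp /mnt/xtreemfs/video/.file.mp4.XXXXXX)   # temporary file in the target directory
cat /source/video/file.mp4 > "$TMP"                  # copy the data into the temp file
chown --reference=/source/video/file.mp4 "$TMP"      # preserve ownership
touch --reference=/source/video/file.mp4 "$TMP"      # preserve the modification time
mv "$TMP" /mnt/xtreemfs/video/file.mp4               # rename onto the final name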
I need to verify that the voucher error I posted before from the MRC log is also visible when Vivaldi is not used.
I have seen the transport endpoint is not connected error every time after sending SIGABRT to rsync. The difference between Vivaldi and the default is that with Vivaldi, mount.xtreemfs core dumps in addition to the transport endpoint is not connected errors.
I am not sure whether this share currently has a quota set, or whether I removed the quota setting for the tests. I am going to verify this.
Lukas
After a few days of testing and investigation following the switch to full SSL, I was, as far as I can tell, not able to reproduce this issue as it was initially described here. However, a few issues showed up on the way back down to unencrypted connections that may be related to the original topic of this issue.
Since the setup has moved quite far from the original one, I am not sure whether these issues belong here, but I will give it a try.
OSDs becoming unresponsive: with an unencrypted connection and Vivaldi enabled, the OSDs sometimes become unresponsive. The affected OSDs sit in a datacenter, interconnected by Ethernet. When this occurs, no new OSD is selected and the transport endpoint is no longer connected.
[ D | 2/24 21:41:00.142 | 7f6321ffb700 ] xtreemfs_fuse_read finished /images/control.qcow2 s:131072 o:2142370240 r:131072
[ D | 2/24 21:41:00.143 | 7f6320ff9700 ] xtreemfs_fuse_read /images/control.qcow2 s:1600 o:2142501312
[ D | 2/24 21:41:00.144 | 7f6320ff9700 ] xtreemfs_fuse_read finished /images/control.qcow2 s:1600 o:2142501312 r:1600
[ D | 2/24 21:41:00.145 | 7f63227fc700 ] xtreemfs_fuse_read /images/control.qcow2 s:129472 o:2142502912
[ D | 2/24 21:41:00.149 | 7f63227fc700 ] xtreemfs_fuse_read finished /images/control.qcow2 s:129472 o:2142502912 r:129472
[ D | 2/24 21:41:00.149 | 7f6321ffb700 ] xtreemfs_fuse_read /images/control.qcow2 s:131072 o:2142632384
[ D | 2/24 21:41:00.154 | 7f6321ffb700 ] xtreemfs_fuse_read finished /images/control.qcow2 s:131072 o:2142632384 r:131072
[ D | 2/24 21:41:00.155 | 7f6320ff9700 ] xtreemfs_fuse_read /images/control.qcow2 s:1600 o:2142763456
[ D | 2/24 21:41:00.156 | 7f6320ff9700 ] xtreemfs_fuse_read finished /images/control.qcow2 s:1600 o:2142763456 r:1600
[ D | 2/24 21:41:06.953 | 7f63237fe700 ] START open_file_table: Periodic filesize update for 2 open files.
[ D | 2/24 21:41:06.953 | 7f63237fe700 ] END open_file_table: Periodic filesize update for 2 open files.
[ D | 2/24 21:41:06.960 | 7f6323fff700 ] START open_file_table: Periodic XCap renewal for 2 open files.
[ D | 2/24 21:41:06.960 | 7f6323fff700 ] Renew XCap for file_id: 39 Expiration in: 540
[ D | 2/24 21:41:06.960 | 7f6323fff700 ] Renew XCap for file_id: 39 Expiration in: 540
[ D | 2/24 21:41:06.960 | 7f6323fff700 ] Renew XCap for file_id: 40 Expiration in: 540
[ D | 2/24 21:41:06.960 | 7f6323fff700 ] END open_file_table: Periodic XCap renewal for 2 open files.
[ D | 2/24 21:41:06.962 | 7f633080b700 ] XCap renewed for file_id: 39
[ D | 2/24 21:41:06.963 | 7f633080b700 ] XCap renewed for file_id: 39
[ D | 2/24 21:41:06.964 | 7f633080b700 ] XCap renewed for file_id: 40
[ D | 2/24 21:41:09.294 | 7f6331c2f700 ] AsyncWriteHandler::(Re)Write for file_id: f5d971a3-f327-42bc-95cf-f1a4da623cfd:39, XCap Expiration in: 597
[ D | 2/24 21:41:09.294 | 7f6331c2f700 ] AsyncWriteHandler::(Re)Write for file_id: f5d971a3-f327-42bc-95cf-f1a4da623cfd:39, XCap Expiration in: 597
[ E | 2/24 21:41:09.320 | 7f6331c2f700 ] Got no response from server osd5.example.com:32641 (e114ae07-8249-47d9-aec7-745e302e08ac), retrying (infinite attempts left) (Possible reason: The server is using SSL, and the client is not.)
[ I | 2/24 21:41:09.321 | 7f6331c2f700 ] Retrying. Waiting 14 more seconds till next retry.
[ D | 2/24 21:41:24.294 | 7f6331c2f700 ] AsyncWriteHandler::(Re)Write for file_id: f5d971a3-f327-42bc-95cf-f1a4da623cfd:39, XCap Expiration in: 582
[ D | 2/24 21:41:24.294 | 7f6331c2f700 ] AsyncWriteHandler::(Re)Write for file_id: f5d971a3-f327-42bc-95cf-f1a4da623cfd:39, XCap Expiration in: 582
[ I | 2/24 21:41:24.298 | 7f6331c2f700 ] Retrying. Waiting 14 more seconds till next retry.
[ D | 2/24 21:41:39.296 | 7f6331c2f700 ] AsyncWriteHandler::(Re)Write for file_id: f5d971a3-f327-42bc-95cf-f1a4da623cfd:39, XCap Expiration in: 567
[ D | 2/24 21:41:39.296 | 7f6331c2f700 ] AsyncWriteHandler::(Re)Write for file_id: f5d971a3-f327-42bc-95cf-f1a4da623cfd:39, XCap Expiration in: 567
[ E | 2/24 21:41:39.323 | 7f6331c2f700 ] Got no response from server osd5.example.com:32641 (e114ae07-8249-47d9-aec7-745e302e08ac), retrying (infinite attempts left) (Possible reason: The server is using SSL, and the client is not.)
[ I | 2/24 21:41:39.323 | 7f6331c2f700 ] Retrying. Waiting 14 more seconds till next retry.
[ D | 2/24 21:41:40.950 | 7f6332430700 ] Vivaldi: recalculating against: 4315274c-7abb-4241-a337-e4fa0fd9bb43
[ D | 2/24 21:41:40.961 | 7f6332430700 ] Vivaldi: ping response received. Measured time: 10 ms
[ D | 2/24 21:41:40.961 | 7f6332430700 ] Vivaldi: storing coordinates to file: (3.52833, -4.74291)
[ D | 2/24 21:41:40.988 | 7f6332430700 ] Vivaldi: Sending coordinates to DIR.
[ D | 2/24 21:41:40.989 | 7f6332430700 ] Vivaldi: sleeping during 280 s.
The latest setup contains a few OSD settings mentioned on a mailing list; this seems to improve things a bit.
max_client_queue = 1500
flease.retries = 5
flease.message_to_ms = 10000
flease.lease_timeout_ms = 300000
capability_timeout = 5000
It happens regularly that the OSD logs an overloaded stage:
[ W | StorageThread | OSD PreProcSt | 42 | Feb 18 23:28:37 ] stage is overloaded, request 2395121 for c7ee8a65-ae8c-424a-bbcf-0512cb513e18:140576 dropped
This happens even when the node does not appear to be saturated in terms of network, disk I/O, RAM, or CPU.
It seems to be a general problem in a few places that no new OSD is selected. I am wondering why there are "connection" problems between the OSDs/MRC and the clients; there are no visible problems in the network connectivity. The WAN interconnect is at least 250 Mbit/s (synchronous) with a low latency of 5-10 ms.
Some examples of this:
[ W | RPCNIOSocketServer | PBRPCSrv@32640 | 15 | Feb 25 14:15:36 ] client sent too many requests... not accepting new requests from /xxxx:xxxx:1efd:1701:225:xxxx:xxxx:xxxx:58250, q=101
[ W | RPCNIOSocketServer | PBRPCSrv@32640 | 15 | Feb 25 14:15:37 ] client allowed to send data again: /xxxx:xxxx:1efd:1701:225:xxxx:xxxx:xxxx:58250, q=49
[ W | RPCNIOSocketServer | PBRPCSrv@32640 | 15 | Feb 25 14:16:10 ] client sent too many requests... not accepting new requests from /xxxx:xxxx:1efd:1701:225:xxxx:xxxx:xxxx:58250, q=101
[ W | RPCNIOSocketServer | PBRPCSrv@32640 | 15 | Feb 25 14:16:10 ] client allowed to send data again: /xxxx:xxxx:1efd:1701:225:xxxx:xxxx:xxxx:58250, q=49
[ W | RWReplicationStage | RWReplSt | 26 | Feb 25 03:24:21 ] (R:a88d1832-484e-48f2-81c6-430fc71401d4) replica for file f5d971a3-f327-42bc-95cf-f1a4da623cfd:39 failed (in method: processSetAuthoritativeState): INTERNAL_SERVER_ERROR/POSIX_ERROR_EIO: (R:a88d1832-484e-48f2-81c6-430fc71401d4) read status FAILED for f5d971a3-f327-42bc-95cf-f1a4da623cfd:39 on dbc52792-e3d7-416f-a21b-1740f44a6e9c (this is request #2 out of 2 which failed); java.io.IOException: sending RPC failed: server closed connection (java.io.IOException: Connection reset by peer);
java.io.IOException: sending RPC failed: server closed connection (java.io.IOException: Connection reset by peer)
at org.xtreemfs.foundation.pbrpc.client.RPCResponse.get(RPCResponse.java:69)
at org.xtreemfs.osd.rwre.CoordinatedReplicaUpdatePolicy$1.responseAvailable(CoordinatedReplicaUpdatePolicy.java:123)
at org.xtreemfs.foundation.pbrpc.client.RPCResponse.requestFailed(RPCResponse.java:147)
at org.xtreemfs.foundation.pbrpc.client.RPCNIOSocketClient.closeConnection(RPCNIOSocketClient.java:716)
at org.xtreemfs.foundation.pbrpc.client.RPCNIOSocketClient.readConnection(RPCNIOSocketClient.java:532)
at org.xtreemfs.foundation.pbrpc.client.RPCNIOSocketClient.run(RPCNIOSocketClient.java:271)
[ W | RWReplicationStage | RWReplSt | 22 | Feb. 26 00:36:54 ] (R:dbc52792-e3d7-416f-a21b-1740f44a6e9c) replica for file f5d971a3-f327-42bc-95cf-f1a4da623cfd:38 failed (in method: processObjectFetched): ERRNO/POSIX_ERROR_EACCES: org.xtreemfs.foundation.pbrpc.client.PBRPCException: ERRNO/POSIX_ERROR_EACCES: capability is not valid (timed out) / ;
org.xtreemfs.foundation.pbrpc.client.PBRPCException: ERRNO/POSIX_ERROR_EACCES: capability is not valid (timed out) /
at org.xtreemfs.foundation.pbrpc.client.RPCResponse.get(RPCResponse.java:90)
at org.xtreemfs.osd.rwre.RWReplicationStage$4.responseAvailable(RWReplicationStage.java:564)
at org.xtreemfs.foundation.pbrpc.client.RPCResponse.responseAvailable(RPCResponse.java:136)
at org.xtreemfs.foundation.pbrpc.client.RPCNIOSocketClient.assembleResponse(RPCNIOSocketClient.java:573)
at org.xtreemfs.foundation.pbrpc.client.RPCNIOSocketClient.readConnection(RPCNIOSocketClient.java:522)
at org.xtreemfs.foundation.pbrpc.client.RPCNIOSocketClient.run(RPCNIOSocketClient.java:271)
It looks like the OSDs are quite busy and run into interruptions during these operations:
[ W | ReusableBuffer | Finalizer | 3 | Feb 25 23:46:27 ] buffer was finalized but not freed before! buffer = ReusableBuffer( capacity=8192 limit=8192 position=8192), refCount=1
[ W | ReusableBuffer | Finalizer | 3 | Feb 25 23:46:27 ] buffer was finalized but not freed before! buffer = ReusableBuffer( capacity=8192 limit=8192 position=8192), refCount=1
[ W | ReusableBuffer | Finalizer | 3 | Feb 25 23:46:27 ] buffer was finalized but not freed before! buffer = ReusableBuffer( capacity=8192 limit=8192 position=8192), refCount=1
[ W | ReusableBuffer | Finalizer | 3 | Feb 25 23:46:27 ] buffer was finalized but not freed before! buffer = ReusableBuffer( capacity=8192 limit=8192 position=8192), refCount=1
[ W | ReusableBuffer | Finalizer | 3 | Feb 25 23:46:27 ] buffer was finalized but not freed before! buffer = ReusableBuffer( capacity=8192 limit=4136 position=4136), refCount=1
[ E | RPCNIOSocketServer | PBRPCSrv@32640 | 15 | Feb 25 23:55:43 ] invalid record marker size (40/39321498/0) received, closing connection to client /xxxx:xxxx:xxxx:5a57:0:0:0:xxxx:47850
[ E | RPCNIOSocketServer | PBRPCSrv@32640 | 15 | Feb 25 23:56:10 ] invalid record marker size (828517681/926167142/875847990) received, closing connection to client /xxxx:xxxx:xxxx:5a57:0:0:0:xxxx:47866
[ E | RWReplicationStage | RWReplSt | 26 | Feb 25 23:56:35 ] received maxObjAvail event for unknow file: f5d971a3-f327-42bc-95cf-f1a4da623cfd:38
[ E | RPCNIOSocketServer | PBRPCSrv@32640 | 15 | Feb 25 23:56:35 ] invalid record marker size (842479922/761607012/925709361) received, closing connection to client /xxxx:xxxx:xxxx:5a57:0:0:0:xxxx:47872
[ E | RPCNIOSocketServer | PBRPCSrv@32640 | 15 | Feb 25 23:56:50 ] invalid record marker size (1577686586/0/4398) received, closing connection to client /xxxx:xxxx:xxxx:5a57:0:0:0:xxxx:47876
[ E | RPCNIOSocketServer | PBRPCSrv@32640 | 15 | Feb 25 23:57:05 ] invalid record marker size (962796068/1697722676/1634021431) received, closing connection to client /xxxx:xxxx:xxxx:5a57:0:0:0:xxxx:47878
I noticed that the whole XtreemFS installation becomes unstable and very slow when one of the OSDs crashes, until it comes back. Judging from the network traffic, no re-replication was started.
I am going to open a few more issues for the other things I found during this investigation.
Hi, could you share an anonymized version of your DIR, MRC and OSD configurations? What is the block size you're using, and the default replication factor? Are you primarily working with (few) large files and/or (many) small files? Are your clients running in parallel, generating the workload concurrently? How many OSDs do you have in your setup? Are they all in the same DC? I'm trying to figure out how the OSD manages to get overloaded (in particular the Preprocessing Stage). You can increase the queue capacity using the max_requests_queue_length parameter (defaults to 1000).
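For example, in the OSD config (the value is just a starting point):
# osdconfig.properties: raise the per-stage request queue limit (default 1000)
max_requests_queue_length = 3000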
Attached are the configuration files of the DIR, the MRC, and one OSD. The OSDs differ only in the listening port, the listen IP, and the data path.
For the stripe size I am using the default of 128 kB, on an XFS filesystem with the default block size.
~# xtfsutil /mnt/xtreemfs/libvirt
Path (on volume) /
XtreemFS file Id f5d971a3-f327-42bc-95cf-f1a4da623cfd:1
XtreemFS URL pbrpc://xtfs_head.example.com:32638/libvirt
Owner root
Group root
Type volume
Available Space 6.64651918 TB
Quota / Used Space unlimited / 280 GB
Voucher Size 10 MB
Default User Quota unlimited
Default GroupQuota unlimited
Num. Files/Dirs 19 / 28
Access Control p. POSIX (permissions & ACLs)
OSD Selection p. 1000,3003
Replica Selection p. 3003
Default Striping p. STRIPING_POLICY_RAID0 / 1 / 128kB
Default Repl. p. WaR1 with 3 replicas
Snapshots enabled no
Tracing enabled no
Selectable OSDs 12580201-8174-42c0-a3ef-e3f2b58eaaa0 (o3.rbx.xxxx:32640)
4315274c-7abb-4241-a337-e4fa0fd9bb43 (o3.ne.xxxx:32640)
8b1328af-85df-4d30-81e5-81f7d71396ce (o1.ne.xxxx:32640)
a88d1832-484e-48f2-81c6-430fc71401d4 (o1.rbx.xxxx:32640)
e114ae07-8249-47d9-aec7-745e302e08ac (o2.rbx.xxxx:32640)
# xfs_info /data/osd1
meta-data=/dev/mapper/data1-xtreemfs isize=512 agcount=4, agsize=118829056 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1 spinodes=0
data = bsize=4096 blocks=475316224, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=232088, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
I tried multiple combinations of WqRq and WaR1 with a replication factor of 2 and 3, which seems to make no real difference. During my tests I rsynced different types of data onto the share: about 500 GB of smaller files, mostly around 1-3 MB each, plus a few bigger files onto a separate share, each between 50 GB and 500 GB per file. All of them showed issues. I started by rsyncing these files in parallel to one MRC (2 parallel rsync jobs) and reduced this during the tests to one rsync job at a time. Both approaches showed these issues, but if I remember correctly the parallel approach showed them faster. I started the tests with 4 OSDs on one node at home and two additional OSDs on one node in the remote DC. Now I have 5 OSDs at home on 2 nodes and 3 additional OSDs in the remote DC. While copying the files, most of the traffic stayed within the DC. My first tests caused massive usage of the WAN connection, which has since been reduced considerably.
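For the record, I switched the replication settings per volume roughly like this (the xtfsutil flags as I recall them; policy and factor are the values I tried):
# change the default replication policy of the mounted share (example values)
xtfsutil --set-drp --replication-policy WqRq --replication-factor 3 /mnt/xtreemfs/video
xtfsutil --set-drp --replication-policy WaR1 --replication-factor 2 /mnt/xtreemfs/video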
I am going to tune the max_requests_queue_length parameter to test this.
Thanks for the very detailed description.
We've had some issues with rsync in the past, if I remember correctly, and this might also relate to #340. My hunch is that it has something to do with the many metadata operations performed. I see you're using no_atime=true at the MRC, which is the recommended setting. However, you may want to specify --fuse-option noatime explicitly as well. You could check the output of mount to verify that XtreemFS is mounted with noatime, and do the same for your XFS partitions. This might greatly reduce the load incurred by rsync; the fact that parallel rsync jobs produce the issue faster also hints at this.
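Concretely, something like this (the mount URL is just an example; verify the result with mount afterwards):
# pass noatime through to FUSE when mounting
mount.xtreemfs --fuse-option noatime pbrpcg://dir.example.com:32638/video /mnt/xtreemfs/video
# verify that noatime shows up for the XtreemFS mount and the XFS partitions
mount | grep -E 'xtreemfs|xfs'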
Where are your MRC and DIR located in the above picture? I assume you have only one DIR and one MRC, and that you're not using the replication feature for these two services?
You could also try bigger block sizes, e.g. 1 MiB for the smaller files and 256 MiB for the larger files.
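If you want to experiment with that, the default striping policy can be changed per volume; a sketch with flag names from memory (stripe size in kB, affects newly created files only; please check xtfsutil --help for the exact spelling):
# example: 1 MiB stripe size for the volume holding the smaller files
xtfsutil --set-dsp --striping-policy RAID0 --striping-policy-width 1 --striping-policy-stripe-size 1024 /mnt/xtreemfs/video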
I'm starting to think that this could be an issue that will be difficult for us to diagnose, let alone fix. It might have implications for many parts of the system, and I'm not sure we'll have the resources to address them all. FUSE itself also has some quirks that we cannot fully work around.
The rsync manpage shows the following option:
--bwlimit=RATE limit socket I/O bandwidth
Maybe with an artificially limited rate the XtreemFS services will be able to keep up with the number of requests, and you could then increase --bwlimit until the issue occurs again. That way we might be able to control the occurrence of the issue a little better. This is only an idea, however, and I'm not entirely sure it will help isolate the cause of the issue.
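For example (rsync interprets the value as KiB per second by default, so this caps the transfer at roughly 10 MiB/s; paths are placeholders):
# start with a conservative cap, then raise it until the problem reappears
rsync -a --bwlimit=10240 /source/video/ /mnt/xtreemfs/video/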
Thanks again for being so willing to help us out here!
Hi,
I am currently testing whether XtreemFS matches my needs. I am using the current unstable version 1.6.0.5 on Ubuntu 16.04. While rsyncing files onto the share, I consistently see segfaults in mount.xtreemfs after some time.
Without debug enabled there are no interesting errors in the logs. Attached is the backtrace of one of the core dumps.