leo-project / leofs

The LeoFS Storage System
https://leo-project.net/leofs/
Apache License 2.0

Deleting bucket eventually fails and makes delete queues stuck #725

Open · vstax opened this issue 7 years ago

vstax commented 7 years ago

I have a test cluster (1.3.4, 3 storage nodes, N=2, W=1). There are two buckets, "body" and "bodytest", each containing the same objects, about 1M per bucket (there are some other buckets as well, but they hardly contain anything). In other words, there are slightly over 2M objects in the cluster in total. At the start of this test the data is fully consistent. There is some minor load on the cluster involving the "body" bucket - some PUT & GET operations, but very few of them. No one accesses the "bodytest" bucket.

I want to remove "bodytest" with all its objects. I execute s3cmd rb s3://bodytest. I see load on the gateway and storage nodes; after some time, s3cmd fails with a timeout (I expected this to happen; there is no way the storage nodes can find all 1M objects and mark them as deleted fast enough). I see the leo_async_deletion_queue queue growing on the storage nodes:

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_2@192.168.3.55|grep leo_async_deletion_queue
 leo_async_deletion_queue       |   idling    | 97845          | 0              | 3000           | async deletion of objs
[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_2@192.168.3.55|grep leo_async_deletion_queue
 leo_async_deletion_queue       |   idling    | 102780         | 0              | 3000           | async deletion of objs
[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_2@192.168.3.55|grep leo_async_deletion_queue
 leo_async_deletion_queue       |   idling    | 104911         | 0              | 3000           | async deletion of objs
[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_2@192.168.3.55|grep leo_async_deletion_queue
 leo_async_deletion_queue       |   idling    | 108396         | 0              | 3000           | async deletion of objs

The same goes for storage_0 and storage_1. There is hardly any disk load, and each storage node consumes 120-130% CPU according to top.
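For anyone reproducing this, the queues on all three nodes can be watched with a small loop around the same mq-stats command (a sketch; node names as in this cluster, interval chosen arbitrarily):

while true; do
  for n in storage_0@192.168.3.53 storage_1@192.168.3.54 storage_2@192.168.3.55; do
    echo -n "$n  "
    /usr/local/bin/leofs-adm mq-stats "$n" | grep leo_async_deletion_queue
  done
  sleep 30
done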

Then some errors appear in the error log on gateway_0:

[W] gateway_0@192.168.3.52  2017-05-04 17:15:53.998704 +0300    1493907353  leo_gateway_s3_api:delete_bucket_2/3    1774    [{cause,timeout}]
[W] gateway_0@192.168.3.52  2017-05-04 17:15:58.999759 +0300    1493907358  leo_gateway_s3_api:delete_bucket_2/3    1774    [{cause,timeout}]
[W] gateway_0@192.168.3.52  2017-05-04 17:16:07.11702 +0300 1493907367  leo_gateway_s3_api:delete_bucket_2/3    1774    [{cause,timeout}]
[W] gateway_0@192.168.3.52  2017-05-04 17:16:12.12715 +0300 1493907372  leo_gateway_s3_api:delete_bucket_2/3    1774    [{cause,timeout}]
[W] gateway_0@192.168.3.52  2017-05-04 17:16:23.48706 +0300 1493907383  leo_gateway_s3_api:delete_bucket_2/3    1774    [{cause,timeout}]
[W] gateway_0@192.168.3.52  2017-05-04 17:16:28.49750 +0300 1493907388  leo_gateway_s3_api:delete_bucket_2/3    1774    [{cause,timeout}]
[W] gateway_0@192.168.3.52  2017-05-04 17:17:01.702719 +0300    1493907421  leo_gateway_rpc_handler:handle_error/5  303 [{node,'storage_0@192.168.3.53'},{mod,leo_storage_handler_object},{method,put},{cause,timeout}]
[W] gateway_0@192.168.3.52  2017-05-04 17:17:06.703840 +0300    1493907426  leo_gateway_rpc_handler:handle_error/5  303 [{node,'storage_1@192.168.3.54'},{mod,leo_storage_handler_object},{method,put},{cause,timeout}]

If these errors just mean that the gateway returned "timeout" to the client that requested the "delete bucket" operation, plus some other timeouts due to the load on the system, then it's within expectations; as long as all data from that bucket is eventually marked as "deleted" asynchronously, all is fine.

That's not what happens, however. At some point - a few minutes after the "delete bucket" operation - the delete queues stop growing or shrinking. It's as if they are stuck. Here is their current state, 1.5 hours after the experiment; they reached this state within 5-10 minutes after the start of the experiment and have not changed since (I show only one queue here, the others are empty):

[root@leo-m0 app]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53|grep leo_async_deletion_queue
 leo_async_deletion_queue       |   idling    | 0              | 1600           | 500            | async deletion of objs
[root@leo-m0 app]# /usr/local/bin/leofs-adm mq-stats storage_1@192.168.3.54|grep leo_async_deletion_queue
 leo_async_deletion_queue       |   idling    | 53559          | 0              | 3000           | async deletion of objs
[root@leo-m0 app]# /usr/local/bin/leofs-adm mq-stats storage_2@192.168.3.55|grep leo_async_deletion_queue
 leo_async_deletion_queue       |   idling    | 136972         | 0              | 3000           | async deletion of objs

There is nothing in the logs of the manager nodes. There is nothing in the erlang.log files on the storage nodes (no mention of restarts or anything). Logs on the storage nodes, starting with the info log for storage_0:

[I] storage_0@192.168.3.53  2017-05-04 17:16:06.79494 +0300 1493907366  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,17086}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:06.80095 +0300 1493907366  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{processing_time,14217}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:06.80515 +0300 1493907366  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/3e/b3/bd/3eb3bde5f5e58a67db86dbea8dd6850810fabdf41ee1e65ba1dd8395279175259b0fc7cf9b4b60f1cbc200d2d8bd541e00e6000000000000.xz">>},{processing_time,12232}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:24.135151 +0300    1493907384  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,30141}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:34.277504 +0300    1493907394  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,28198}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:34.277892 +0300    1493907394  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/0a/59/ab/0a59aba721c409c8f9bf0bba176d10242380842653e6994f782fbe31cb2296b46cba031a085b4af057ab43314631e3691c5c000000000000.xz">>},{processing_time,24674}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:34.280088 +0300    1493907394  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/25/03/16/250316ef50b26272f99b757409d75c173135d2ef09d972821072348ad071e49897dd7245c1f250db6489a401aea567d9886e000000000000.xz">>},{processing_time,5303}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:43.179282 +0300    1493907403  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,41173}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:43.179708 +0300    1493907403  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{processing_time,18328}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:43.180082 +0300    1493907403  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{processing_time,37101}]
[I] storage_0@192.168.3.53  2017-05-04 17:16:43.180461 +0300    1493907403  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/3e/b3/bd/3eb3bde5f5e58a67db86dbea8dd6850810fabdf41ee1e65ba1dd8395279175259b0fc7cf9b4b60f1cbc200d2d8bd541e00e6000000000000.xz">>},{processing_time,37100}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:03.11597 +0300 1493907423  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,28734}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:03.12445 +0300 1493907423  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/04/12/d6/0412d6227769cfef42764802004d465e83b309d884ec841d330ede99a2a551eda52ddda2b9774531e88b0b060dbb3a17c092000000000000.xz">>},{processing_time,27558}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:03.12986 +0300 1493907423  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/03/68/ed/0368ed293eded8de34b8728325c08f1603edcede7e8de7778c81d558b5b0c9cd9f85307d1c571b9e549ef92e9ec69498005a7b0000000000.xz\n1">>},{processing_time,8691}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:03.809203 +0300    1493907423  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,56801}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:03.809742 +0300    1493907423  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/05/67/2b/05672be039f98d72ef426d413ae66d6aa33b472625a448879569a25ca29cdb8699580886a9759e357e97cc834bef15dd84d2000000000000.xz">>},{processing_time,9943}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:32.116059 +0300    1493907452  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,74073}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:32.116367 +0300    1493907452  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/34/86/01/348601a3d08bf38a4cb5bd8e22dae951d689def13b7bd1cc9c08cd0200a3cd52c6e015bf3be722d810a94f132752faf278c3000000000000.xz">>},{processing_time,36264}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:32.116743 +0300    1493907452  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/4e/cc/ef/4eccef1b917e48d1df702faab63181162c7a8f67998d7b5ef11ac33940ffe6362a8d1671c5e5f2c39945669b1d04f1ef0027720000000000.xz\n1">>},{processing_time,14748}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:32.117065 +0300    1493907452  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/66/06/b6/6606b6f9fab7e81872e8a628b696634b0c25294b0a79fbaf03c10ec49aff1d44a0921efcba5a53127e84d83edf7206d758ac850000000000.xz\n1">>},{processing_time,13881}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:33.13033 +0300 1493907453  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,30002}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:36.111764 +0300    1493907456  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/06/04/2d/06042debc9eb14bd654582d79019c02698f9514ae73709dda7f6a614868d294819908ad755cf814fb43059560fe2f0c984c3010000000000.xz">>},{processing_time,30001}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:40.473183 +0300    1493907460  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/dd/f3/60/ddf360276b8ccf938a0cfdb8f260bdf2acdcd85b9cf1e2c8f8d3b1b2d0ad17554e8a1d7d2490d396e7ad08532c9e90ac7cf2040100000000.xz\n1">>},{processing_time,16375}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:40.478082 +0300    1493907460  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/2c/5a/d2/2c5ad26023e73f37b47ee824eea753e550b99dc2945281102253e12b88c122dfbc7fcdad9706e0ee6d0dc19e86d10b76a8277f0000000000.xz\n2">>},{processing_time,7066}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.502257 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,84458}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.503120 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/33/8b/4c/338b4c25ca8fbb66a86f24bfc302f2fa4a9c657074c14e41692e5864a121849c4ad9a0f7342a35a16fc906d159980560782b010000000000.xz">>},{processing_time,20645}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.503488 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/0e/6e/8b/0e6e8bcdbc732024193f7114b3f7d607333a9d3212a71e7104aea2b2b3bc137514eadd9c4d7de516e345feb9764186d9389d000000000000.xz">>},{processing_time,22633}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.503863 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/e3/32/e0/e332e00f0f77cf322cbc1d7a30369f8681073076c49868ab9cd5cee6043dfe3ebc8c355ae5899b74602ba763dcba872450bc560000000000.xz\n1">>},{processing_time,5896}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.521029 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,19185}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.521524 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/06/56/e3/0656e37a1ff09fb11abf969cb0b795905d3ed78087be15c01ca8e5b840395ca076c82eae02ba6e0f84f9d90dbf0f3a300600100000000000.xz">>},{processing_time,83386}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.521894 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,19168}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.522149 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{processing_time,64342}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.522401 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,18958}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.522652 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,18803}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.522912 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,14251}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.524355 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,put},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,50816}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.525083 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,put},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,45786}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.526651 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/05/67/2b/05672be039f98d72ef426d413ae66d6aa33b472625a448879569a25ca29cdb8699580886a9759e357e97cc834bef15dd84d2000000000000.xz">>},{processing_time,43717}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.527288 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,put},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,22024}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.527732 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/4e/cc/ef/4eccef1b917e48d1df702faab63181162c7a8f67998d7b5ef11ac33940ffe6362a8d1671c5e5f2c39945669b1d04f1ef0027720000000000.xz\n1">>},{processing_time,15411}]
[I] storage_0@192.168.3.53  2017-05-04 17:17:47.528128 +0300    1493907467  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/66/06/b6/6606b6f9fab7e81872e8a628b696634b0c25294b0a79fbaf03c10ec49aff1d44a0921efcba5a53127e84d83edf7206d758ac850000000000.xz\n1">>},{processing_time,15411}]

Error log on storage_0:

[W] storage_0@192.168.3.53  2017-05-04 17:16:23.854142 +0300    1493907383  leo_storage_replicator:loop/6   216 [{method,delete},{key,<<"bodytest/3e/b3/bd/3eb3bde5f5e58a67db86dbea8dd6850810fabdf41ee1e65ba1dd8395279175259b0fc7cf9b4b60f1cbc200d2d8bd541e00e6000000000000.xz">>},{cause,timeout}]
[W] storage_0@192.168.3.53  2017-05-04 17:16:24.850092 +0300    1493907384  leo_storage_replicator:replicate/5  123 [{method,delete},{key,<<"bodytest/3e/b3/bd/3eb3bde5f5e58a67db86dbea8dd6850810fabdf41ee1e65ba1dd8395279175259b0fc7cf9b4b60f1cbc200d2d8bd541e00e6000000000000.xz">>},{cause,timeout}]
[W] storage_0@192.168.3.53  2017-05-04 17:16:54.851871 +0300    1493907414  leo_storage_replicator:loop/6   216 [{method,delete},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{cause,timeout}]
[W] storage_0@192.168.3.53  2017-05-04 17:16:55.851836 +0300    1493907415  leo_storage_replicator:replicate/5  123 [{method,delete},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{cause,timeout}]
[W] storage_0@192.168.3.53  2017-05-04 17:17:25.853134 +0300    1493907445  leo_storage_replicator:loop/6   216 [{method,delete},{key,<<"bodytest/34/86/01/348601a3d08bf38a4cb5bd8e22dae951d689def13b7bd1cc9c08cd0200a3cd52c6e015bf3be722d810a94f132752faf278c3000000000000.xz">>},{cause,timeout}]
[W] storage_0@192.168.3.53  2017-05-04 17:17:26.855813 +0300    1493907446  leo_storage_replicator:replicate/5  123 [{method,delete},{key,<<"bodytest/34/86/01/348601a3d08bf38a4cb5bd8e22dae951d689def13b7bd1cc9c08cd0200a3cd52c6e015bf3be722d810a94f132752faf278c3000000000000.xz">>},{cause,timeout}]
[W] storage_0@192.168.3.53  2017-05-04 17:17:36.117939 +0300    1493907456  leo_storage_replicator:loop/6   216 [{method,delete},{key,<<"bodytest/06/04/2d/06042debc9eb14bd654582d79019c02698f9514ae73709dda7f6a614868d294819908ad755cf814fb43059560fe2f0c984c3010000000000.xz">>},{cause,timeout}]
[W] storage_0@192.168.3.53  2017-05-04 17:17:37.113194 +0300    1493907457  leo_storage_replicator:replicate/5  123 [{method,delete},{key,<<"bodytest/06/04/2d/06042debc9eb14bd654582d79019c02698f9514ae73709dda7f6a614868d294819908ad755cf814fb43059560fe2f0c984c3010000000000.xz">>},{cause,timeout}]
[W] storage_0@192.168.3.53  2017-05-04 17:17:47.534739 +0300    1493907467  leo_storage_read_repairer:compare/4 165 [{node,'storage_1@192.168.3.54'},{addr_id,192490066507992604465461441302734706270},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{clock,1493907416708152},{cause,primary_inconsistency}]
[W] storage_0@192.168.3.53  2017-05-04 17:17:47.538512 +0300    1493907467  leo_storage_read_repairer:compare/4 165 [{node,'storage_1@192.168.3.54'},{addr_id,192490066507992604465461441302734706270},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{clock,1493907421704031},{cause,primary_inconsistency}]
[W] storage_0@192.168.3.53  2017-05-04 17:17:47.542151 +0300    1493907467  leo_storage_read_repairer:compare/4 165 [{node,'storage_1@192.168.3.54'},{addr_id,192490066507992604465461441302734706270},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{clock,1493907421704031},{cause,primary_inconsistency}]
[W] storage_0@192.168.3.53  2017-05-04 17:17:47.549344 +0300    1493907467  leo_storage_read_repairer:compare/4 165 [{node,'storage_1@192.168.3.54'},{addr_id,192490066507992604465461441302734706270},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{clock,1493907421704031},{cause,primary_inconsistency}]

Info log on storage_1:

[I] storage_1@192.168.3.54  2017-05-04 17:16:10.725946 +0300    1493907370  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,21667}]
[I] storage_1@192.168.3.54  2017-05-04 17:16:20.764386 +0300    1493907380  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,26764}]
[I] storage_1@192.168.3.54  2017-05-04 17:16:37.95550 +0300 1493907397  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,35064}]
[I] storage_1@192.168.3.54  2017-05-04 17:16:47.109806 +0300    1493907407  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,40093}]
[I] storage_1@192.168.3.54  2017-05-04 17:17:03.480048 +0300    1493907423  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,45433}]
[I] storage_1@192.168.3.54  2017-05-04 17:17:03.480713 +0300    1493907423  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/f0/98/63/f0986371cf98c032e6c870ae6f4a26fac08ae91958c14c86478b39b758ea58953095a32412961708a0a9090e0d2da4edf615cc0100000000.xz\n1">>},{processing_time,5704}]
[I] storage_1@192.168.3.54  2017-05-04 17:17:13.497836 +0300    1493907433  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,50442}]
[I] storage_1@192.168.3.54  2017-05-04 17:17:13.503749 +0300    1493907433  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/4e/18/4f/4e184f78326e5665991244965ecf1a3bca129ed4353adb0d8cc63a5c7d8a7a49a7ade04120ba3a5c75e18c5be2da79ffa829a40000000000.xz\n2">>},{processing_time,8485}]
[I] storage_1@192.168.3.54  2017-05-04 17:17:13.637206 +0300    1493907433  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,put},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,16918}]
[I] storage_1@192.168.3.54  2017-05-04 17:17:13.641295 +0300    1493907433  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,put},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,11937}]
[I] storage_1@192.168.3.54  2017-05-04 17:17:13.660910 +0300    1493907433  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/f0/98/63/f0986371cf98c032e6c870ae6f4a26fac08ae91958c14c86478b39b758ea58953095a32412961708a0a9090e0d2da4edf615cc0100000000.xz\n1">>},{processing_time,10180}]

Error log on storage_1:

[E] storage_1@192.168.3.54  2017-05-04 17:16:10.720827 +0300    1493907370  leo_backend_db_eleveldb:prefix_search/3222  {timeout,{gen_server,call,[leo_async_deletion_queue_1,{enqueue,<<131,104,2,110,16,0,38,115,152,115,44,32,50,32,91,246,196,247,235,102,48,217,109,0,0,0,133,98,111,100,121,116,101,115,116,47,57,54,47,54,50,47,97,57,47,57,54,54,50,97,57,57,57,51,51,50,49,51,54,52,53,100,48,50,54,102,51,57,56,53,56,97,48,50,51,99,50,100,48,54,100,101,50,51,98,55,101,56,48,53,52,52,56,48,51,102,48,50,98,100,50,51,52,49,98,102,53,53,102,55,48,54,56,50,100,100,99,54,51,102,52,53,52,52,55,48,99,49,51,102,99,100,48,101,51,51,100,50,52,55,102,48,52,48,48,48,48,48,48,48,48,48,48,48,48,48,48,46,120,122>>,<<131,104,6,100,0,22,97,115,121,110,99,95,100,101,108,101,116,105,111,110,95,109,101,115,115,97,103,101,110,7,0,217,31,144,105,179,78,5,110,16,0,38,115,152,115,44,32,50,32,91,246,196,247,235,102,48,217,109,0,0,0,133,98,111,100,121,116,101,115,116,47,57,54,47,54,50,47,97,57,47,57,54,54,50,97,57,57,57,51,51,50,49,51,54,52,53,100,48,50,54,102,51,57,56,53,56,97,48,50,51,99,50,100,48,54,100,101,50,51,98,55,101,56,48,53,52,52,56,48,51,102,48,50,98,100,50,51,52,49,98,102,53,53,102,55,48,54,56,50,100,100,99,54,51,102,52,53,52,52,55,48,99,49,51,102,99,100,48,101,51,51,100,50,52,55,102,48,52,48,48,48,48,48,48,48,48,48,48,48,48,48,48,46,120,122,110,5,0,160,179,127,210,14,97,0>>},10000]}}
[E] storage_1@192.168.3.54  2017-05-04 17:16:20.760699 +0300    1493907380  leo_backend_db_eleveldb:prefix_search/3222  {timeout,{gen_server,call,[leo_async_deletion_queue_1,{enqueue,<<131,104,2,110,16,0,166,167,75,13,81,196,48,74,78,229,160,88,48,152,97,28,109,0,0,0,133,98,111,100,121,116,101,115,116,47,48,48,47,48,48,47,48,56,47,48,48,48,48,48,56,55,51,57,57,102,53,55,98,56,53,54,100,57,52,56,50,99,101,53,50,57,50,55,56,99,99,100,53,100,50,55,101,98,51,101,49,49,57,98,53,97,48,99,50,102,99,98,56,57,49,97,52,55,102,100,48,49,56,49,56,51,98,56,51,100,102,101,55,53,99,99,102,51,54,97,53,57,55,101,48,50,101,98,56,50,57,52,54,100,99,48,48,53,97,48,101,48,48,48,48,48,48,48,48,48,48,46,120,122>>,<<131,104,6,100,0,22,97,115,121,110,99,95,100,101,108,101,116,105,111,110,95,109,101,115,115,97,103,101,110,7,0,32,17,41,106,179,78,5,110,16,0,166,167,75,13,81,196,48,74,78,229,160,88,48,152,97,28,109,0,0,0,133,98,111,100,121,116,101,115,116,47,48,48,47,48,48,47,48,56,47,48,48,48,48,48,56,55,51,57,57,102,53,55,98,56,53,54,100,57,52,56,50,99,101,53,50,57,50,55,56,99,99,100,53,100,50,55,101,98,51,101,49,49,57,98,53,97,48,99,50,102,99,98,56,57,49,97,52,55,102,100,48,49,56,49,56,51,98,56,51,100,102,101,55,53,99,99,102,51,54,97,53,57,55,101,48,50,101,98,56,50,57,52,54,100,99,48,48,53,97,48,101,48,48,48,48,48,48,48,48,48,48,46,120,122,110,5,0,170,179,127,210,14,97,0>>},10000]}}
[E] storage_1@192.168.3.54  2017-05-04 17:16:37.94918 +0300 1493907397  leo_backend_db_eleveldb:prefix_search/3 222 {timeout,{gen_server,call,[leo_async_deletion_queue_1,{enqueue,<<131,104,2,110,16,0,205,38,194,2,251,99,149,185,246,131,149,156,96,116,90,188,109,0,0,0,133,98,111,100,121,116,101,115,116,47,55,56,47,56,56,47,102,97,47,55,56,56,56,102,97,97,48,54,54,101,50,98,54,56,52,54,99,57,98,56,52,51,55,102,52,99,100,55,98,97,53,55,100,99,52,51,55,98,100,100,98,99,51,98,56,53,51,101,101,100,48,53,98,101,57,56,101,48,97,97,99,49,98,97,97,51,51,57,52,101,55,48,55,48,98,48,101,57,98,49,99,101,99,57,99,98,99,57,101,49,50,55,54,99,54,97,56,97,97,48,48,48,48,48,48,48,48,48,48,48,48,46,120,122>>,<<131,104,6,100,0,22,97,115,121,110,99,95,100,101,108,101,116,105,111,110,95,109,101,115,115,97,103,101,110,7,0,184,132,34,107,179,78,5,110,16,0,205,38,194,2,251,99,149,185,246,131,149,156,96,116,90,188,109,0,0,0,133,98,111,100,121,116,101,115,116,47,55,56,47,56,56,47,102,97,47,55,56,56,56,102,97,97,48,54,54,101,50,98,54,56,52,54,99,57,98,56,52,51,55,102,52,99,100,55,98,97,53,55,100,99,52,51,55,98,100,100,98,99,51,98,56,53,51,101,101,100,48,53,98,101,57,56,101,48,97,97,99,49,98,97,97,51,51,57,52,101,55,48,55,48,98,48,101,57,98,49,99,101,99,57,99,98,99,57,101,49,50,55,54,99,54,97,56,97,97,48,48,48,48,48,48,48,48,48,48,48,48,46,120,122,110,5,0,187,179,127,210,14,97,0>>},10000]}}
[E] storage_1@192.168.3.54  2017-05-04 17:16:47.108568 +0300    1493907407  leo_backend_db_eleveldb:prefix_search/3222  {timeout,{gen_server,call,[leo_async_deletion_queue_1,{enqueue,<<131,104,2,110,16,0,166,167,75,13,81,196,48,74,78,229,160,88,48,152,97,28,109,0,0,0,133,98,111,100,121,116,101,115,116,47,48,48,47,48,48,47,48,56,47,48,48,48,48,48,56,55,51,57,57,102,53,55,98,56,53,54,100,57,52,56,50,99,101,53,50,57,50,55,56,99,99,100,53,100,50,55,101,98,51,101,49,49,57,98,53,97,48,99,50,102,99,98,56,57,49,97,52,55,102,100,48,49,56,49,56,51,98,56,51,100,102,101,55,53,99,99,102,51,54,97,53,57,55,101,48,50,101,98,56,50,57,52,54,100,99,48,48,53,97,48,101,48,48,48,48,48,48,48,48,48,48,46,120,122>>,<<131,104,6,100,0,22,97,115,121,110,99,95,100,101,108,101,116,105,111,110,95,109,101,115,115,97,103,101,110,7,0,23,93,187,107,179,78,5,110,16,0,166,167,75,13,81,196,48,74,78,229,160,88,48,152,97,28,109,0,0,0,133,98,111,100,121,116,101,115,116,47,48,48,47,48,48,47,48,56,47,48,48,48,48,48,56,55,51,57,57,102,53,55,98,56,53,54,100,57,52,56,50,99,101,53,50,57,50,55,56,99,99,100,53,100,50,55,101,98,51,101,49,49,57,98,53,97,48,99,50,102,99,98,56,57,49,97,52,55,102,100,48,49,56,49,56,51,98,56,51,100,102,101,55,53,99,99,102,51,54,97,53,57,55,101,48,50,101,98,56,50,57,52,54,100,99,48,48,53,97,48,101,48,48,48,48,48,48,48,48,48,48,46,120,122,110,5,0,197,179,127,210,14,97,0>>},10000]}}
[E] storage_1@192.168.3.54  2017-05-04 17:17:03.478769 +0300    1493907423  leo_backend_db_eleveldb:prefix_search/3222  {timeout,{gen_server,call,[leo_async_deletion_queue_1,{enqueue,<<131,104,2,110,16,0,37,221,237,84,123,140,135,76,39,216,128,38,178,216,253,43,109,0,0,0,133,98,111,100,121,116,101,115,116,47,52,56,47,54,100,47,53,102,47,52,56,54,100,53,102,98,51,98,55,55,52,99,56,97,102,51,97,102,54,102,100,48,51,53,102,51,54,55,98,100,48,52,52,100,98,100,56,97,97,55,49,102,51,52,98,54,51,53,51,53,57,99,57,57,102,56,48,55,101,54,51,102,98,99,54,52,48,97,53,99,56,97,98,99,50,49,51,99,52,50,49,52,51,100,52,55,101,98,57,101,55,49,101,48,99,48,52,48,48,48,48,48,48,48,48,48,48,48,48,48,48,46,120,122>>,<<131,104,6,100,0,22,97,115,121,110,99,95,100,101,108,101,116,105,111,110,95,109,101,115,115,97,103,101,110,7,0,76,233,180,108,179,78,5,110,16,0,37,221,237,84,123,140,135,76,39,216,128,38,178,216,253,43,109,0,0,0,133,98,111,100,121,116,101,115,116,47,52,56,47,54,100,47,53,102,47,52,56,54,100,53,102,98,51,98,55,55,52,99,56,97,102,51,97,102,54,102,100,48,51,53,102,51,54,55,98,100,48,52,52,100,98,100,56,97,97,55,49,102,51,52,98,54,51,53,51,53,57,99,57,57,102,56,48,55,101,54,51,102,98,99,54,52,48,97,53,99,56,97,98,99,50,49,51,99,52,50,49,52,51,100,52,55,101,98,57,101,55,49,101,48,99,48,52,48,48,48,48,48,48,48,48,48,48,48,48,48,48,46,120,122,110,5,0,213,179,127,210,14,97,0>>},10000]}}
[E] storage_1@192.168.3.54  2017-05-04 17:17:13.490725 +0300    1493907433  leo_backend_db_eleveldb:prefix_search/3222  {timeout,{gen_server,call,[leo_async_deletion_queue_1,{enqueue,<<131,104,2,110,16,0,166,167,75,13,81,196,48,74,78,229,160,88,48,152,97,28,109,0,0,0,133,98,111,100,121,116,101,115,116,47,48,48,47,48,48,47,48,56,47,48,48,48,48,48,56,55,51,57,57,102,53,55,98,56,53,54,100,57,52,56,50,99,101,53,50,57,50,55,56,99,99,100,53,100,50,55,101,98,51,101,49,49,57,98,53,97,48,99,50,102,99,98,56,57,49,97,52,55,102,100,48,49,56,49,56,51,98,56,51,100,102,101,55,53,99,99,102,51,54,97,53,57,55,101,48,50,101,98,56,50,57,52,54,100,99,48,48,53,97,48,101,48,48,48,48,48,48,48,48,48,48,46,120,122>>,<<131,104,6,100,0,22,97,115,121,110,99,95,100,101,108,101,116,105,111,110,95,109,101,115,115,97,103,101,110,7,0,225,206,77,109,179,78,5,110,16,0,166,167,75,13,81,196,48,74,78,229,160,88,48,152,97,28,109,0,0,0,133,98,111,100,121,116,101,115,116,47,48,48,47,48,48,47,48,56,47,48,48,48,48,48,56,55,51,57,57,102,53,55,98,56,53,54,100,57,52,56,50,99,101,53,50,57,50,55,56,99,99,100,53,100,50,55,101,98,51,101,49,49,57,98,53,97,48,99,50,102,99,98,56,57,49,97,52,55,102,100,48,49,56,49,56,51,98,56,51,100,102,101,55,53,99,99,102,51,54,97,53,57,55,101,48,50,101,98,56,50,57,52,54,100,99,48,48,53,97,48,101,48,48,48,48,48,48,48,48,48,48,46,120,122,110,5,0,223,179,127,210,14,97,0>>},10000]}}

Info log on storage_2:

[I] storage_2@192.168.3.55  2017-05-04 17:16:12.956911 +0300    1493907372  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,23920}]
[I] storage_2@192.168.3.55  2017-05-04 17:16:12.958225 +0300    1493907372  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{processing_time,21096}]
[I] storage_2@192.168.3.55  2017-05-04 17:16:12.958522 +0300    1493907372  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/3e/b3/bd/3eb3bde5f5e58a67db86dbea8dd6850810fabdf41ee1e65ba1dd8395279175259b0fc7cf9b4b60f1cbc200d2d8bd541e00e6000000000000.xz">>},{processing_time,19109}]
[I] storage_2@192.168.3.55  2017-05-04 17:16:35.444648 +0300    1493907395  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,41450}]
[I] storage_2@192.168.3.55  2017-05-04 17:16:35.445099 +0300    1493907395  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/06/56/e3/0656e37a1ff09fb11abf969cb0b795905d3ed78087be15c01ca8e5b840395ca076c82eae02ba6e0f84f9d90dbf0f3a300600100000000000.xz">>},{processing_time,12582}]
[I] storage_2@192.168.3.55  2017-05-04 17:16:35.445427 +0300    1493907395  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{processing_time,10552}]
[I] storage_2@192.168.3.55  2017-05-04 17:16:42.958232 +0300    1493907402  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,30001}]
[I] storage_2@192.168.3.55  2017-05-04 17:16:53.6668 +0300  1493907413  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/25/03/16/250316ef50b26272f99b757409d75c173135d2ef09d972821072348ad071e49897dd7245c1f250db6489a401aea567d9886e000000000000.xz">>},{processing_time,24030}]
[I] storage_2@192.168.3.55  2017-05-04 17:16:53.7199 +0300  1493907413  leo_object_storage_event:handle_event/2 54  [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/1a/82/8c/1a828c7a9d7a334714f91a8d1c56a4ec30a8dd4998c9db79f9dfed87be084a73aa090513d535e36186a986822b1d6ca9bc74010000000000.xz">>},{processing_time,18711}]
[I] storage_2@192.168.3.55  2017-05-04 17:16:55.327215 +0300    1493907415  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,53321}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:16.830422 +0300    1493907436  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,69821}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:16.830874 +0300    1493907436  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/34/86/01/348601a3d08bf38a4cb5bd8e22dae951d689def13b7bd1cc9c08cd0200a3cd52c6e015bf3be722d810a94f132752faf278c3000000000000.xz">>},{processing_time,20975}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:16.831683 +0300    1493907436  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/f0/98/63/f0986371cf98c032e6c870ae6f4a26fac08ae91958c14c86478b39b758ea58953095a32412961708a0a9090e0d2da4edf615cc0100000000.xz\n1">>},{processing_time,19056}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:16.832010 +0300    1493907436  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/4e/18/4f/4e184f78326e5665991244965ecf1a3bca129ed4353adb0d8cc63a5c7d8a7a49a7ade04120ba3a5c75e18c5be2da79ffa829a40000000000.xz\n2">>},{processing_time,11818}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:16.832350 +0300    1493907436  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{processing_time,63874}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:16.832687 +0300    1493907436  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/3e/b3/bd/3eb3bde5f5e58a67db86dbea8dd6850810fabdf41ee1e65ba1dd8395279175259b0fc7cf9b4b60f1cbc200d2d8bd541e00e6000000000000.xz">>},{processing_time,63874}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:34.514878 +0300    1493907454  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,76471}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:34.515241 +0300    1493907454  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/4e/cc/ef/4eccef1b917e48d1df702faab63181162c7a8f67998d7b5ef11ac33940ffe6362a8d1671c5e5f2c39945669b1d04f1ef0027720000000000.xz\n1">>},{processing_time,17146}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:34.515530 +0300    1493907454  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/33/8b/4c/338b4c25ca8fbb66a86f24bfc302f2fa4a9c657074c14e41692e5864a121849c4ad9a0f7342a35a16fc906d159980560782b010000000000.xz">>},{processing_time,7655}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.878082 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,fetch},{key,<<"bodytest">>},{processing_time,83833}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.878513 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/0e/6e/8b/0e6e8bcdbc732024193f7114b3f7d607333a9d3212a71e7104aea2b2b3bc137514eadd9c4d7de516e345feb9764186d9389d000000000000.xz">>},{processing_time,22009}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.878950 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/17/f5/9f/17f59f95c7bd20ea310cf7bd14d0c2cc9890444c621b859e03f879ccf2700936abeafbd3d62deee9ed2e58bfa86107e4cea8040100000000.xz\n3">>},{processing_time,8458}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.879426 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,head},{key,<<"bodytest/e3/32/e0/e332e00f0f77cf322cbc1d7a30369f8681073076c49868ab9cd5cee6043dfe3ebc8c355ae5899b74602ba763dcba872450bc560000000000.xz\n1">>},{processing_time,5269}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.879704 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/06/56/e3/0656e37a1ff09fb11abf969cb0b795905d3ed78087be15c01ca8e5b840395ca076c82eae02ba6e0f84f9d90dbf0f3a300600100000000000.xz">>},{processing_time,71434}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.880035 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{processing_time,71433}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.880362 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/05/67/2b/05672be039f98d72ef426d413ae66d6aa33b472625a448879569a25ca29cdb8699580886a9759e357e97cc834bef15dd84d2000000000000.xz">>},{processing_time,51552}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.881471 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/34/86/01/348601a3d08bf38a4cb5bd8e22dae951d689def13b7bd1cc9c08cd0200a3cd52c6e015bf3be722d810a94f132752faf278c3000000000000.xz">>},{processing_time,30049}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.881907 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/f0/98/63/f0986371cf98c032e6c870ae6f4a26fac08ae91958c14c86478b39b758ea58953095a32412961708a0a9090e0d2da4edf615cc0100000000.xz\n1">>},{processing_time,30050}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.882233 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/4e/18/4f/4e184f78326e5665991244965ecf1a3bca129ed4353adb0d8cc63a5c7d8a7a49a7ade04120ba3a5c75e18c5be2da79ffa829a40000000000.xz\n2">>},{processing_time,30050}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.886477 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/4e/cc/ef/4eccef1b917e48d1df702faab63181162c7a8f67998d7b5ef11ac33940ffe6362a8d1671c5e5f2c39945669b1d04f1ef0027720000000000.xz\n1">>},{processing_time,12372}]
[I] storage_2@192.168.3.55  2017-05-04 17:17:46.887370 +0300    1493907466  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,delete},{key,<<"bodytest/33/8b/4c/338b4c25ca8fbb66a86f24bfc302f2fa4a9c657074c14e41692e5864a121849c4ad9a0f7342a35a16fc906d159980560782b010000000000.xz">>},{processing_time,12373}]

Error log on storage_2:

[W] storage_2@192.168.3.55  2017-05-04 17:16:21.862361 +0300    1493907381  leo_storage_replicator:loop/6   216 [{method,delete},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{cause,timeout}]
[W] storage_2@192.168.3.55  2017-05-04 17:16:22.862268 +0300    1493907382  leo_storage_replicator:replicate/5  123 [{method,delete},{key,<<"bodytest/1a/11/87/1a118728e175f40a10b6390f3f579bfd3a5754401763708c8ef8f0b3bd9e5d84fbdcbb167fa850291032fcbbcd4439ef28d4000000000000.xz">>},{cause,timeout}]
[W] storage_2@192.168.3.55  2017-05-04 17:16:52.865051 +0300    1493907412  leo_storage_replicator:loop/6   216 [{method,delete},{key,<<"bodytest/06/56/e3/0656e37a1ff09fb11abf969cb0b795905d3ed78087be15c01ca8e5b840395ca076c82eae02ba6e0f84f9d90dbf0f3a300600100000000000.xz">>},{cause,timeout}]
[W] storage_2@192.168.3.55  2017-05-04 17:16:53.865848 +0300    1493907413  leo_storage_replicator:replicate/5  123 [{method,delete},{key,<<"bodytest/06/56/e3/0656e37a1ff09fb11abf969cb0b795905d3ed78087be15c01ca8e5b840395ca076c82eae02ba6e0f84f9d90dbf0f3a300600100000000000.xz">>},{cause,timeout}]
[W] storage_2@192.168.3.55  2017-05-04 17:17:23.868648 +0300    1493907443  leo_storage_replicator:loop/6   216 [{method,delete},{key,<<"bodytest/05/67/2b/05672be039f98d72ef426d413ae66d6aa33b472625a448879569a25ca29cdb8699580886a9759e357e97cc834bef15dd84d2000000000000.xz">>},{cause,timeout}]
[W] storage_2@192.168.3.55  2017-05-04 17:17:24.867539 +0300    1493907444  leo_storage_replicator:replicate/5  123 [{method,delete},{key,<<"bodytest/05/67/2b/05672be039f98d72ef426d413ae66d6aa33b472625a448879569a25ca29cdb8699580886a9759e357e97cc834bef15dd84d2000000000000.xz">>},{cause,timeout}]

To summarize the problems:

  1. Timeouts on the gateway - not really a problem, as long as the operation continues asynchronously.
  2. Typos "mehtod,delete", "mehtod,head", "mehtod,fetch" in the info log. Note that it's spelled correctly in the error log :)
  3. The delete operation did not complete (I picked ~4300 random object names and executed "whereis" for them; around 1750 of them were marked as "deleted" on all nodes and around 2500 were not deleted on any of them - a sketch of how this check can be scripted is at the end of this comment).
  4. The delete queues got stuck. How do I "unfreeze" them? Reboot the storage nodes? (Not a problem, I'm just keeping them like that for now in case there is something else to try.) There are no errors or anything right now (however, debug logs are not enabled); the state of all nodes is "running", but the delete queue is not being processed on storage_1 and storage_2.
  5. These lines in the log of storage_1:
    [I] storage_1@192.168.3.54  2017-05-04 17:17:13.637206 +0300    1493907433  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,put},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,16918}]
    [I] storage_1@192.168.3.54  2017-05-04 17:17:13.641295 +0300    1493907433  leo_object_storage_event:handle_event/254   [{cause,"slow operation"},{mehtod,put},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{processing_time,11937}]

    and on storage_0:

    [W] storage_0@192.168.3.53  2017-05-04 17:17:47.534739 +0300    1493907467  leo_storage_read_repairer:compare/4 165 [{node,'storage_1@192.168.3.54'},{addr_id,192490066507992604465461441302734706270},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{clock,1493907416708152},{cause,primary_inconsistency}]
    [W] storage_0@192.168.3.53  2017-05-04 17:17:47.538512 +0300    1493907467  leo_storage_read_repairer:compare/4 165 [{node,'storage_1@192.168.3.54'},{addr_id,192490066507992604465461441302734706270},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{clock,1493907421704031},{cause,primary_inconsistency}]
    [W] storage_0@192.168.3.53  2017-05-04 17:17:47.542151 +0300    1493907467  leo_storage_read_repairer:compare/4 165 [{node,'storage_1@192.168.3.54'},{addr_id,192490066507992604465461441302734706270},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{clock,1493907421704031},{cause,primary_inconsistency}]
    [W] storage_0@192.168.3.53  2017-05-04 17:17:47.549344 +0300    1493907467  leo_storage_read_repairer:compare/4 165 [{node,'storage_1@192.168.3.54'},{addr_id,192490066507992604465461441302734706270},{key,<<"body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz">>},{clock,1493907421704031},{cause,primary_inconsistency}]

What happened here is the "minor load" I mentioned. Basically, at 17:17:13 an application tried to PUT the object body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz. That's a very small object, 27 KB in size. Moments after the successful PUT, a few (5, I believe) other applications did a GET for that object. However, they were all using the same gateway with caching enabled, so they should have gotten the object from the memory cache (at worst, the gateway would have checked the ETag against a storage node). 17:17:13 was in the middle of the "delete bucket" operation, so I suppose the large "processing time" for the PUT was expected. But why the "read_repairer" errors and "primary_inconsistency"?? storage_0 is the "primary" node for this object:

[root@leo-m1 ~]# /usr/local/leofs/current/leofs-adm -p 10011 whereis body/08/08/e2/0808e2f9815aa7d4b9c92b01db4fa208344d063d83c151b5349095f4004b60e1684056dab62f9bc2b51ac0e6b3721ca6a807010000000000.xz
-------+-----------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |            node             |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-----------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | storage_0@192.168.3.53      | 90d03d01c8e65bffaba62e8fac56165e     |        25K |   dd009e23e7 | false          |              0 | 54eb36e9dcecf  | 2017-05-04 17:17:25 +0300
       | storage_1@192.168.3.54      | 90d03d01c8e65bffaba62e8fac56165e     |        25K |   dd009e23e7 | false          |              0 | 54eb36e9dcecf  | 2017-05-04 17:17:25 +0300
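For completeness, the sampling mentioned in item 3 above can be scripted roughly like this (a sketch; random_keys.txt is a hypothetical file with one sampled object name per line, and classification is done afterwards by inspecting the "del?" column of each saved report):

mkdir -p /tmp/whereis
while read -r key; do
  # one whereis report per sampled key; the "del?" column indicates which replicas are marked as deleted
  /usr/local/bin/leofs-adm whereis "$key" > "/tmp/whereis/$(echo "$key" | tr '/' '_').txt"
done < random_keys.txt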
mocchira commented 7 years ago

@vstax Thanks for the further testing!

Thanks to your report, I've found other problems that cause state inconsistencies between the storage node(s) and the manager(s), and also the problem that causes gen_server to crash frequently on your storage_0.

The fixes are already out there.

Please try those fixes from a clean state once the above PRs get merged.

I've also confirmed that my assumption from the previous comment (#725 (comment)) is correct and the counters on storage_1 really went off during that bucket-deletion experiment; they were fine on storage_0, however. I rolled back to the original state and ran compaction on storage_1 to verify the counters, and the values were the same. That means there was no error before the experiment, and the error (2035 remaining objects according to the counters, while in reality there were 0 left) appeared during the course of the bucket deletion.

Thanks. I will vet.

Another note: it looks like delete-bucket-stats automatically wipes the stats some time after the state is "finished" for all nodes (not sure when - 1 or 2 hours maybe?). Which is fine, but it should probably be mentioned in the documentation for that command so that no one gets confused?

Thanks for pointing that out; we will mention it in the docs.

vstax commented 7 years ago

@mocchira Thank you for the fixes. They work; I don't get any more crashes like that in the logs, and the delete-bucket state changes reliably, at least as long as I don't touch anything (I haven't gotten to that yet).

I didn't encounter the problem with du showing incorrect numbers of objects left in the last few experiments. That said, I've found something strange. When I tried to delete two buckets at once ("body" and "bodytest"), I got this:

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_2@192.168.3.55|grep delete
 leo_delete_dir_queue_1         |   running   | 700645         | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   running   | 1094298        | 1600           | 500            | deletion bucket #2

Each node has only ~1.5M objects, with each bucket making up roughly half of that. Normally the number in these queues never goes higher than 650-700K. However, when deleting two buckets at once, each number goes much higher - I've seen 1344334, for example. In the later stages, the sum of messages in dir_queue_1 and dir_queue_2 can easily be twice the number of active objects on the node, or more than the total number of objects:

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_1@192.168.3.54|grep delete|head -n 2
 leo_delete_dir_queue_1         |   running   | 3007           | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   running   | 670879         | 1600           | 500            | deletion bucket #2
[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_1@192.168.3.54
 active number of objects: 332564
  total number of objects: 1509955
   active size of objects: 38558613990
    total size of objects: 213421617475
     ratio of active size: 18.07%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53|grep delete
 leo_delete_dir_queue_1         |   running   | 567337         | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   running   | 1210768        | 1600           | 500            | deletion bucket #2
[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_0@192.168.3.53
 active number of objects: 1156085
  total number of objects: 1354898
   active size of objects: 157562451801
    total size of objects: 191667623380
     ratio of active size: 82.21%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

Isn't that strange?
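For reference, the comparison can be repeated per node with something along these lines (a sketch using the same commands; the awk field index assumes the pipe-separated mq-stats layout shown above):

node=storage_0@192.168.3.53
/usr/local/bin/leofs-adm mq-stats "$node" | grep leo_delete_dir_queue_ \
  | awk -F'|' '{ sum += $3 } END { print "queued delete-dir messages:", sum }'
/usr/local/bin/leofs-adm du "$node" | grep "number of objects"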

(also, I take it that removing >8 buckets at once isn't supported?)

Another strange thing - but it might be just me - is that it feels as if the delete-bucket state goes from "finished" on all nodes to disappearing faster than before. Like, in minutes? I see hundreds of thousands of objects in the queues, do something else for 20 minutes or so, then check the state again and it's empty. And it happened like that a few times. Then again, it might be a side effect of the near-instant monitoring->finished transition for all nodes once the queue is empty, with the time before the state disappears being the same as before...

mocchira commented 7 years ago

@vstax Thanks for testing!

Thank you for the fixes. They work; I don't get any more crashes like that in the logs, and the delete-bucket state changes reliably, at least as long as I don't touch anything (I haven't gotten to that yet). I didn't encounter the problem with du showing incorrect numbers of objects left in the last few experiments.

Great to hear that.

Isn't that strange?

Yes, it's strange, and many thanks for reporting it! It turned out that there is another problem that causes the delete-bucket queue to create many more entries than necessary (insert_message can be called twice when the Erlang scheduler gets overloaded and the scheduling of insert_message gets delayed). I will send a PR later, so please check again once that PR gets merged.

(also, I take it that removing >8 buckets at once isn't supported?)

Yes, the default for how many delete-bucket requests can proceed at once is 8; if there are more than 8 requests, the excess ones stay pending while all 8 queues are active (ENQUEUING/MONITORING/FINISHED). However, there is a hidden parameter to configure that value at https://github.com/leo-project/leofs/blob/develop/apps/leo_storage/priv/leo_storage.schema#L823-L830, though we are currently considering not exposing this setting to end users, since choosing an appropriate value is difficult and a large value could make the overall system unstable; we are now validating an appropriate default (or whether to expose the setting and limit its upper bound).
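To observe the pending behavior (rather than change the limit), one can issue more than 8 delete-bucket requests and watch the per-bucket states; a sketch, where the bucket names bkt1..bkt10 are hypothetical:

for b in bkt1 bkt2 bkt3 bkt4 bkt5 bkt6 bkt7 bkt8 bkt9 bkt10; do
  s3cmd rb "s3://$b"   # per the above, requests beyond the 8th stay pending until an active slot frees up
done
/usr/local/bin/leofs-adm delete-bucket-stats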

Another strange thing - but it might be just me - is that it feels as if the delete-bucket state goes from "finished" on all nodes to disappearing faster than before. Like, in minutes? I see hundreds of thousands of objects in the queues, do something else for 20 minutes or so, then check the state again and it's empty. And it happened like that a few times. Then again, it might be a side effect of the near-instant monitoring->finished transition for all nodes once the queue is empty, with the time before the state disappears being the same as before...

Yes, within minutes is expected.

vstax commented 7 years ago

@mocchira I think I've found another bug. If I restart a storage node while the status is "enqueuing" on the manager, the status instantly changes to "monitoring" when the node goes down; after it's back up, it continues to delete the objects that were already enqueued, but the rest never get enqueued, so after the bucket deletion is done and the status is "finished" according to the manager, some objects still remain (they are present in the AVS files after compaction).
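Roughly, the sequence that triggers it looks like this (a sketch; how exactly the storage node is restarted depends on the installation, so that step is only described in a comment):

s3cmd rb s3://bodytest
/usr/local/bin/leofs-adm delete-bucket-stats   # wait until the node reports "enqueuing"
# restart the leo_storage node here (service restart / init script - installation-specific step)
/usr/local/bin/leofs-adm delete-bucket-stats   # the state flips to "monitoring" as soon as the node goes down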

mocchira commented 7 years ago

@vstax Thanks for reporting another bug. I still cannot reproduce this behavior, so please wait a while for it to get fixed on develop. Before digging further into this problem, I will first send a PR for the "delete-bucket making many more entries than necessary" problem.

EDIT: I've found the culprit in leo_backend_db, where success can be returned even though an error actually happened in eleveldb (including when eleveldb closes file handles as leo_storage goes down). I will send a PR later.

EDIT: Please check whether the status-inconsistency problem with node down is fixed once https://github.com/leo-project/leo_backend_db/pull/13 gets merged.

mocchira commented 7 years ago

@vstax Please check that the behavior below no longer happens once https://github.com/leo-project/leofs/pull/793 gets merged.

Normally the number in these queues never goes higher than 650-700K. However, when deleting two buckets at once, each number goes much higher.

mocchira commented 7 years ago

@vstax If you can spare the time, please check the doc for delete-bucket-stats at https://mocchira.github.io/leofs/admin/system_operations/s3/ to confirm that your suggestions, such as "how long delete-bucket-stats records exist once every state transitions into finished", have been covered.

vstax commented 7 years ago

@mocchira Thank you for your support.

The inconsistency problem is gone, I think (it's a bit hard to say how likely it was to be encountered). The number of messages in the queues is still a problem.

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_2@192.168.3.55|grep delete|head -n 2
 leo_delete_dir_queue_1         |   idling    | 117542         | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   running   | 139446         | 1600           | 500            | deletion bucket #2
[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_2@192.168.3.55
 active number of objects: 959706
  total number of objects: 1493689
   active size of objects: 122045873246
    total size of objects: 210052283999
     ratio of active size: 58.1%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_1@192.168.3.54|grep delete|head -n 2
 leo_delete_dir_queue_1         |   running   | 444093         | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   running   | 1173563        | 1600           | 500            | deletion bucket #2
 active number of objects: 930649
  total number of objects: 1509955
   active size of objects: 116640828817
    total size of objects: 213421617475
     ratio of active size: 54.65%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53|grep delete|head -n 2
 leo_delete_dir_queue_1         |   running   | 386793         | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   running   | 912661         | 1600           | 500            | deletion bucket #2
[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_0@192.168.3.53
 active number of objects: 847695
  total number of objects: 1354898
   active size of objects: 107321761411
    total size of objects: 191667623380
     ratio of active size: 55.99%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

[root@leo-m0 ~]# /usr/local/bin/leofs-adm delete-bucket-stats
- Bucket: body
node                         | state            | timestamp                   
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | enqueuing        | 2017-07-27 21:48:35 +0300               
storage_1@192.168.3.54       | monitoring       | 2017-07-27 21:57:49 +0300               
storage_0@192.168.3.53       | enqueuing        | 2017-07-27 21:48:35 +0300               

- Bucket: bodytest
node                         | state            | timestamp                   
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | enqueuing        | 2017-07-27 21:48:32 +0300               
storage_1@192.168.3.54       | monitoring       | 2017-07-27 21:54:27 +0300               
storage_0@192.168.3.53       | monitoring       | 2017-07-27 21:53:37 +0300               

Please note that I restarted this node and the others in the middle of the enqueue operation. Well, the situation is better, so to say: before, it always used to happen, and now it didn't happen on storage_2 - only on storage_0 and storage_1.

What's worse, the states are stuck again: it's 22:28 now and the delete-bucket-stats output is still the same. For example, it says "enqueuing" for storage_2 for both buckets, but it's not enqueuing anything. I think there are still problems if a node is restarted in the middle of the "enqueuing" phase.

There are lots of errors in the logs around the restarts (which is typical for restarting under load, though). I uploaded the storage node logs (there is nothing in the managers' logs) to https://www.dropbox.com/s/wzmzl5yprqzadks/storage-logs.tar.gz?dl=0 As you can see from the info logs, I restarted storage_2 first, then storage_0, then storage_1. I waited for the previous node to fully start before stopping the next one. I should note that storage_2 took a long time to stop and a very long time to start, while storage_0 and storage_1 stopped and started very fast. Also, I started the managers during these delete-bucket operations (i.e. I did it with only the slave present, then started the master), but the state switched from pending to enqueuing just fine regardless, so it's probably unrelated.

EDIT: eventually it got to this state:

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_2@192.168.3.55|head -n 6
              id                |    state    | number of msgs | batch of msgs  |    interval    |                 description                 
--------------------------------+-------------+----------------|----------------|----------------|---------------------------------------------
 leo_async_deletion_queue       |   idling    | 31587          | 1600           | 500            | async deletion of objs                      
 leo_comp_meta_with_dc_queue    |   idling    | 0              | 1600           | 500            | compare metadata w/remote-node              
 leo_delete_dir_queue_1         |   idling    | 3              | 1600           | 500            | deletion bucket #1                          
 leo_delete_dir_queue_2         |   idling    | 26             | 1600           | 500            | deletion bucket #2                          

It's not changing at all. There have been no more errors or anything in the logs since the time I gathered the logs posted above. Also, the state correctly switched from monitoring to finished (storage_1 still has >500K messages in its queue, so it'll probably switch later):

[root@leo-m0 ~]# /usr/local/bin/leofs-adm delete-bucket-stats
- Bucket: body
node                         | state            | timestamp                   
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | enqueuing        | 2017-07-27 21:48:35 +0300               
storage_1@192.168.3.54       | monitoring       | 2017-07-27 21:57:49 +0300               
storage_0@192.168.3.53       | enqueuing        | 2017-07-27 21:48:35 +0300               

- Bucket: bodytest
node                         | state            | timestamp                   
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | enqueuing        | 2017-07-27 21:48:32 +0300               
storage_1@192.168.3.54       | finished         | 2017-07-27 22:31:47 +0300               
storage_0@192.168.3.53       | finished         | 2017-07-27 22:31:12 +0300               

but enqueuing -> monitoring is still a problem.

mocchira commented 7 years ago

@vstax Thanks for the further testing.

In the case of restarting while delete-bucket-stats is in enqueuing, I'd expect https://github.com/leo-project/leofs/blob/develop/apps/leo_storage/src/leo_storage_handler_del_directory.erl#L328-L330 (those lines) to get executed; however, it seems not to happen according to the log files you shared. (It does happen on my devbox anyway.) One more important thing I noticed: there were some errors representing failures to insert/update/delete records on leveldb due to badarg (which means the file handles opened by eleveldb were already closed). That said, a persistent state that is not protected by a reliable retry mechanism can be lost (a state inconsistency on delete-bucket-stats could happen).

Taking those facts into account, we have to find a reliable way to persist and transition the delete-bucket state.

I will get in touch once we reach a consensus on how to solve the above problems and the code with the solution gets merged.
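As a rough sketch of the eleveldb side of the issue (an assumption about the shape of the fix, not the actual leo_backend_db patch; module and function names are invented): a call on an already-closed handle raises badarg, so a wrapper should catch it and report an error instead of letting it be mistaken for success, so that callers such as the delete-bucket state handling can retry.

-module(safe_backend_put_sketch).
-export([put/3]).

%% Wrap eleveldb:put/4 so that a closed handle is surfaced as {error, closed}
%% rather than crashing with badarg or being treated as a success.
put(DBRef, Key, Value) ->
    try eleveldb:put(DBRef, Key, Value, []) of
        ok              -> ok;
        {error, Reason} -> {error, Reason}
    catch
        error:badarg ->
            %% Handle was already closed (e.g. during leo_storage shutdown).
            {error, closed}
    end.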

mocchira commented 7 years ago

Problems to be solved.

yosukehara commented 7 years ago

@mocchira I'm going to investigate closing/terminating leo_storage in the case of irregular situations.

mocchira commented 7 years ago

@vstax As the frozen-enqueuing problem should be gone with rc6, please try again.

vstax commented 7 years ago

@mocchira @yosukehara Thank you for support.

First of all: the bug with "too many messages" in queues when deleting two buckets at once is still present (this is before any restarts):

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53 | grep delet
 leo_async_deletion_queue       |   idling    | 25             | 1600           | 500            | async deletion of objs
 leo_delete_dir_queue_1         |   running   | 581823         | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   running   | 906642         | 1600           | 500            | deletion bucket #2

Regarding the enqueuing-state bug: I think there are some problems (plus there is the strange enqueuing -> monitoring state switch). Stopping storage_0 now. It took some time, but eventually it stopped:

[E] manager_0@192.168.3.50  2017-08-02 21:10:47.6002 +0300  1501697447  leo_manager_cluster_monitor:handle_call - DOWN269   [{node,'storage_0@192.168.3.53'}]

[root@leo-m0 ~]# /usr/local/bin/leofs-adm status|grep storage_0
  S    | storage_0@192.168.3.53      | stop         |                |                | 2017-08-02 21:10:47 +0300

Checking deletion status - still "enqueuing" (I thought it's supposed to roll back to "pending"?)

[root@leo-m0 ~]# /usr/local/bin/leofs-adm delete-bucket-stats
- Bucket: body
node                         | state            | timestamp
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | enqueuing        | 2017-08-02 21:04:27 +0300
storage_1@192.168.3.54       | enqueuing        | 2017-08-02 21:04:27 +0300
storage_0@192.168.3.53       | enqueuing        | 2017-08-02 21:04:27 +0300

- Bucket: bodytest
node                         | state            | timestamp
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | monitoring       | 2017-08-02 21:10:58 +0300
storage_1@192.168.3.54       | monitoring       | 2017-08-02 21:10:52 +0300
storage_0@192.168.3.53       | monitoring       | 2017-08-02 21:10:11 +0300

Here I wasn't doing anything, just looking at logs and such for a minute or two; then the status switched to "monitoring" even though storage_0 was down all this time. EDIT: I made a mistake originally; the state for storage_0 and bucket "body" is still the same.

[root@leo-m0 ~]# /usr/local/bin/leofs-adm delete-bucket-stats
- Bucket: body
node                         | state            | timestamp
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | monitoring       | 2017-08-02 21:12:03 +0300
storage_1@192.168.3.54       | monitoring       | 2017-08-02 21:12:00 +0300
storage_0@192.168.3.53       | enqueuing        | 2017-08-02 21:04:27 +0300

- Bucket: bodytest
node                         | state            | timestamp
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | monitoring       | 2017-08-02 21:10:58 +0300
storage_1@192.168.3.54       | monitoring       | 2017-08-02 21:10:52 +0300
storage_0@192.168.3.53       | monitoring       | 2017-08-02 21:10:11 +0300

Starting storage_0:

[root@leo-m0 app]# /usr/local/bin/leofs-adm status|grep storage_0
  S    | storage_0@192.168.3.53      | running      | c1d863d0       | c1d863d0       | 2017-08-02 21:14:00 +0300

status is still the same:

[root@leo-m0 app]# /usr/local/bin/leofs-adm delete-bucket-stats
- Bucket: body
node                         | state            | timestamp
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | monitoring       | 2017-08-02 21:12:03 +0300
storage_1@192.168.3.54       | monitoring       | 2017-08-02 21:12:00 +0300
storage_0@192.168.3.53       | enqueuing        | 2017-08-02 21:14:06 +0300

- Bucket: bodytest
node                         | state            | timestamp
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | monitoring       | 2017-08-02 21:10:58 +0300
storage_1@192.168.3.54       | monitoring       | 2017-08-02 21:10:52 +0300
storage_0@192.168.3.53       | monitoring       | 2017-08-02 21:10:11 +0300

Log files from storage_0 only (there is nothing else on manager, and I didn't do anything to other nodes) are at https://www.dropbox.com/s/q6sza116pl43s45/s0-logs.tar.gz?dl=0

Additionally, version of leo_storage and gateway is still 1.3.4 in latest RC:

[root@leo-m0 app]# /usr/local/bin/leofs-adm version all
 [Version of Node(s)]
-------+-----------------------------+----------------------------------+
 type  |            node             |             version              |
-------+-----------------------------+----------------------------------+
  M    | manager_0@192.168.3.50      | 1.3.5                            |
  M    | manager_1@192.168.3.51      | 1.3.5                            |
  S    | storage_0@192.168.3.53      | 1.3.4                            |
  S    | storage_1@192.168.3.54      | 1.3.4                            |
  S    | storage_2@192.168.3.55      | 1.3.4                            |
  G    | gateway_0@192.168.3.52      | 1.3.4                            |
-------+-----------------------------+----------------------------------+

EDIT: end result, an hour later:

[root@leo-m0 ~]# /usr/local/bin/leofs-adm delete-bucket-stats
- Bucket: body
node                         | state            | timestamp
-----------------------------+------------------+-----------------------------
storage_2@192.168.3.55       | finished         | 2017-08-02 22:03:13 +0300
storage_1@192.168.3.54       | finished         | 2017-08-02 22:03:14 +0300
storage_0@192.168.3.53       | monitoring       | 2017-08-02 21:22:10 +0300

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53|grep del|head -n 3
 leo_async_deletion_queue       |   running   | 35             | 1600           | 500            | async deletion of objs
 leo_delete_dir_queue_1         |   idling    | 0              | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   idling    | 932874         | 1600           | 500            | deletion bucket #2

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53|grep del|head -n 3
 leo_async_deletion_queue       |   idling    | 35             | 1600           | 500            | async deletion of objs
 leo_delete_dir_queue_1         |   idling    | 0              | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   running   | 932874         | 1600           | 500            | deletion bucket #2

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53|grep del|head -n 3
 leo_async_deletion_queue       |   idling    | 35             | 1600           | 500            | async deletion of objs
 leo_delete_dir_queue_1         |   idling    | 0              | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   idling    | 932874         | 1600           | 500            | deletion bucket #2

The queue is switching between running and idling but not processing; there's nothing in storage_0's logs at all. There are some spikes in CPU load on this node. leo_doctor output: https://pastebin.com/FyJ9FAQg

yosukehara commented 7 years ago

@vstax Thanks for your contribution. We're going to test removing directory/bucket to find the root cause of this situation and then fix those problems.

Checking deletion status - still "enqueuing" (I thought it's supposed to roll back to "pending"?)

I've realized LeoStorage does not send a rollback message to LeoManager nodes. I'm going to fix it today.

Additionally, version of leo_storage and gateway is still 1.3.4 in latest RC:

Those versions depend on the following:

I'll also fix them in the next commit.

vstax commented 7 years ago

@yosukehara Thank you.

I'm sorry for the confusion; I made a mistake in the previous comment, saying that the state at the manager changed while the node was down - it did not, it was still "enqueuing". What actually happened is that after looking at the output of "status" and "version all" (which both list nodes in the order storage_0 - storage_1 - storage_2), I looked just at the first line of the "delete-bucket-stats" output, which was for storage_2. In other words, "delete-bucket-stats" seems to always list nodes in reverse order compared to the output of other commands. Would it be possible to make the order the same as the others to avoid the confusion?

yosukehara commented 7 years ago

@vstax Thanks for your comment. Understood. We're still investigating it to deliver the proper behavior.

Checking deletion status - still "enqueuing" (I thought it's supposed to roll back to "pending"?)

I've realized LeoStorage does not send a rollback message to LeoManager nodes. I'm going to fix it today.

I misunderstood that. I closed its PR yesterday.

yosukehara commented 7 years ago

@vstax I've fixed the display order of delete-bucket-stats and also fixed the problem of message consumption not finishing. If you have time, you can test the latest development version from the develop branch.

vstax commented 7 years ago

@yosukehara Thank you.

The order is fixed, but I don't quite understand what problem you are talking about - frozen queue (running->idling->running) on storage node after restart?

I did an experiment on the latest version. After I stopped the node, the state was still "enqueuing" at the manager. After I started the node, it was still the same. The existing queues on the storage node were consumed fine after starting. At some point the state switched to "monitoring", then, as the queue was consumed fully, to "finished"; bucket deletion was done, but lots of objects (the ones that hadn't gone into the queue yet at the point when I stopped the node) remained on the node (all objects were supposed to be deleted):

[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_0@192.168.3.53
 active number of objects: 297758
  total number of objects: 1354898
   active size of objects: 37049470168
    total size of objects: 191667623380
     ratio of active size: 19.33%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

I uploaded the storage_0 logs to https://www.dropbox.com/s/qa2ugo52cagqtuh/s0-logs-201708004.tar.gz?dl=0 I stopped storage_0 at 21:08:50 (it took some time to stop, though). I stopped storage_1 at 21:24, but it looks like all objects had already been enqueued there at that point, so there were no problems with it (everything was deleted in the end).

The problem of a much higher (roughly twice) number of messages in each deletion queue when deleting two buckets at once is still present; also, the state is still "enqueuing" when the node stops.

Let me summarize the problems related to bucket deletion that I know of. For the case when storage nodes are fully working, there are two bugs:

  1. A much higher number of messages during deletion of two buckets at once compared to deleting them one after another. There are details in https://github.com/leo-project/leofs/issues/725#issuecomment-317537852; the fix from #793 did not help.
  2. I think "enqueuing" temporarily gets kind of stuck; this is especially visible during the same experiment (deleting two similar-sized large buckets at once): for one bucket the state switches to "monitoring" very fast, but for the other it gets stuck at "enqueuing" for an hour or so, even though the queue growth stops for both buckets at about the same time. Only when both buckets are nearly deleted on the storage nodes does "enqueuing" change to "monitoring" for the second bucket as well. Both buckets switch to the "finished" state at around the same time; it's only the "enqueuing -> monitoring" switch which seems to happen much later than you'd think it should. But I cannot say if this really is a bug or just behaves this way for some reason; either way, it doesn't lead to consistency problems.

For the case when a storage node is gracefully stopped during the enqueuing stage:

  1. The state at manager is stuck at "enqueuing" (need to confirm this one again)
  2. Not all objects from the bucket are deleted on storage node in the end (confirmed)
  3. It's possible for queues to get stuck on storage node after it's started, it consumes some messages but the numbers aren't at 0 in the end (need to confirm this one again)

For the case when network connectivity with a node has been lost (and connections break), and the case of ungraceful shutdown (server reboot, OOM kill): not tested yet.

Unfortunately, for real-life reasons I won't be able to help with testing anything in the next 2 weeks. I think bucket deletion is currently working (or almost working; the first two bugs are still present, but they don't cause any consistency problems) for the cases when nothing unusual happens to the storage nodes. Maybe leave the problems caused by stopping storage nodes for the next release, if a 2-week delay plus who knows how much more time to fix it all is an unacceptable delay for 1.3.5? I'd really like to help with testing this functionality so it has no known bugs; we can return to this later.

yosukehara commented 7 years ago

@vstax Thanks for testing the latest development version and sharing informative report.

Let me summarize the problems related to bucket deletion that I know of. For the case when storage nodes are fully working, there are two bugs

We're going to investigate those issues and other problems from today.

mocchira commented 7 years ago

@vstax Thanks for sorting out the remaining problems.

A much higher number of messages during deletion of two buckets at once compared to deleting them one after another. There are details in #725 (comment); the fix from #793 did not help.

Unfortunately we still can't reproduce this issue as of now. However, it should not be a critical one, so we will file it under the 1.4 milestone. EDIT: filed as #803.

I think "enqueuing" temporarily gets kind of stuck; this is especially visible during the same experiment (deleting two similar-sized large buckets at once): for one bucket the state switches to "monitoring" very fast, but for the other it gets stuck at "enqueuing" for an hour or so, even though the queue growth stops for both buckets at about the same time. Only when both buckets are nearly deleted on the storage nodes does "enqueuing" change to "monitoring" for the second bucket as well. Both buckets switch to the "finished" state at around the same time; it's only the "enqueuing -> monitoring" switch which seems to happen much later than you'd think it should. But I cannot say if this really is a bug or just behaves this way for some reason; either way, it doesn't lead to consistency problems.

Same status as above; we will file this issue under the 1.4 milestone. EDIT: filed as #804.

The state at manager is stuck at "enqueuing" (need to confirm this one again)

After some discussion with the other members, we've decided that this is the intended behavior, as described below.

In case leo_storage goes down while it's enqueuing, the state of the corresponding leo_storage at the manager keeps being enqueuing (not changing to pending, as it will become enqueuing again immediately once leo_storage comes back).

Let us know if this behavior troubles you.

Not all objects from the bucket are deleted on storage node in the end (confirmed)

The two commits below should fix this problem, so please give it a try with 1.3.5 after you're back.

It's possible for queues to get stuck on storage node after it's started, it consumes some messages but the numbers aren't at 0 in the end (need to confirm this one again)

The commit below should fix this problem, so please give it a try with 1.3.5 after you're back.

Unfortunately, for real-life reasons I won't be able to help with testing anything in the next 2 weeks. I think bucket deletion is currently working (or almost working; the first two bugs are still present, but they don't cause any consistency problems) for the cases when nothing unusual happens to the storage nodes. Maybe leave the problems caused by stopping storage nodes for the next release, if a 2-week delay plus who knows how much more time to fix it all is an unacceptable delay for 1.3.5? I'd really like to help with testing this functionality so it has no known bugs; we can return to this later.

We will release 1.3.5 today, as the problems caused by stopping storage nodes have been fixed now. Let us know if there is something that doesn't work for you with 1.3.5, except for the problems whose milestone we assigned to 1.4.

vstax commented 7 years ago

@mocchira Thank you, I'll test these fixes when I'm able to. Regarding being unable to reproduce the issues when deleting two buckets at once: I just remembered that one of these buckets ("body", the one which also switches from enqueuing to monitoring much later) has all of its objects in the "old metadata" format. I was able to delete it only after #754 was implemented. Could these two issues be related to that?

mocchira commented 7 years ago

@vstax

Regarding being unable to reproduce the issues when deleting two buckets at once: I just remembered that one of these buckets ("body", the one which also switches from enqueuing to monitoring much later) has all of its objects in the "old metadata" format. I was able to delete it only after #754 was implemented. Could these two issues be related to that?

Maybe. We will try to reproduce with the one bucket having "old metadata" format later. Thanks!

mocchira commented 7 years ago

Progress note: we have not succeeded in reproducing the issues yet, even with the one bucket having the old metadata format. We will look into the other factors that might affect the reproducibility.

vstax commented 7 years ago

@mocchira

Progress note: we have not succeeded in reproducing the issues yet, even with the one bucket having the old metadata format. We will look into the other factors that might affect the reproducibility.

Thank you. There are lots of timeout errors in my log files (because there are lots of objects to delete, I suppose); I wonder if you're getting similar amounts in your tests, because they might be a factor here?

Either way, I'll be able to return to this around the end of the week. I was thinking of doing 3 experiments on the latest version (delete bucket "body", then "bodytest", one by one; then the same but in reverse order; then both at once) while gathering the output of delete-bucket-stats and mq-stats for all nodes every minute or 30 seconds, to closely monitor and time the moments of state switching and the numbers in the queues, to see if there really is anything abnormal. I could add some other stats gathering here, but I'm not sure what else could help (the debug logs don't output anything relevant, at least).
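For reference, this is the sort of polling helper I'd use for that: a hypothetical escript wrapper around the leofs-adm commands shown above (the node names and log file name are just examples; leofs-adm is assumed to be on PATH), not anything shipped with LeoFS.

#!/usr/bin/env escript
%% Poll delete-bucket-stats and the delete queues every 30 seconds and append
%% the timestamped output to a log file, so state transitions can be timed later.
main(_) ->
    Nodes = ["storage_0@192.168.3.53",
             "storage_1@192.168.3.54",
             "storage_2@192.168.3.55"],
    loop(Nodes).

loop(Nodes) ->
    Stamp = calendar:system_time_to_rfc3339(erlang:system_time(second)),
    log(Stamp, os:cmd("leofs-adm delete-bucket-stats")),
    [log(Stamp, os:cmd("leofs-adm mq-stats " ++ N ++ " | grep delete")) || N <- Nodes],
    timer:sleep(30000),
    loop(Nodes).

log(Stamp, Output) ->
    file:write_file("delete-bucket-monitor.log",
                    io_lib:format("~s~n~s~n", [Stamp, Output]),
                    [append]).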

mocchira commented 7 years ago

@vstax Thanks for replying.

Thank you. There are lots of timeout errors in my log files (because there are lots of objects to delete, I suppose); I wonder if you're getting similar amounts in your tests, because they might be a factor here?

Yes, we also have non-negligible timeouts like you do, due to https://github.com/leo-project/leofs/issues/764; however, the actual number might be (very) different from yours. I will look into the problem while considering how lots of timeouts affect the behavior of multiple delete-bucket operations at once.

Either way, I'll be able to return to this around the end of the week. I was thinking of doing 3 experiments on the latest version (delete bucket "body", then "bodytest", one by one; then the same but in reverse order; then both at once) while gathering the output of delete-bucket-stats and mq-stats for all nodes every minute or 30 seconds, to closely monitor and time the moments of state switching and the numbers in the queues, to see if there really is anything abnormal. I could add some other stats gathering here, but I'm not sure what else could help (the debug logs don't output anything relevant, at least).

Thanks! Looking forward to seeing your results.

vstax commented 7 years ago

@mocchira WIP; however, I can see now that I can get too many messages while deleting a single bucket as well. It's not caused by deleting two buckets at once but by some other means, it seems. A long time ago I never got such numbers when deleting a single bucket, so it must be either some change between the original delete-bucket implementation and the latest changes, or some third factor that didn't exist back then but plays a role now.

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53
              id                |    state    | number of msgs | batch of msgs  |    interval    |                 description                 
--------------------------------+-------------+----------------|----------------|----------------|---------------------------------------------
 leo_async_deletion_queue       |   running   | 8              | 1600           | 500            | async deletion of objs                      
 leo_comp_meta_with_dc_queue    |   idling    | 0              | 1600           | 500            | compare metadata w/remote-node              
 leo_delete_dir_queue_1         |   running   | 1276860        | 1600           | 500            | deletion bucket #1                          
 leo_delete_dir_queue_2         |   idling    | 0              | 1600           | 500            | deletion bucket #2                          
 leo_delete_dir_queue_3         |   idling    | 0              | 1600           | 500            | deletion bucket #3                          
 leo_delete_dir_queue_4         |   idling    | 0              | 1600           | 500            | deletion bucket #4                          
 leo_delete_dir_queue_5         |   idling    | 0              | 1600           | 500            | deletion bucket #5                          
 leo_delete_dir_queue_6         |   idling    | 0              | 1600           | 500            | deletion bucket #6                          
 leo_delete_dir_queue_7         |   idling    | 0              | 1600           | 500            | deletion bucket #7                          
 leo_delete_dir_queue_8         |   idling    | 0              | 1600           | 500            | deletion bucket #8                          
 leo_per_object_queue           |   idling    | 0              | 1600           | 500            | recover inconsistent objs                   
 leo_rebalance_queue            |   idling    | 0              | 1600           | 500            | rebalance objs                              
 leo_recovery_node_queue        |   idling    | 0              | 1600           | 500            | recovery objs of node                       
 leo_req_delete_dir_queue       |   idling    | 0              | 1600           | 500            | request removing directories                
 leo_sync_by_vnode_id_queue     |   idling    | 0              | 1600           | 500            | sync objs by vnode-id                       
 leo_sync_obj_with_dc_queue     |   idling    | 0              | 1600           | 500            | sync objs w/remote-node                     

[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_0@192.168.3.53
 active number of objects: 1249230
  total number of objects: 1354898
   active size of objects: 174361023360
    total size of objects: 191667623380
     ratio of active size: 90.97%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

Only deleting a single bucket here; the two buckets hold about an equal number of objects, so I don't expect the number in the queue to go higher than 1354898 / 2 ~= 677000.

EDIT: OK, this looks totally wrong. This is after deleting the "body" bucket completed - "bodytest" was never touched!:

[root@leo-m0 shm]# /usr/local/bin/leofs-adm du storage_0@192.168.3.53
 active number of objects: 0
  total number of objects: 1354898
   active size of objects: 0
    total size of objects: 191667623380
     ratio of active size: 0.0%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

Deleting bucket "body" completely removed all objects both from "body" and "bodytest". How is that possible? If that really is the case, it would explain both problems (removing "body" takes much more time than "bodytest", and too many messages in queues when deleting both "body" and "bodytest" at once). Just in case, both buckets are owned by the same user "body". Right now bucket "bodytest" still exists, but all objects from it on nodes are removed.

mocchira commented 7 years ago

Deleting bucket "body" completely removed all objects both from "body" and "bodytest". How is that possible? If that really is the case, it would explain both problems (removing "body" takes much more time than "bodytest", and too many messages in queues when deleting both "body" and "bodytest" at once). Just in case, both buckets are owned by the same user "body". Right now bucket "bodytest" still exists, but all objects from it on nodes are removed.

@vstax Oops. Thanks, that totally makes sense! We will fix it ASAP.
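To illustrate the likely mechanism (this is an assumption based on the bucket-name prefix matching on leveldb mentioned later in this thread, not our actual code; the module and function names are invented): if the enqueuing phase selects keys by a bare bucket-name prefix, the prefix <<"body">> also matches every <<"bodytest/...">> key, whereas including the "/" delimiter in the prefix restricts the match to the intended bucket.

-module(bucket_prefix_sketch).
-export([naive_match/2, delimited_match/2]).

%% naive_match(<<"body">>, <<"bodytest/foo">>) -> true  (wrongly selected)
naive_match(Bucket, Key) ->
    binary:longest_common_prefix([Bucket, Key]) =:= byte_size(Bucket).

%% delimited_match(<<"body">>, <<"bodytest/foo">>) -> false
%% delimited_match(<<"body">>, <<"body/foo">>)     -> true
delimited_match(Bucket, Key) ->
    Prefix = <<Bucket/binary, "/">>,
    binary:longest_common_prefix([Prefix, Key]) =:= byte_size(Prefix).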

yosukehara commented 7 years ago

@mocchira We need to fix this issue quickly, then we will release v1.3.6, which will contain the fix.

mocchira commented 7 years ago

@vstax https://github.com/leo-project/leofs/pull/808 should work for you so please give it a try.

vstax commented 7 years ago

@mocchira Thank you, this fix works.

I did tests with deleting buckets one by one and both at once and can confirm that the issues #803 and #804 are now gone and can be closed.

I can still see two minor problems. Even during deletion of a single bucket, the number of messages in the queue at its peak still gets somewhat higher than the number of remaining objects. E.g.

[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_0@192.168.3.53
 active number of objects: 511330
  total number of objects: 1354898
   active size of objects: 67839608148
    total size of objects: 191667623380
     ratio of active size: 35.39%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53
              id                |    state    | number of msgs | batch of msgs  |    interval    |                 description
--------------------------------+-------------+----------------|----------------|----------------|---------------------------------------------
 leo_async_deletion_queue       |   idling    | 1              | 1600           | 500            | async deletion of objs
 leo_comp_meta_with_dc_queue    |   idling    | 0              | 1600           | 500            | compare metadata w/remote-node
 leo_delete_dir_queue_1         |   running   | 549813         | 1600           | 500            | deletion bucket #1

The numbers eventually get closer and closer (and both finish at 0, of course):

[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_0@192.168.3.53
 active number of objects: 156991
  total number of objects: 1354898
   active size of objects: 16764286563
    total size of objects: 191667623380
     ratio of active size: 8.75%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53
              id                |    state    | number of msgs | batch of msgs  |    interval    |                 description
--------------------------------+-------------+----------------|----------------|----------------|---------------------------------------------
 leo_async_deletion_queue       |   idling    | 0              | 1600           | 500            | async deletion of objs
 leo_comp_meta_with_dc_queue    |   idling    | 0              | 1600           | 500            | compare metadata w/remote-node
 leo_delete_dir_queue_1         |   running   | 157986         | 1600           | 500            | deletion bucket #1

The second problem (even more minor) is the premature clearing of the delete-bucket state. It definitely isn't available for "minutes" after bucket deletion is done. Here are a few examples; I was executing delete-bucket-stats every 30 seconds, and catching it in the "finished" state is nearly impossible - the state disappears almost right away as soon as all queues get to 0:

16:40:45    {"de_bucket_state":[{"node":"storage_0@192.168.3.53","state":"3","state_str":"monitoring","timestamp":"2017-08-25 16:20:10 +0300"},{"node":"storage_1@192.168.3.54","state":"3","state_str":"monitoring","timestamp":"2017-08-25 16:23:27 +0300"},{"node":"storage_2@192.168.3.55","state":"3","state_str":"monitoring","timestamp":"2017-08-25 16:21:23 +0300"}]}
16:41:15    {"de_bucket_state":[{"node":"storage_0@192.168.3.53","state":"3","state_str":"monitoring","timestamp":"2017-08-25 16:20:10 +0300"},{"node":"storage_1@192.168.3.54","state":"3","state_str":"monitoring","timestamp":"2017-08-25 16:23:27 +0300"},{"node":"storage_2@192.168.3.55","state":"3","state_str":"monitoring","timestamp":"2017-08-25 16:21:23 +0300"}]}
16:41:45    {"error":"Delete-bucket's stats not found"}
17:17:45    {"de_bucket_state":[{"node":"storage_0@192.168.3.53","state":"3","state_str":"monitoring","timestamp":"2017-08-25 16:52:56 +0300"},{"node":"storage_1@192.168.3.54","state":"3","state_str":"monitoring","timestamp":"2017-08-25 17:03:54 +0300"},{"node":"storage_2@192.168.3.55","state":"3","state_str":"monitoring","timestamp":"2017-08-25 16:59:44 +0300"}]}
17:18:15    {"de_bucket_state":[{"node":"storage_0@192.168.3.53","state":"3","state_str":"monitoring","timestamp":"2017-08-25 16:52:56 +0300"},{"node":"storage_1@192.168.3.54","state":"3","state_str":"monitoring","timestamp":"2017-08-25 17:03:54 +0300"},{"node":"storage_2@192.168.3.55","state":"9","state_str":"finished","timestamp":"2017-08-25 17:18:15 +0300"}]}
17:18:45    {"error":"Delete-bucket's stats not found"}

The following spreadsheets show the combination of mq-stats and delete-bucket-stats; "none" in the bucket-state column means that the delete stats are "not found".

- Experiment with deleting body bucket first, then bodytest
- Experiment with deleting body and bodytest at once

As for the third experiment ("bodytest", then "body") - I didn't finish it; I checked, and it went exactly like the "body, then bodytest" one.

Initial state of nodes, and the state after "body" bucket was deleted in the first experiment, storage_0:

 active number of objects: 1348208
  total number of objects: 1354898
   active size of objects: 191655130794
    total size of objects: 191667623380
     ratio of active size: 99.99%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

 active number of objects: 674344
  total number of objects: 1354898
   active size of objects: 95969028848
    total size of objects: 191667623380
     ratio of active size: 50.07%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

storage_1:

 active number of objects: 1502466
  total number of objects: 1509955
   active size of objects: 213408666322
    total size of objects: 213421617475
     ratio of active size: 99.99%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

 active number of objects: 750866
  total number of objects: 1509955
   active size of objects: 106709199844
    total size of objects: 213421617475
     ratio of active size: 50.0%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

storage_2:

 active number of objects: 1486212
  total number of objects: 1493689
   active size of objects: 210039336410
    total size of objects: 210052283999
     ratio of active size: 99.99%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

 active number of objects: 743230
  total number of objects: 1493689
   active size of objects: 104871681352
    total size of objects: 210052283999
     ratio of active size: 49.93%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

At the end, all objects were removed from all nodes. Buckets "body" and "bodytest" are almost the same ("bodytest" might contain a few more objects than "body", but the difference should be less than 1%).

To summarize the experiment result: deleting first bucket took 26 minutes, second (after pause for a few minutes) - 34 minutes. The queues on storage_1 and storage_2 were filling ("enqueuing") significantly slower for the second bucket, but processing them ("monitoring") went at about the same speed. Deleting both buckets at once took 39 minutes and the performance was identical for both buckets (however, queue filling for second bucket lagged for about a minute, despite executing both deletes at about the same time). I don't think there are any issues here.

vstax commented 7 years ago

I've checked what happens if a node is stopped during bucket deletion, and the problem with objects remaining is still present.

First, I executed "delete-bucket body" and "delete-bucket bodytest". A minute or so later, when the queues started to fill, I stopped storage_0:

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_0@192.168.3.53
              id                |    state    | number of msgs | batch of msgs  |    interval    |                 description
--------------------------------+-------------+----------------|----------------|----------------|---------------------------------------------
 leo_async_deletion_queue       |   idling    | 0              | 1600           | 500            | async deletion of objs
 leo_comp_meta_with_dc_queue    |   idling    | 0              | 1600           | 500            | compare metadata w/remote-node
 leo_delete_dir_queue_1         |   idling    | 52380          | 1600           | 500            | deletion bucket #1

Some time later, when all other nodes switched state to "monitoring":

[root@leo-m0 ~]# /usr/local/bin/leofs-adm delete-bucket-stats
- Bucket: body
node                         | state            | timestamp
-----------------------------+------------------+-----------------------------
storage_0@192.168.3.53       | enqueuing        | 2017-08-25 19:44:26 +0300
storage_1@192.168.3.54       | monitoring       | 2017-08-25 19:55:26 +0300
storage_2@192.168.3.55       | monitoring       | 2017-08-25 19:54:18 +0300

- Bucket: bodytest
node                         | state            | timestamp
-----------------------------+------------------+-----------------------------
storage_0@192.168.3.53       | enqueuing        | 2017-08-25 19:43:26 +0300
storage_1@192.168.3.54       | monitoring       | 2017-08-25 19:53:21 +0300
storage_2@192.168.3.55       | monitoring       | 2017-08-25 19:52:38 +0300

[root@leo-m0 ~]# /usr/local/bin/leofs-adm mq-stats storage_2@192.168.3.55
              id                |    state    | number of msgs | batch of msgs  |    interval    |                 description
--------------------------------+-------------+----------------|----------------|----------------|---------------------------------------------
 leo_async_deletion_queue       |   running   | 216616         | 1600           | 500            | async deletion of objs
 leo_comp_meta_with_dc_queue    |   idling    | 0              | 1600           | 500            | compare metadata w/remote-node
 leo_delete_dir_queue_1         |   running   | 512792         | 1600           | 500            | deletion bucket #1
 leo_delete_dir_queue_2         |   running   | 501945         | 1600           | 500            | deletion bucket #2

I started storage_0.

Eventually all queues on all nodes got to 0, bucket deletion is finished:

[root@leo-m0 ~]# /usr/local/bin/leofs-adm delete-bucket-stats
[ERROR] Delete-bucket's stats not found

But objects remain on storage_0:

[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_0@192.168.3.53
 active number of objects: 2574
  total number of objects: 1354898
   active size of objects: 9810347778
    total size of objects: 191667623380
     ratio of active size: 5.12%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_1@192.168.3.54
 active number of objects: 0
  total number of objects: 1509955
   active size of objects: 0
    total size of objects: 213421617475
     ratio of active size: 0.0%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

[root@leo-m0 ~]# /usr/local/bin/leofs-adm du storage_2@192.168.3.55
 active number of objects: 0
  total number of objects: 1493689
   active size of objects: 0
    total size of objects: 210052283999
     ratio of active size: 0.0%
    last compaction start: ____-__-__ __:__:__
      last compaction end: ____-__-__ __:__:__

Also, the info log on manager_0 is missing a "dequeue" entry for the "body" bucket; there is one only for "bodytest":

[I] manager_0@192.168.3.50  2017-08-25 19:43:14.883159 +0300    1503679394  leo_manager_del_bucket_handler:handle_call/3 - enqueue  134 [{"bucket_name",<<"body">>},{"node",'storage_0@192.168.3.53'}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:14.883463 +0300    1503679394  leo_manager_del_bucket_handler:handle_call/3 - enqueue  134 [{"bucket_name",<<"body">>},{"node",'storage_1@192.168.3.54'}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:14.883648 +0300    1503679394  leo_manager_del_bucket_handler:handle_call/3 - enqueue  134 [{"bucket_name",<<"body">>},{"node",'storage_2@192.168.3.55'}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:17.745669 +0300    1503679397  leo_manager_del_bucket_handler:handle_call/3 - enqueue  134 [{"bucket_name",<<"bodytest">>},{"node",'storage_0@192.168.3.53'}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:17.746028 +0300    1503679397  leo_manager_del_bucket_handler:handle_call/3 - enqueue  134 [{"bucket_name",<<"bodytest">>},{"node",'storage_1@192.168.3.54'}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:17.746231 +0300    1503679397  leo_manager_del_bucket_handler:handle_call/3 - enqueue  134 [{"bucket_name",<<"bodytest">>},{"node",'storage_2@192.168.3.55'}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:24.855532 +0300    1503679404  leo_manager_del_bucket_handler:notify_fun/3 280 [{"node",'storage_0@192.168.3.53'},{"bucket_name",<<"body">>}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:24.858732 +0300    1503679404  leo_manager_del_bucket_handler:notify_fun/3 280 [{"node",'storage_1@192.168.3.54'},{"bucket_name",<<"body">>}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:24.865141 +0300    1503679404  leo_manager_del_bucket_handler:notify_fun/3 280 [{"node",'storage_2@192.168.3.55'},{"bucket_name",<<"body">>}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:24.868522 +0300    1503679404  leo_manager_del_bucket_handler:notify_fun/3 280 [{"node",'storage_0@192.168.3.53'},{"bucket_name",<<"bodytest">>}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:24.871346 +0300    1503679404  leo_manager_del_bucket_handler:notify_fun/3 280 [{"node",'storage_1@192.168.3.54'},{"bucket_name",<<"bodytest">>}]
[I] manager_0@192.168.3.50  2017-08-25 19:43:24.874775 +0300    1503679404  leo_manager_del_bucket_handler:notify_fun/3 280 [{"node",'storage_2@192.168.3.55'},{"bucket_name",<<"bodytest">>}]
[I] manager_0@192.168.3.50  2017-08-25 20:15:12.982604 +0300    1503681312  leo_manager_api:brutal_synchronize_ring_1/2 1944    node:'storage_0@192.168.3.53'
[I] manager_0@192.168.3.50  2017-08-25 20:49:30.316532 +0300    1503683370  leo_manager_del_bucket_handler:after_completion/1   308 [{"msg: dequeued and removed",<<"bodytest">>}]

Despite all three nodes having "dequeue" in logs for both buckets:

[I] storage_0@192.168.3.53  2017-08-25 20:30:32.142338 +0300    1503682232  leo_storage_handler_del_directory:run/5 661 [{"msg: dequeued and removed (bucket)",<<"body">>}]
[I] storage_0@192.168.3.53  2017-08-25 20:31:29.699986 +0300    1503682289  leo_storage_handler_del_directory:run/5 661 [{"msg: dequeued and removed (bucket)",<<"bodytest">>}]

[I] storage_1@192.168.3.54  2017-08-25 20:49:11.193615 +0300    1503683351  leo_storage_handler_del_directory:run/5 661 [{"msg: dequeued and removed (bucket)",<<"body">>}]
[I] storage_1@192.168.3.54  2017-08-25 20:49:20.183256 +0300    1503683360  leo_storage_handler_del_directory:run/5 661 [{"msg: dequeued and removed (bucket)",<<"bodytest">>}]

[I] storage_2@192.168.3.55  2017-08-25 20:49:10.91084 +0300 1503683350  leo_storage_handler_del_directory:run/5 661 [{"msg: dequeued and removed (bucket)",<<"body">>}]
[I] storage_2@192.168.3.55  2017-08-25 20:49:19.238982 +0300    1503683359  leo_storage_handler_del_directory:run/5 661 [{"msg: dequeued and removed (bucket)",<<"bodytest">>}]

I executed compaction on storage_0, and lots of objects indeed remain; there are objects from both "bodytest" and "body" present on storage_0 (in roughly equal amounts). I uploaded logs from all storage nodes to https://www.dropbox.com/s/gisr1ujoipu9tdz/storage-logs-objects-remaining.tar.gz?dl=0

I've noticed a (maybe interesting) thing as well, when looking at the numbers "active number of objects: 2574, active size of objects: 9810347778" and "total number of objects: 1354898, total size of objects: 191667623380". The average object size for the node is 191667623380 / 1354898 ~= 140 KB. However, for the objects that weren't deleted, the average size is 9810347778 / 2574 ~= 3.8 MB. Isn't that a bit strange? (The "active number / size" numbers can be trusted; they didn't change after compaction either.)

mocchira commented 7 years ago

@vstax thanks for confirming the fix and reporting in detail as always.

I did tests with deleting buckets one by one and both at once and can confirm that the issues #803 and #804 are now gone and can be closed.

OK. I will close later.

Even during deletion of a single bucket, the number of messages in the queue at its peak still gets somewhat higher than the number of remaining objects.

Reproduced; however, it's still not clear what causes this problem, so we will file it as another issue and assign the 1.4.0 milestone, as it's not a critical one.

The second problem (even more minor) is the premature clearing of the delete-bucket state. It definitely isn't available for "minutes" after bucket deletion is done. Here are a few examples; I was executing delete-bucket-stats every 30 seconds, and catching it in the "finished" state is nearly impossible - the state disappears almost right away as soon as all queues get to 0:

Depending on the timing, the finished status can disappear within a dozen seconds; strictly speaking, it ranges from 10 to 30 seconds according to https://github.com/leo-project/leofs/blob/develop/apps/leo_manager/include/leo_manager.hrl#L371-L383. Let us know if this behavior bothers you; we'd like to consider increasing the period.

To summarize the experiment result: deleting first bucket took 26 minutes, second (after pause for a few minutes) - 34 minutes. The queues on storage_1 and storage_2 were filling ("enqueuing") significantly slower for the second bucket, but processing them ("monitoring") went at about the same speed. Deleting both buckets at once took 39 minutes and the performance was identical for both buckets (however, queue filling for second bucket lagged for about a minute, despite executing both deletes at about the same time). I don't think there are any issues here.

Thanks for summarizing. I'm not confident, but maybe the length of a bucket name can affect the performance of the enqueuing phase (basically, prefix matches happen on leveldb); I will confirm whether my assumption is correct when I can spare the time. (Or, IIRC, the AVS/metadata format of one bucket in your env is the older one; this can affect the performance, as the additional logic for converting the format takes place. I will also check how this factor affects the throughput of the enqueuing phase.)

I've checked what happens if a node is stopped during bucket deletion, and the problem with objects remaining is still present.

I still couldn't reproduce this one on my dev box; however, I figured out how it could happen from the log files you shared in the previous comment. In short, rollback_to_pending (https://github.com/leo-project/leofs/blob/develop/apps/leo_storage/src/leo_storage_handler_del_directory.erl#L375-L379) can fail if leo_backend_db has already finished its termination processing. I will file this as another issue and get in touch with you once I've fixed it.

vstax commented 7 years ago

@mocchira

let us know if this behavior bother you. we'd like to consider increasing the period.

It does not bother me per se; it just doesn't quite match the documentation, which claims that it will be available for minutes :)

the AVS/metadata format of one bucket in your env is the older one; this can affect the performance

The old format is in the "body" bucket, and (after the latest changes) no test shows that removing it is any slower than removing "bodytest" with the new metadata. So I don't think it plays any role here.

maybe the length of a bucket name can affect the performance of the enqueuing phase (basically, prefix matches happen on leveldb)

Maybe. The difference between s0 and s1/s2 (as you can see from the du output) is that s1 and s2 got about 10-11% more data than s0, both in object counts and in raw data size (each AVS file on s0 is about 10% smaller than each AVS file on s1/s2). That difference can be seen in the "body, then bodytest" graph for the removal of the first ("body") bucket, and in the "body and bodytest" graph. It's just that when removing "bodytest" in the "body, then bodytest" case, the difference between s0 and s1/s2 is much higher than these 10%. It's hard to say if there is a real problem just from these tests. I'll try to repeat the experiment to see if this is reproducible, however I can't promise it will be anytime soon.

I will file this one as another issue and get in touch with you once I've fixed it.

OK, great. Just in case, here is the listing from diagnose-start of one of the AVS files on storage_0. Most of the remaining objects are first parts of multipart objects, plus lots of big objects in general. This is quite different from the real distribution of objects before deletion, as only a small percentage of objects were that big.

mocchira commented 7 years ago

@vstax

It does not bother me per se; it just doesn't quite match the documentation, which claims that it will be available for minutes :)

Got it. The documentation has been fixed with https://github.com/leo-project/leofs/pull/811.

The old format is in the "body" bucket, and (after the latest changes) no test shows that removing it is any slower than removing "bodytest" with the new metadata. So I don't think it plays any role here.

OK.

It's hard to say if there is a real problem just from these tests. I'll try to repeat the experiment to see if this is reproducible, however I can't promise it will be anytime soon.

OK. Please file an issue if you find a way to reproduce it (I will also try if I can spare the time).

As you may notice, the remaining issues about delete-bucket have been filed separately.

I will ask you to test once those issues get fixed.

mocchira commented 6 years ago

@vstax #812 has already been solved; OTOH, #813 still remains, but it's not a big deal, so I'd like to ask you to test the delete-bucket-related behavior when you have time. (I will set the milestone to 1.4.1 for the same reason as https://github.com/leo-project/leofs/issues/931#issuecomment-366126397.)