leo-project / leofs

The LeoFS Storage System
https://leo-project.net/leofs/
Apache License 2.0

Errors and inconsistent state during upload + compaction #897

Open vstax opened 6 years ago

vstax commented 6 years ago

I'm uploading objects to a cluster while one of the nodes (stor06) undergoes compaction. Nothing serious, just the default options (4 files at once), and the compaction itself goes pretty fast. There are no errors on the gateway, but I get errors about multipart objects on the storage nodes. This seems a bit like #845, but it's different because only the node undergoing compaction ends up with inconsistencies. Logs on stor01:

[W]     bodies01@stor01.selectel.cloud.lan     2017-10-19 22:07:34.955347 +0300        1508440054      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,64330076317063496433949952340715716362},{key,<<"bod15/30/12/d8/3012d8d548004bd371c78009d71071e2c8077b25d45b35fefebe074031be14ec0bee241ee3b4d7feefac6a4e7f7e2933a0f0270100000000.xz\nc6b5cba57862c171d7204e33a4284321">>},{clock,1508440051887130},{cause,not_found}]
[W]     bodies01@stor01.selectel.cloud.lan     2017-10-19 22:33:27.429564 +0300        1508441607      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,193374433589015262000463976961534345406},{key,<<"bod15/90/dd/a7/90dda7f37e4584e147faa5d226b214ddb5d6e038cce361ef471263caea7a13b960d4cc17e2c2756427a9f257b3a6f267e01cee0000000000.xz\nbdf44630c80a4b1279ca665d3d5ad95e">>},{clock,1508441606796843},{cause,not_found}]
[W]     bodies01@stor01.selectel.cloud.lan     2017-10-19 22:54:04.526043 +0300        1508442844      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,100877117596530444012773722764080698988},{key,<<"bod15/11/82/7c/11827c4553eae7ab33d043c3781ad683dbe1fb08abde599287287188dfd9dc504af4367e71679beb77defebd3a0ca78400261e0100000000.xz\n2e3eb8519ab230ae6fdf6b7c8aaed82d">>},{clock,1508442843814218},{cause,not_found}]

Logs on stor02:

[W]     bodies02@stor02.selectel.cloud.lan     2017-10-19 22:14:11.385164 +0300        1508440451      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,69468971868958431634452270947515965460},{key,<<"bod15/00/4d/33/004d33973a401550b37330e9a23f1da04e941921db0616f6c76fb45b8d8090d9357a3f767245b6dbcf966aa27c87ec100040fa0000000000.xz\na0b1ae8adff3ce938a8524381201adb1">>},{clock,1508440450617302},{cause,not_found}]
[W]     bodies02@stor02.selectel.cloud.lan     2017-10-19 22:38:14.460025 +0300        1508441894      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,318573671405344243235532148195153959426},{key,<<"bod15/31/04/fc/3104fcde0470e5d89c9991fc0edde57ed0d7356e637c24ec3d9fbe82fcf90524e9ff9cfb60e51efa639325e5ec4389c3687a1c0100000000.xz\n1d4efb3615fba8633671aaef6fc2b915">>},{clock,1508441893227239},{cause,not_found}]

On stor03:

[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:07:11.272867 +0300        1508440031      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,123267363961429714039177852970224067855},{key,<<"bod15/10/11/cd/1011cd9c5bfd6831968d29900a3ac3c14127f658eb425874b6b9b97a66a56a557b7794a6031533aa42ee7cc480d18ce90046230100000000.xz\n886a1eeb4bb21bd0c536e37ce546f296">>},{clock,1508440030614817},{cause,not_found}]
[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:07:28.861727 +0300        1508440048      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,37027845853433312525340873927361210525},{key,<<"bod15/10/14/b6/1014b669f673863923a759f91ceb3de629971cd8cf3c198a7fc7d6ab20e91642585b385ac36f8e132b1dc138a80d50cc0881000100000000.xz\n153f1a9f34753586eb59c4a59dbed067">>},{clock,1508440047935279},{cause,not_found}]
[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:13:28.845120 +0300        1508440408      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,79213917217618399355921356846758465046},{key,<<"bod15/30/44/c7/3044c71e5665ff99c2cc0a7cde7361d2a78244dc2c27243e5454e067463d317d1562771104849982584d04a2daedce26588c570000000000.xz\n19463fb39f3f21d339aaf132de65e6c4">>},{clock,1508440405121419},{cause,not_found}]
[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:14:23.343669 +0300        1508440463      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,168160558999579653749439079336620366446},{key,<<"bod15/80/4a/d4/804ad45ec1386ab777bb04b9f17a3e79aaf416238f94b1b9aba2891901e019beda5312d38e34bb49a0d7528c7bd995fc0006d60000000000.xz\n0a67e906ed829eff088c6195286b43fd">>},{clock,1508440459998276},{cause,not_found}]
[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:17:26.8061 +0300  1508440646      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,201795288656486221102774558035112486},{key,<<"bod15/70/62/08/7062089066c169cf3cf97107cedd63ecff501fe76d0ae065dc134985c4e13f5f134c16a71a8e55d71277785fc9a529850096da0000000000.xz\n27ff05d66bec2583262236162fb57f56">>},{clock,1508440644969745},{cause,not_found}]
[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:20:14.842112 +0300        1508440814      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,326839748086901365310446735001658277102},{key,<<"bod15/60/76/bb/6076bbe0589193ad49a32f7a7a20039d446ef510c527efd167cec2db0f0fc22385669bd9f2ceac7f2ed2e3672f4598ceb06e090100000000.xz\ne7d3ae319fbeb51279387026a4f6a0dc">>},{clock,1508440814378187},{cause,not_found}]
[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:24:40.256972 +0300        1508441080      leo_storage_read_repairer:compare/4     167     [{node,'bodies06@stor06.selectel.cloud.lan'},{addr_id,257703111427132794946859449273562309037},{key,<<"bod15/00/a4/90/00a490c2cb6f3778e77d92e662b4025f1e505aec16286363dbf1478ca0d0d98c1f8afdb174a91b940a0dd8c75c1ea71c008a220100000000.xz\n73f7dba5b6eff0c39d6b23cd162bf518">>},{clock,1508441079434912},{cause,not_found}]

The situation is similar on stor04 and stor05. All errors mention stor06.
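For reference, compaction on stor06 was launched with nothing but the defaults, roughly like this (I'm reconstructing the command from memory, so treat the exact argument form as an assumption; the "4 files at once" above is just the default number of AVS files compacted in parallel):

[vm@bodies-master ~]$ leofs-adm compact-start bodies06@stor06.selectel.cloud.lan all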

There are also tons of "timeout" entries in the error logs on these nodes, but all of the objects mentioned there seem to be in the correct state (deleted), so it's probably not a problem:

[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:36:04.252520 +0300        1508441764      leo_storage_replicator:loop/6   222     [{method,delete},{key,<<"bod15/30/ee/7e/30ee7e4c7835acc1eaff9702c644376d4b716e3aa3062f549ad9be4f12dc1f8e241cda8824831c92ac66976899a0f7ebf800630000000000.xz\n4c5d19b6a3a65cfbf119d8b7637820d9">>},{cause,timeout}]
[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:36:04.999534 +0300        1508441764      leo_storage_replicator:replicate/5      125     [{method,delete},{key,<<"bod15/30/ee/7e/30ee7e4c7835acc1eaff9702c644376d4b716e3aa3062f549ad9be4f12dc1f8e241cda8824831c92ac66976899a0f7ebf800630000000000.xz\n4c5d19b6a3a65cfbf119d8b7637820d9">>},{cause,timeout}]
[W]     bodies03@stor03.selectel.cloud.lan     2017-10-19 22:36:35.568616 +0300        1508441795      leo_storage_replicator:loop/6   222     [{method,delete},{key,<<"bod15/30/ee/7e/30ee7e4c7835acc1eaff9702c644376d4b716e3aa3062f549ad9be4f12dc1f8e241cda8824831c92ac66976899a0f7ebf800630000000000.xz\n4c5d19b6a3a65cfbf119d8b7637820d9">>},{cause,timeout}]

On stor06, there are various things in the logs. First, similar not_found messages:

[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:05:31.419506 +0300    1508439931  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/90/02/a0/9002a0c3e940e1a01bd03a7062b27b41fa906b58fbad8994e72e80d7de4495cbb85cc9b4d90cb6900f3e29ececce9c0b00802d0100000000.xz\n21f98deb429f96d51913edae41d8a8f4">>},{req_id,8022443},{cause,not_found}]
[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:07:11.183058 +0300    1508440031  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/10/11/cd/1011cd9c5bfd6831968d29900a3ac3c14127f658eb425874b6b9b97a66a56a557b7794a6031533aa42ee7cc480d18ce90046230100000000.xz\n886a1eeb4bb21bd0c536e37ce546f296">>},{req_id,66593734},{cause,not_found}]
[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:07:28.840017 +0300    1508440048  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/10/14/b6/1014b669f673863923a759f91ceb3de629971cd8cf3c198a7fc7d6ab20e91642585b385ac36f8e132b1dc138a80d50cc0881000100000000.xz\n153f1a9f34753586eb59c4a59dbed067">>},{req_id,49334556},{cause,not_found}]
[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:07:34.955175 +0300    1508440054  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/30/12/d8/3012d8d548004bd371c78009d71071e2c8077b25d45b35fefebe074031be14ec0bee241ee3b4d7feefac6a4e7f7e2933a0f0270100000000.xz\nc6b5cba57862c171d7204e33a4284321">>},{req_id,77853337},{cause,not_found}]
[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:13:28.844134 +0300    1508440408  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/30/44/c7/3044c71e5665ff99c2cc0a7cde7361d2a78244dc2c27243e5454e067463d317d1562771104849982584d04a2daedce26588c570000000000.xz\n19463fb39f3f21d339aaf132de65e6c4">>},{req_id,110509339},{cause,not_found}]
[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:14:11.385120 +0300    1508440451  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/00/4d/33/004d33973a401550b37330e9a23f1da04e941921db0616f6c76fb45b8d8090d9357a3f767245b6dbcf966aa27c87ec100040fa0000000000.xz\na0b1ae8adff3ce938a8524381201adb1">>},{req_id,27225947},{cause,not_found}]
[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:14:21.985534 +0300    1508440461  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/80/4a/d4/804ad45ec1386ab777bb04b9f17a3e79aaf416238f94b1b9aba2891901e019beda5312d38e34bb49a0d7528c7bd995fc0006d60000000000.xz\n0a67e906ed829eff088c6195286b43fd">>},{req_id,71202439},{cause,not_found}]

Timeout errors as well (again, the objects from these lines that I've checked all seem fine):

[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:11:50.188941 +0300    1508440310  leo_storage_replicator:loop/6   222 [{method,delete},{key,<<"bod15/80/2d/30/802d303bb263aa946ccf04a3860726d8eb37fbce0f9abaea68763249221d13c7d0f860c0de0bd3f46681fa39a1a82fc8e01cee0000000000.xz\n41db494034ce556b71dea040c560ec5a">>},{cause,timeout}]
[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:11:51.167916 +0300    1508440311  leo_storage_replicator:replicate/5  125 [{method,delete},{key,<<"bod15/80/2d/30/802d303bb263aa946ccf04a3860726d8eb37fbce0f9abaea68763249221d13c7d0f860c0de0bd3f46681fa39a1a82fc8e01cee0000000000.xz\n41db494034ce556b71dea040c560ec5a">>},{cause,timeout}]
[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:19:44.65311 +0300 1508440784  leo_storage_replicator:loop/6   222 [{method,delete},{key,<<"bod15/50/70/eb/5070eb399e8188227d4b0c4bce292b58346ac76169735e46ddf2682d30ee091f9d3b6514ddddfa10654af5bab3ee880ca0780f0100000000.xz\nd7cc3e4659e7cecfafb5f258f63f8f69">>},{cause,timeout}]

"case,unavailable" errors. Many lines repeated over and over here. Some objects that I've checked all seem to be fine (deleted):

[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:11:13.212273 +0300    1508440273  leo_storage_replicator:replicate_fun/2  249 [{key,<<"bod15/80/2d/30/802d303bb263aa946ccf04a3860726d8eb37fbce0f9abaea68763249221d13c7d0f860c0de0bd3f46681fa39a1a82fc8e01cee0000000000.xz\n41db494034ce556b71dea040c560ec5a">>},{node,local},{req_id,53642303},{cause,unavailable}]
[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:11:20.167785 +0300    1508440280  leo_storage_replicator:replicate_fun/2  249 [{key,<<"bod15/80/2d/30/802d303bb263aa946ccf04a3860726d8eb37fbce0f9abaea68763249221d13c7d0f860c0de0bd3f46681fa39a1a82fc8e01cee0000000000.xz\n41db494034ce556b71dea040c560ec5a">>},{node,local},{req_id,0},{cause,unavailable}]
[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:19:05.892534 +0300    1508440745  leo_storage_replicator:replicate_fun/2  249 [{key,<<"bod15/50/70/eb/5070eb399e8188227d4b0c4bce292b58346ac76169735e46ddf2682d30ee091f9d3b6514ddddfa10654af5bab3ee880ca0780f0100000000.xz\nd7cc3e4659e7cecfafb5f258f63f8f69">>},{node,local},{req_id,8590614},{cause,unavailable}]
[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:19:13.611882 +0300    1508440753  leo_storage_replicator:replicate_fun/2  249 [{key,<<"bod15/50/70/eb/5070eb399e8188227d4b0c4bce292b58346ac76169735e46ddf2682d30ee091f9d3b6514ddddfa10654af5bab3ee880ca0780f0100000000.xz\nd7cc3e4659e7cecfafb5f258f63f8f69">>},{node,local},{req_id,0},{cause,unavailable}]

Some other errors (lots of repeated lines here as well; the objects seem to be fine, i.e. deleted):

[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:36:05.565416 +0300    1508441765  leo_storage_handler_object:replicate_fun/3  1408    [{cause,"locked obj-conatainer"}]
[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:36:05.565844 +0300    1508441765  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/30/ee/7e/30ee7e4c7835acc1eaff9702c644376d4b716e3aa3062f549ad9be4f12dc1f8e241cda8824831c92ac66976899a0f7ebf800630000000000.xz\n4c5d19b6a3a65cfbf119d8b7637820d9">>},{req_id,0},{cause,"locked obj-conatainer"}]
[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:36:37.661347 +0300    1508441797  leo_storage_handler_object:replicate_fun/3  1408    [{cause,"locked obj-conatainer"}]
[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:36:37.662562 +0300    1508441797  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/30/ee/7e/30ee7e4c7835acc1eaff9702c644376d4b716e3aa3062f549ad9be4f12dc1f8e241cda8824831c92ac66976899a0f7ebf800630000000000.xz\n4c5d19b6a3a65cfbf119d8b7637820d9">>},{req_id,0},{cause,"locked obj-conatainer"}]
[W] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:37:09.15084 +0300 1508441829  leo_storage_handler_object:replicate_fun/3  1408    [{cause,"locked obj-conatainer"}]
[E] bodies06@stor06.selectel.cloud.lan  2017-10-19 22:37:09.15180 +0300 1508441829  leo_storage_handler_object:put/4    424 [{from,storage},{method,delete},{key,<<"bod15/30/ee/7e/30ee7e4c7835acc1eaff9702c644376d4b716e3aa3062f549ad9be4f12dc1f8e241cda8824831c92ac66976899a0f7ebf800630000000000.xz\n4c5d19b6a3a65cfbf119d8b7637820d9">>},{req_id,0},{cause,"locked obj-conatainer"}]

The info logs only contain some long-running operations (logged with a high processing_time) and the compaction reports (all "result,success").

The problem: all objects mentioned with "cause,not_found" in the logs on every node are in an inconsistent state on stor06:

[vm@bodies-master ~]$ leofs-adm whereis "bod15/30/12/d8/3012d8d548004bd371c78009d71071e2c8077b25d45b35fefebe074031be14ec0bee241ee3b4d7feefac6a4e7f7e2933a0f0270100000000.xz\nc6b5cba57862c171d7204e33a4284321"
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | bodies01@stor01.selectel.cloud.lan      | 306585a3c85b99ff60f598314b0d3f0a     |         0B |   d41d8cd98f | false          |              0 | 55beb117af2fb  | 2017-10-19 22:07:53 +0300
       | bodies06@stor06.selectel.cloud.lan      | 306585a3c85b99ff60f598314b0d3f0a     |         0B |   d41d8cd98f | false          |              0 | 55beb11752c1a  | 2017-10-19 22:07:52 +0300
  *    | bodies03@stor03.selectel.cloud.lan      | 306585a3c85b99ff60f598314b0d3f0a     |         0B |   d41d8cd98f | false          |              0 | 55beb117af2fb  | 2017-10-19 22:07:53 +0300

[vm@bodies-master ~]$ leofs-adm whereis "bod15/90/dd/a7/90dda7f37e4584e147faa5d226b214ddb5d6e038cce361ef471263caea7a13b960d4cc17e2c2756427a9f257b3a6f267e01cee0000000000.xz\nbdf44630c80a4b1279ca665d3d5ad95e"
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | bodies01@stor01.selectel.cloud.lan      | 917a8faba4606a0ea8f19c08e6f840be     |         0B |   d41d8cd98f | false          |              0 | 55beb6e2980d2  | 2017-10-19 22:33:48 +0300
       | bodies06@stor06.selectel.cloud.lan      | 917a8faba4606a0ea8f19c08e6f840be     |         0B |   d41d8cd98f | false          |              0 | 55beb6e23362b  | 2017-10-19 22:33:47 +0300
  *    | bodies02@stor02.selectel.cloud.lan      | 917a8faba4606a0ea8f19c08e6f840be     |         0B |   d41d8cd98f | false          |              0 | 55beb6e2980d2  | 2017-10-19 22:33:48 +0300
[vm@bodies-master ~]$ leofs-adm whereis "bod15/00/4d/33/004d33973a401550b37330e9a23f1da04e941921db0616f6c76fb45b8d8090d9357a3f767245b6dbcf966aa27c87ec100040fa0000000000.xz\na0b1ae8adff3ce938a8524381201adb1"
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | bodies02@stor02.selectel.cloud.lan      | 34433cc0887a1551348b37f20bf32014     |         0B |   d41d8cd98f | false          |              0 | 55beb29411cac  | 2017-10-19 22:14:32 +0300
       | bodies06@stor06.selectel.cloud.lan      | 34433cc0887a1551348b37f20bf32014     |         0B |   d41d8cd98f | false          |              0 | 55beb29394fd6  | 2017-10-19 22:14:31 +0300
  *    | bodies04@stor04.selectel.cloud.lan      | 34433cc0887a1551348b37f20bf32014     |         0B |   d41d8cd98f | false          |              0 | 55beb29411cac  | 2017-10-19 22:14:32 +0300

[vm@bodies-master ~]$ leofs-adm whereis "bod15/10/14/b6/1014b669f673863923a759f91ceb3de629971cd8cf3c198a7fc7d6ab20e91642585b385ac36f8e132b1dc138a80d50cc0881000100000000.xz\n153f1a9f34753586eb59c4a59dbed067"
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | bodies03@stor03.selectel.cloud.lan      | 1bdb4dbff9220da209294d0f1da2a89d     |         0B |   d41d8cd98f | false          |              0 | 55beb1146335b  | 2017-10-19 22:07:49 +0300
  *    | bodies02@stor02.selectel.cloud.lan      | 1bdb4dbff9220da209294d0f1da2a89d     |         0B |   d41d8cd98f | false          |              0 | 55beb1146335b  | 2017-10-19 22:07:49 +0300
       | bodies06@stor06.selectel.cloud.lan      | 1bdb4dbff9220da209294d0f1da2a89d     |         0B |   d41d8cd98f | false          |              0 | 55beb1138df2f  | 2017-10-19 22:07:48 +0300

(recover-file fixes these fine.)
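For reference, each of them was repaired by running recover-file on the full key, in the same key format as in the whereis commands above, e.g.:

[vm@bodies-master ~]$ leofs-adm recover-file "bod15/30/12/d8/3012d8d548004bd371c78009d71071e2c8077b25d45b35fefebe074031be14ec0bee241ee3b4d7feefac6a4e7f7e2933a0f0270100000000.xz\nc6b5cba57862c171d7204e33a4284321"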

mocchira commented 6 years ago

@vstax Thanks for reporting. It seems the root cause is #845 (put and delete requests getting inverted chronologically), and a storage node on which compaction is in progress may be more likely to hit #845 for some reason. I will dig in further to get to the facts. Anyway, just in case, let me ask: this issue isn't a big deal for your production system, is it? If it is, let me know and I will do my best to fix it before your launch.

vstax commented 6 years ago

@mocchira No, it's not a priority; it's just something I noticed when looking at the logs from the last experiment. Plus, since recover-file works, it's possible to fix these objects after noticing the errors; it's just a bit annoying that they happened at all (I wasn't pushing the system or anything, just launched compaction with default parameters). I've actually destroyed this cluster already, so I can't run "whereis" anymore; I kept a copy of all its logs just in case, though. I managed to get a somewhat better RING for the new (production) cluster after a few tries - only up to 8.5% difference in distribution between nodes.

I have one question, though. These clusters have W=2 and D=1 - because the applications don't really do any deletes on this data, I didn't care what D was set to. But since multipart upload internally executes a delete for the temporary object - does that delete follow W or D? (I think it makes sense for it to follow W, since it's all part of a "write" operation from the application's perspective, but I'm not 100% sure.) If it follows D now, could setting D=2 make these errors less likely?
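For context, the consistency settings on these clusters look roughly like the following in leo_manager.conf (I'm writing the key names from memory, so treat them as approximate; N=3 matches the three replicas shown in the whereis output above, and the read quorum value here is just illustrative, it isn't relevant to this issue):

consistency.num_of_replicas = 3
consistency.write = 2
consistency.read = 1
consistency.delete = 1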

mocchira commented 6 years ago

@vstax

No, it's not a priority

OK.

I have one question, though. These clusters have W=2 and D=1 - because the applications don't really do any deletes on this data, I didn't care what D was set to. But since multipart upload internally executes a delete for the temporary object - does that delete follow W or D? (I think it makes sense for it to follow W, since it's all part of a "write" operation from the application's perspective, but I'm not 100% sure.) If it follows D now, could setting D=2 make these errors less likely?

Deletes for temporary objects follow D, so setting D=2 might make these errors less likely to happen.

I think it makes sense for it to follow W, since it's all part of a "write" operation from the application's perspective

This part seems arguable to me, so let us consider it for a while; we will file an issue for it if we judge that deletes for temporary objects should follow W.

vstax commented 6 years ago

@mocchira Well, the thing is, my current configuration (W=2 and D=1) is incorrect with regard to multipart uploads, I think. Since that delete uses D, I should have used D=W to make sure that all parts of a "PUT" operation follow the quorum of 2. So it makes sense to use D=2 even though I didn't plan to delete anything - but I had no way of knowing that beforehand. So I think either the delete of the temporary object should use W as its quorum, or at least the documentation should mention that running with D smaller than W creates a pitfall for multipart uploads and is probably not a good idea.

Actually, now that I think about it, I don't understand why D is configurable at all. Under what circumstances would someone want to run a cluster where D != W? If it needs to be configurable for some very special reason, maybe the documentation should recommend that everyone always use D=W, even if they don't plan to do any deletes, unless they are 100% sure they have a good reason to set it otherwise. This way, people (hopefully) won't repeat my mistake.
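If I do end up aligning D with W on the new cluster, I assume it would be a one-liner like the following (I haven't double-checked the exact leofs-adm syntax, and the write/read/delete argument order as well as the read value of 1 are assumptions on my part):

[vm@bodies-master ~]$ leofs-adm update-consistency-level 2 1 2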

mocchira commented 6 years ago

@vstax

Actually, now that I think about it, I don't understand why D is configurable at all. Under what circumstances would someone want to run a cluster where D != W? If it needs to be configurable for some very special reason, maybe the documentation should recommend that everyone always use D=W, even if they don't plan to do any deletes, unless they are 100% sure they have a good reason to set it otherwise. This way, people (hopefully) won't repeat my mistake.

AFAIK, no users run with D != W. However, there are cases where D != W can be useful. One is when you need read-after-delete consistency but not read-after-write consistency: in that case, setting D=N and W<N lets LeoFS handle that scenario while keeping write availability high, and only the availability of deletes is reduced (with a single W setting, both would have to be reduced). As for the documentation, I totally agree with you, so we will add this information somewhere in the docs.

mocchira commented 6 years ago

#926 and #845 could make this issue happen more often. Those two have been fixed now, so can you check whether this issue still exists? @vstax

vstax commented 6 years ago

@mocchira I will try. However, I'd like to note that the version with this problem already included the fix from #845, and #926 should not affect this at all, because I was uploading objects with a script that only uploads each file once. I was uploading objects in parallel, but each uploader instance worked with its own set of source files. So I don't think anything should change as a result of #926 being fixed either.

mocchira commented 6 years ago

@vstax

However, I'd like to note that the version with this problem already included the fix from #845, and #926 should not affect this at all, because I was uploading objects with a script that only uploads each file once. I was uploading objects in parallel, but each uploader instance worked with its own set of source files. So I don't think anything should change as a result of #926 being fixed either.

I see, will keep investigating.