irods / irods_capability_storage_tiering

BSD 3-Clause "New" or "Revised" License

`test_put_multi_fetch_page` fails frequently against Ubuntu 24.04 / MySQL 8.4 #279

Open korydraughn opened 1 month ago

korydraughn commented 1 month ago

Encountered during testing of what will be iRODS 4.3.3. Platform is Ubuntu 24.04. Database is MySQL 8.4.

The test fails due to the following assertion.

https://github.com/irods/irods_capability_storage_tiering/blob/822347c9dc05612e53e6eb2e1da473283ee675f1/packaging/test_plugin_unified_storage_tiering.py#L1520

The test creates (256 * 2) + 1 = 513 data objects and then waits for them to be moved to another tier. Most of them are moved, but close observation shows that some moves fail. These failures leave a replica behind on the original tier (ufs0) and a stale replica on the destination tier (ufs1). I've seen at least 3 replicas in this state after the test completes. The test fails because it still finds replicas on the original tier, even though 400+ replicas were moved successfully.
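The (256 * 2) + 1 count looks chosen so that a query fetching rows in pages of 256 needs two full pages plus one extra row. This is an assumption based on the test name `test_put_multi_fetch_page`, not something stated in the plugin's configuration. A minimal sketch of that arithmetic, with a hypothetical `page_sizes` helper:

```python
# Hypothetical illustration: split N items into pages of a fixed size.
# PAGE_SIZE = 256 is an assumption inferred from the (256 * 2) + 1 count
# and the test name; it is not read from the plugin's configuration.
PAGE_SIZE = 256
TOTAL_OBJECTS = (PAGE_SIZE * 2) + 1  # 513 data objects


def page_sizes(total, page_size):
    """Return the number of rows each successive page fetch would return."""
    sizes = []
    remaining = total
    while remaining > 0:
        take = min(page_size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


print(TOTAL_OBJECTS, page_sizes(TOTAL_OBJECTS, PAGE_SIZE))
# → 513 [256, 256, 1]
```

If that assumption holds, the last page contains a single object, which is worth keeping in mind when checking which objects end up left behind.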

Reducing the number of replicas involved (by 100 or so) made the test pass. However, reducing the number of replicas isn't a real fix. We need to figure out WHY data movement fails for some replicas.

Below is the output of the failed test.

        <testcase classname="irods.test.test_plugin_unified_storage_tiering.TestStorageTieringContinueInxMigration" name="test_put_multi_fetch_page" time="287.555" timestamp="2024-08-13T21:11:55" file="scripts/irods/test/test_plugin_unified_storage_tiering.py" line="1503">
                <failure type="AssertionError" message=""><![CDATA[Traceback (most recent call last):
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 180, in delay_assert
    out, err, rc = function()
                   ^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 1520, in <lambda>
    delay_assert(lambda: admin_session.assert_icommand_fail(['ils', '-l', dirname], 'STDOUT_SINGLELINE', 'ufs0'))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/session.py", line 166, in assert_icommand_fail
    return assert_command_fail(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/command.py", line 79, in assert_command_fail
    return _assert_helper(*args, should_fail=True, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/command.py", line 104, in _assert_helper
    assert result
           ^^^^^^
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 1520, in test_put_multi_fetch_page
    delay_assert(lambda: admin_session.assert_icommand_fail(['ils', '-l', dirname], 'STDOUT_SINGLELINE', 'ufs0'))
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 187, in delay_assert
    assert(False)
           ^^^^^
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 1526, in test_put_multi_fetch_page
    admin_session.assert_icommand('irm -r ' + dirname)
  File "/var/lib/irods/scripts/irods/test/session.py", line 162, in assert_icommand
    return assert_command(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/command.py", line 76, in assert_command
    return _assert_helper(*args, should_fail=False, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/command.py", line 104, in _assert_helper
    assert result
           ^^^^^^
AssertionError
]]></failure>
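The traceback shows the retry pattern at work: `delay_assert` calls the lambda, the `AssertionError` raised inside `assert_icommand_fail` is caught, the call is retried, and once the retries are exhausted `delay_assert` itself fails with `assert(False)`. A minimal sketch of that retry-until-timeout pattern follows; the name `retry_until_passes` and its `maxrep`/`interval` parameters are hypothetical, since the real helper's signature and retry budget are not shown in this issue.

```python
import time


def retry_until_passes(function, maxrep=10, interval=1.0):
    """Hypothetical retry helper: call `function` until it stops raising
    AssertionError, sleeping `interval` seconds between attempts.
    Raises AssertionError once all `maxrep` attempts have failed."""
    for attempt in range(maxrep):
        try:
            return function()
        except AssertionError:
            if attempt + 1 == maxrep:
                break  # out of attempts; fall through to the final failure
            time.sleep(interval)
    raise AssertionError(f"condition not met after {maxrep} attempts")
```

With this shape, the total wait is roughly `maxrep * interval`, so a retry budget that is short relative to how long the data movement takes would surface as exactly this kind of failure.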

Here is the final listing before the assertion fails.

id     name
11110 {"delay_conditions":"<INST_NAME>irods_rule_engine_plugin-unified_storage_tiering-instance</INST_NAME><EF>60s REPEAT UNTIL SUCCESS OR 5 TIMES</EF><PLUSET>1s</PLUSET>","destination-resource":"ufs1","group-name":"example_group","md5":"ca0e41ba44e21e0c2e4eb7d9064d0caf","object-path":"/tempZone/home/rods/test_put_multi_fetch_page/junk0365","preserve-replicas":false,"rule-engine-instance-name":"irods_rule_engine_plugin-unified_storage_tiering-instance","rule-engine-operation":"irods_policy_data_movement","source-replica-number":"0","source-resource":"ufs0","user-name":"rods","user-zone":"tempZone","verification-type":"catalog"}
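The delay rule still queued for junk0365 is a JSON payload. Pulling out the object path, the source and destination resources, and the `<EF>` frequency makes it easier to match leftover rules against the replicas that `ils` still reports on ufs0. A minimal sketch, assuming the name column holds exactly the JSON shown above (trimmed here to the fields used); `summarize_rule` is a hypothetical helper.

```python
import json
import re

# The "name" column of the delay-rule listing above, trimmed to the fields used here.
RULE_TEXT = '{"delay_conditions":"<INST_NAME>irods_rule_engine_plugin-unified_storage_tiering-instance</INST_NAME><EF>60s REPEAT UNTIL SUCCESS OR 5 TIMES</EF><PLUSET>1s</PLUSET>","destination-resource":"ufs1","object-path":"/tempZone/home/rods/test_put_multi_fetch_page/junk0365","source-replica-number":"0","source-resource":"ufs0"}'


def summarize_rule(rule_text):
    """Extract the movement details and the <EF> frequency from one delay rule."""
    rule = json.loads(rule_text)
    match = re.search(r"<EF>(.*?)</EF>", rule["delay_conditions"])
    return {
        "object": rule["object-path"],
        "move": (rule["source-resource"], rule["destination-resource"]),
        "source_replica": int(rule["source-replica-number"]),
        "frequency": match.group(1) if match else None,
    }


print(summarize_rule(RULE_TEXT))
```

Running this over every queued rule gives a list of objects whose movement is still pending, which can then be compared with the objects that have a ufs0 replica.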

 --- IrodsSession: icommand executed by [rods#tempZone] [ils -l test_put_multi_fetch_page] ---
Assert FAIL Command: ils -l test_put_multi_fetch_page
Expecting STDOUT_SINGLELINE: ['ufs0']
  stdout:
    | /tempZone/home/rods/test_put_multi_fetch_page:
<snip>
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0360
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0361
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0362
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0363
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0364
    |   rods              0 ufs0            1 2024-08-14.15:30 & junk0365  <== Should not see ufs0.
    |   rods              1 ufs1            1 2024-08-14.15:32 X junk0365  <== Or a stale replica.
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0366
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0367
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0368
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0369
<snip>
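Rather than only checking whether any line of the listing mentions ufs0, the `ils -l` output can be scanned to report exactly which objects were left behind. A minimal sketch, assuming the column layout shown above (owner, replica number, resource, size, timestamp, status, name) and that `X` marks a stale replica, as annotated in the listing; `find_leftovers` is a hypothetical helper, not part of the iRODS test suite.

```python
import re

# Sample lines from the ils -l listing above (the "| " prefix and annotations removed).
LISTING = """\
  rods              1 ufs1            1 2024-08-14.15:32 & junk0364
  rods              0 ufs0            1 2024-08-14.15:30 & junk0365
  rods              1 ufs1            1 2024-08-14.15:32 X junk0365
  rods              1 ufs1            1 2024-08-14.15:32 & junk0366
"""

# owner, replica number, resource, size, timestamp, status, name
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\S)\s+(.+?)\s*$")


def find_leftovers(listing, source_resource="ufs0"):
    """Return (name, replica number, resource, status) for every replica that
    is still on the source resource or is marked stale ('X')."""
    problems = []
    for line in listing.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip the collection header and any other non-row lines
        _owner, repl, resc, _size, _ts, status, name = m.groups()
        if resc == source_resource or status == "X":
            problems.append((name, int(repl), resc, status))
    return problems


for row in find_leftovers(LISTING):
    print(row)
# → ('junk0365', 0, 'ufs0', '&')
# → ('junk0365', 1, 'ufs1', 'X')
```

Printing this list when the assertion fails would show every affected object at once instead of requiring a manual scan of the full listing.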