Use simple folio migration function to avoid BUG in migrate_folio_extra

tstabrawa commented 3 weeks ago

Motivation and Context

Starting in kernel version 6.3.0, Linux page migration is done in a batched manner, where several pages are first unmapped, and then the pages moved. See @RodoMa92's bisection test results in #15140 for details on the kernel change. As a result of this change, and due to ZFS's implicit use of fallback_migrate_folio (by not specifying a .migrate_folio function in its address_space_operations), when dirty pages are migrated, they are written back during the "move" step. Then, when the "move" step is retried, fallback_migrate_folio calls into migrate_folio, which in turn calls into migrate_folio_extra, and migrate_folio_extra BUG's out due to the page still being under writeback.

Note: The kernel page migration code will actually wait for any previously-started writeback during the "unmap" step, which could explain why unbatched retries didn't hit this problem prior to kernel 6.3.0. For example, in 6.2.16, unmap_and_move will wait for writeback to complete each time it retries migration on a page (such as would be the case if fallback_migrate_folio called writeout for the page). In contrast, in 6.3.1, since only the "move" step is repeated on move-related retries, the kernel doesn't wait for writeback operations started by fallback_migrate_folio.

A previous attempt at fixing this problem was merged as part of PR #16568, but this turned out not to be a complete fix. @RodoMa92 re-raised the problem and explained that, while the problem occurred less frequently, it still occurred approximately once per 20 attempts.

Description

Rather than ZFS using the default fallback_migrate_folio function as a result of not specifying a .migrate_folio function, this PR causes it to use the simple folio migration function (migrate_folio) instead. Notably, migrate_folio (since it's intended for filesystems that don't use buffers / private data) doesn't start writeback like fallback_migrate_folio does, and therefore migrate_folio_extra wouldn't BUG() on pages still under writeback, even without us setting PagePrivate on every page.

Additionally, this PR reverts the changes merged as part of PR #16568, as these changes are no longer necessary when using migrate_folio instead of fallback_migrate_folio.

A side effect of this change could be that ZFS page migration gets noticeably faster and/or more effective (as it would no longer be skipping and/or starting writeback on all dirty pages being migrated).

How Has This Been Tested?

Tested by user @RodoMa92 - results in the following comments on #16568:

Also, regression-tested by running ZFS Test Suite (on Ubunutu 23.10, running kernel version 6.5.0-44-generic. No crashes were observed, though several tests were skipped/failed/killed. I don't expect these errors were caused by my changes but would defer to your expertise if you have any concerns:

zfs-tests - 47700cf35384909b88fb70893a9395dc28e4344d.txt

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Performance enhancement (non-breaking change which improves efficiency)
[ ] Code cleanup (non-breaking change which makes code smaller or more readable)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
[ ] Documentation (a change to man pages or other documentation)

Checklist:

[x] My code follows the OpenZFS code style requirements.
[ ] I have updated the documentation accordingly.
[x] I have read the contributing document.
[ ] I have added tests to cover my changes.
[x] I have run the ZFS Test Suite with this change applied.
[x] All commit messages are properly formatted and contain Signed-off-by.

tstabrawa commented 3 weeks ago

Looks like I had a typo in the code for older kernel APIs (.migrate_page should be .migratepage). Amending my commit to fix.

diff --git a/module/os/linux/zfs/zpl_file.c b/module/os/linux/zfs/zpl_file.c
index f7ae381..f6e0143 100644
--- a/module/os/linux/zfs/zpl_file.c
+++ b/module/os/linux/zfs/zpl_file.c
@@ -1094,7 +1094,7 @@ const struct address_space_operations zpl_address_space_operations = {
 #ifdef HAVE_VFS_MIGRATE_FOLIO
    .migrate_folio  = migrate_folio,
 #else
-   .migrate_page   = migrate_page,
+   .migratepage    = migrate_page,
 #endif
 };

RodoMa92 commented 2 weeks ago

Thanks for this, @tstabrawa. Still held up fine as of today. I feel much more confident now to say it's finally closed, and allow me again to use the amazing features of ZFS.

Again, thanks a lot :)

Marco

openzfs / zfs