openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.69k stars 1.76k forks source link

Use simple folio migration function to avoid BUG in migrate_folio_extra #16723

Closed tstabrawa closed 3 weeks ago

tstabrawa commented 3 weeks ago

Motivation and Context

Starting in kernel version 6.3.0, Linux page migration is done in a batched manner, where several pages are first unmapped, and then the pages moved. See @RodoMa92's bisection test results in #15140 for details on the kernel change. As a result of this change, and due to ZFS's implicit use of fallback_migrate_folio (by not specifying a .migrate_folio function in its address_space_operations), when dirty pages are migrated, they are written back during the "move" step. Then, when the "move" step is retried, fallback_migrate_folio calls into migrate_folio, which in turn calls into migrate_folio_extra, and migrate_folio_extra BUG's out due to the page still being under writeback.

Note: The kernel page migration code will actually wait for any previously-started writeback during the "unmap" step, which could explain why unbatched retries didn't hit this problem prior to kernel 6.3.0. For example, in 6.2.16, unmap_and_move will wait for writeback to complete each time it retries migration on a page (such as would be the case if fallback_migrate_folio called writeout for the page). In contrast, in 6.3.1, since only the "move" step is repeated on move-related retries, the kernel doesn't wait for writeback operations started by fallback_migrate_folio.

A previous attempt at fixing this problem was merged as part of PR #16568, but this turned out not to be a complete fix. @RodoMa92 re-raised the problem and explained that, while the problem occurred less frequently, it still occurred approximately once per 20 attempts.

Description

Rather than ZFS using the default fallback_migrate_folio function as a result of not specifying a .migrate_folio function, this PR causes it to use the simple folio migration function (migrate_folio) instead. Notably, migrate_folio (since it's intended for filesystems that don't use buffers / private data) doesn't start writeback like fallback_migrate_folio does, and therefore migrate_folio_extra wouldn't BUG() on pages still under writeback, even without us setting PagePrivate on every page.

Additionally, this PR reverts the changes merged as part of PR #16568, as these changes are no longer necessary when using migrate_folio instead of fallback_migrate_folio.

A side effect of this change could be that ZFS page migration gets noticeably faster and/or more effective (as it would no longer be skipping and/or starting writeback on all dirty pages being migrated).

How Has This Been Tested?

Tested by user @RodoMa92 - results in the following comments on #16568:

Also, regression-tested by running ZFS Test Suite (on Ubunutu 23.10, running kernel version 6.5.0-44-generic. No crashes were observed, though several tests were skipped/failed/killed. I don't expect these errors were caused by my changes but would defer to your expertise if you have any concerns:

Types of changes

Checklist:

tstabrawa commented 3 weeks ago

Looks like I had a typo in the code for older kernel APIs (.migrate_page should be .migratepage). Amending my commit to fix.

diff --git a/module/os/linux/zfs/zpl_file.c b/module/os/linux/zfs/zpl_file.c
index f7ae381..f6e0143 100644
--- a/module/os/linux/zfs/zpl_file.c
+++ b/module/os/linux/zfs/zpl_file.c
@@ -1094,7 +1094,7 @@ const struct address_space_operations zpl_address_space_operations = {
 #ifdef HAVE_VFS_MIGRATE_FOLIO
    .migrate_folio  = migrate_folio,
 #else
-   .migrate_page   = migrate_page,
+   .migratepage    = migrate_page,
 #endif
 };
RodoMa92 commented 2 weeks ago

Thanks for this, @tstabrawa. Still held up fine as of today. I feel much more confident now to say it's finally closed, and allow me again to use the amazing features of ZFS.

Again, thanks a lot :)

Marco