openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

use-after-free in dsl_dataset_promote_sync() #16272

Open markjdb opened 2 weeks ago

markjdb commented 2 weeks ago

System information

Type | Version/Name
--- | ---
Distribution Name | FreeBSD
Distribution Version | 15.0-CURRENT
Kernel Version | 517c5854588eaa7c2248d97cd750b8b8bad9d69f
Architecture | amd64
OpenZFS Version | zfs-2.2.99-517-FreeBSD_ge2357561b zfs-kmod-2.2.99-517-FreeBSD_ge2357561b

Describe the problem you're observing

Running the ZFS test suite with KASAN triggered a panic:

panic: ASan: Invalid access, 8-byte read at 0xfffffe00e8bba1e0, UMAUseAfterFree(fd)
cpuid = 1
time = 1718485401
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0xa5/frame 0xfffffe00d49a4110
kdb_backtrace() at kdb_backtrace+0xc6/frame 0xfffffe00d49a4270
vpanic() at vpanic+0x226/frame 0xfffffe00d49a4410
panic() at panic+0xb5/frame 0xfffffe00d49a44e0
kasan_report() at kasan_report+0xdf/frame 0xfffffe00d49a45b0
dsl_dataset_promote_sync() at dsl_dataset_promote_sync+0x1421/frame 0xfffffe00d49a48e0
dsl_sync_task_sync() at dsl_sync_task_sync+0x17a/frame 0xfffffe00d49a4930
dsl_pool_sync() at dsl_pool_sync+0x8db/frame 0xfffffe00d49a4a50
spa_sync() at spa_sync+0x10f3/frame 0xfffffe00d49a4cd0
txg_sync_thread() at txg_sync_thread+0x7c5/frame 0xfffffe00d49a4ef0
fork_exit() at fork_exit+0xa3/frame 0xfffffe00d49a4f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00d49a4f30
--- trap 0xc, rip = 0x14eda78e331a, rsp = 0x14edb0908eb8, rbp = 0x14edb0908ed0 ---

This corresponds to the dereference of origin_head at the very end of dsl_dataset_promote_sync().

Describe how to reproduce the problem

It is not consistently reproducible; so far I have only hit this once.

I believe the problem is that origin_head and hds are not safe to dereference after promote_rele() is called. Either the object IDs should be loaded before the references are released, or the references should be released only after spa_swap_errlog() has been called.
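A minimal C sketch of the two orderings proposed above, assuming a simplified reference-counted dataset. `dataset_t`, `dataset_new()`, `dataset_rele()`, and `swap_errlog()` are hypothetical stand-ins for `dsl_dataset_t`, the hold/release machinery behind promote_rele(), and spa_swap_errlog(); they do not mirror the real OpenZFS signatures or argument order, and this is not the patch in #16273.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-in for a held dataset; not the real dsl_dataset_t. */
typedef struct dataset {
	uint64_t object;	/* object ID, playing the role of ds_object */
	int refcount;		/* outstanding holds */
} dataset_t;

/* Hypothetical: allocate a dataset carrying a single hold. */
static dataset_t *
dataset_new(uint64_t object)
{
	dataset_t *ds = malloc(sizeof (*ds));

	if (ds == NULL)
		abort();
	ds->object = object;
	ds->refcount = 1;
	return (ds);
}

/* Hypothetical: drop one hold; the last release frees the memory. */
static void
dataset_rele(dataset_t *ds)
{
	assert(ds->refcount > 0);
	if (--ds->refcount == 0)
		free(ds);
}

/* Hypothetical stand-in for spa_swap_errlog(): it needs only the IDs. */
static uint64_t last_origin_id, last_head_id;

static void
swap_errlog(uint64_t origin_id, uint64_t head_id)
{
	last_origin_id = origin_id;
	last_head_id = head_id;
}

/* Option 1: copy the IDs while the holds are still valid, then release. */
static void
promote_finish_copy_ids(dataset_t *origin_head, dataset_t *hds)
{
	uint64_t origin_obj = origin_head->object;
	uint64_t hds_obj = hds->object;

	dataset_rele(origin_head);
	dataset_rele(hds);
	/* Only the cached IDs are used past this point; no dereference. */
	swap_errlog(origin_obj, hds_obj);
}

/* Option 2: finish all work that dereferences the datasets, then release. */
static void
promote_finish_rele_last(dataset_t *origin_head, dataset_t *hds)
{
	swap_errlog(origin_head->object, hds->object);
	dataset_rele(origin_head);
	dataset_rele(hds);
}
```

Either ordering avoids the access described in the report, where origin_head was still dereferenced after promote_rele() had dropped its hold. Option 1 keeps the release where it is and touches only copied values afterwards; option 2 simply extends the lifetime of the holds past the last use.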

markjdb commented 2 weeks ago

This apparent bug was introduced in commit 0409d3327371 ("Improve zpool status output, list all affected datasets"), perhaps @gamanakis could take a look?

gamanakis commented 2 weeks ago

Thanks for catching this, you are correct, I will submit a PR.

gamanakis commented 2 weeks ago

@markjdb could you take a look at #16273? Once it runs through the ZTS I will mark it as non-draft.

markjdb commented 2 weeks ago

> @markjdb could you take a look at #16273? Once it runs through the ZTS I will mark it as non-draft.

It looks fine to me. I applied the patch locally and kicked off another test run with KASAN.

markjdb commented 2 weeks ago


I didn't see any problems with the patch.