Open · hayley-leblanc opened this issue 8 months ago
Hi,
Thanks for the bug report.
I suspect the bug occurs because the application runs on two NUMA nodes, while only the PM in one NUMA node is passed to ArckFS. This is a scenario we forgot to handle in our prototype, and a call stack (starting from LibFS) similar to the one below may cause the issue:
sufs_libfs_do_new_dir_data_blocks
  --> sufs_libfs_new_blocks(..., sufs_libfs_current_node())
      // sufs_libfs_current_node() returns the current NUMA node of the
      // invoking CPU, which could be either 0 or 1 in this case
  --> sufs_libfs_cmd_alloc_blocks
  --> sufs_alloc_blocks_to_libfs(pm_node == 1)
Could you confirm that this is the scenario?
Thanks for the quick response! Yes, the machine has two NUMA nodes, but I'm only using one NVDIMM. I tried running my program with numactl --cpunodebind=0 and that seems to have resolved the problem, so I think you are correct. If there's a better (quick) fix I could implement, I'm happy to make a PR; otherwise I'll use numactl as a workaround.
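In case it's useful, the kind of quick fix I had in mind is sketched below. This is only a sketch: sufs_libfs_current_node() is the existing helper you mentioned, but the wrapper name and the registered_pm_nodes parameter are my own invention, and it assumes the registered PM nodes are numbered contiguously from 0.

/*
 * Hypothetical LibFS-side guard (not in the current code base): never
 * request an allocation from a NUMA node whose PM was not registered
 * with ArckFS. Assumes registered PM nodes are 0..registered_pm_nodes-1.
 */
static int sufs_libfs_usable_pm_node(int registered_pm_nodes)
{
    int node = sufs_libfs_current_node();

    /* CPU sits on a node with no registered PM: fall back to one that has it. */
    if (node >= registered_pm_nodes)
        node = node % registered_pm_nodes;

    return node;
}

This trades NUMA locality for correctness (allocations from CPUs on node 1 would all land on node 0's PM), so it may not be what you want upstream.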
Hi Trio authors,
I am trying to run a simple workload that creates 1000 directories in ArckFS, and I am running into some errors that appear to be related to block allocation. I followed the setup steps in the README (excluding building the other file systems and the included benchmarks), then followed the initialization steps in the minimal working example document to mount ArckFS. My workload is compiled against libfs and sufs.so the same way the fsutils programs are. I'm running ArckFS on a 64-core machine with Debian Bookworm and Linux kernel 6.3.0, using one 128GB Intel Optane PMM for the experiment. I ported the KFS module to this kernel version (including modifications to drivers/dax/bus.c in the kernel itself), but the errors I am encountering seem unrelated to the port.
I did a bit of printk debugging and determined that these errors appear to trigger only when the cpu argument to sufs_new_blocks is 63 and the pm_node argument is 1.

The most common issue I'm seeing is a soft lockup, usually early in the workload. An example stack trace is included below. I've also seen a couple of page-fault-related kernel panics that likewise involve CPU 63, but I haven't been able to replicate them to capture a stack trace; they seem to occur more or less randomly. The call trace in the page-fault panics is similar or identical to the soft-lockup one.
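For reference, the instrumentation was roughly the line below, added at the top of sufs_new_blocks (paraphrased; I don't have the exact patch in front of me, and the argument names are as I read them from the signature):

    printk(KERN_INFO "sufs_new_blocks: cpu=%d pm_node=%d\n", cpu, pm_node);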
I rebuilt the kernel with CONFIG_DEBUG_SPINLOCK enabled to try to get more information about the error; this turns on a lock magic value check, which consistently fails on cpu 63 and pm_node 1.
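For anyone reproducing this, I enabled the option with the stock in-tree helper before rebuilding (nothing ArckFS-specific here, and your base config may differ):

    scripts/config --enable CONFIG_DEBUG_SPINLOCK
    make olddefconfig && make -j"$(nproc)"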
I noticed that sufs_get_free_list returns free_lists[cpu * sb->pm_nodes + pm_node], but sufs_alloc_block_free_lists only allocates an array of size cpus * sb->pm_nodes; it seems that sufs_get_free_list may be indexing past the end of the free-list array in the problematic case I'm running into. I was hoping to fix this myself, but I haven't really been able to understand the role of the pm_node value: why is it sometimes 0 and sometimes 1 when I am only using one PM device?
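To spell out the arithmetic behind the suspected out-of-bounds access, here is a standalone sketch using my setup's parameters; the one assumption is that sb->pm_nodes == 1 because only one PM device is registered (which is exactly the part of the pm_node logic I'm unsure about):

#include <assert.h>
#include <stdio.h>

int main(void)
{
    /* My setup: 64 CPUs; assumed sb->pm_nodes == 1 (one PM device). */
    int cpus = 64, pm_nodes = 1;

    /* sufs_alloc_block_free_lists sizes the array as cpus * pm_nodes. */
    int array_len = cpus * pm_nodes;          /* 64 entries, indices 0..63 */

    /* The failing case observed in sufs_new_blocks. */
    int cpu = 63, pm_node = 1;

    /* The index computed by sufs_get_free_list. */
    int index = cpu * pm_nodes + pm_node;     /* 63 * 1 + 1 == 64 */

    printf("array length %d, index %d\n", array_len, index);
    assert(index < array_len);                /* fires: index 64 is past the end */
    return 0;
}

If sb->pm_nodes were 2 on this machine, the same computation (63 * 2 + 1 == 127) would stay in bounds of a 128-entry array, which is why the meaning of pm_node matters here.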
Thank you in advance for your help!