Open hayley-leblanc opened 6 months ago
I'm encountering a similar issue on varmail with more than 8 threads and webproxy with more than 6; the workload fails with output like this:
1.557: Failed to open file 465, /sufs/bigfileset/00000001/00000466, with status 18: Success
1.557: flowop openfile3 failed to open file 00000466
1.557: filereaderthread-1: flowop openfile3-1 failed
This is the same type of output that fileserver gave when run with more than 16 threads.
Hi,
Does ArckFS intercept stat64 calls?
ArckFS intercepts __fxstat64, __lxstat64, and __xstat64, but not stat64 directly. I don't remember the reason for coding it this way. It might be due to the glibc of our Linux distribution (Ubuntu 20.04.4 LTS).
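One way to check the glibc explanation on a given machine is to ask the dynamic loader which of these names it can resolve. As far as I know, glibc releases before 2.33 (Ubuntu 20.04 ships 2.31) did not export stat64 as a function of its own and routed it through __xstat64, while later releases export stat64 directly; treat that as an assumption to verify rather than a confirmed diagnosis. The helper name below is hypothetical, and the sketch only reports what the current process can see via ctypes:

```python
import ctypes


def exported_stat_symbols(names=("stat64", "__xstat64", "__lxstat64", "__fxstat64")):
    """Report which of the given names the loader can resolve in this process."""
    # CDLL(None) gives a handle to the symbols already loaded into the
    # current process, which includes the C library on Linux.
    libc = ctypes.CDLL(None)
    found = {}
    for name in names:
        try:
            getattr(libc, name)  # raises AttributeError if the lookup fails
            found[name] = True
        except AttributeError:
            found[name] = False
    return found


for name, present in exported_stat_symbols().items():
    print(f"{name}: {'resolvable' if present else 'not found'}")
```

If stat64 resolves on the benchmark machine but not on Ubuntu 20.04, an interposer that only wraps the __xstat64 family would miss direct stat64 calls there, which would be consistent with the newfstatat calls seen in strace.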
The workload encounters errors opening files with more than 16 threads.
No, that is not the case for fileserver. We ran fileserver with (much) more than 16 threads.
I don't know what causes the bug you have encountered. It might help to run the filebench workload with the provided Fxmark to see if the bug still exists; I remember needing to do something like turning off ASLR to make filebench run, and these operations are coded in the Fxmark scripts.
FYI, the fileserver config ArckFS uses is eval/benchmark/fxmark/bin/filebench-workloads/fileserver.f. The varmail and webproxy configurations are hard-wired in eval/benchmark/fxmark/bin/run-filebench.py. It might help to see if ArckFS runs with these standard configurations on your platform.
I will also try to run the config you provided on our machine soon.
One thing I realized I did not do was disable hyperthreading in the BIOS. I don't have easy physical access to the machine I'm running on, so I'd prefer to not do that -- what exactly is the issue with having hyperthreading enabled? Could this be the source of the issues I'm having?
I did try running webproxy with fxmark and had more luck, although I already have a benchmarking setup with some other file systems and I'd prefer to not move it all over to fxmark. Do you have a sense of what additional setup fxmark does on top of mounting ArckFS and running the filebench binaries? I know it generates a different version of webproxy and varmail, but I tried copying those out and using them with filebench outside of fxmark and ran into the same issues as before, so it seems fxmark might be doing some additional work.
what exactly is the issue with having hyperthreading enabled?
The biggest issue is that we never tested the code with hyperthreading enabled, so I can imagine that running the code in this environment leads to lots of bugs (possibly including the issues you are having). Another issue I can think of is performance, due to the CPU numbering. For example, the delegation code (and some other parts of the code) distributes jobs to CPUs in a round-robin fashion. With hyperthreading, a physical CPU can receive two consecutive jobs.
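To make the CPU-numbering concern concrete, here is a small sketch of round-robin assignment over logical CPUs. It is not the ArckFS delegation code; the function names and the topology (8 logical CPUs on 4 physical cores, with sibling hyperthreads numbered adjacently) are assumptions for illustration. On a real machine the sibling mapping can be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, and many platforms number siblings differently.

```python
from collections import Counter


def assign_round_robin(n_jobs, logical_cpus):
    """Give job i to logical CPU i modulo the number of logical CPUs."""
    return [logical_cpus[i % len(logical_cpus)] for i in range(n_jobs)]


def jobs_per_core(assignment, core_of):
    """Count how many jobs land on each physical core."""
    return Counter(core_of[cpu] for cpu in assignment)


# Hypothetical topology: logical CPUs 2k and 2k+1 share physical core k.
core_of = {cpu: cpu // 2 for cpu in range(8)}

assignment = assign_round_robin(4, list(range(8)))
print(assignment)                          # logical CPUs chosen for 4 jobs
print(jobs_per_core(assignment, core_of))  # jobs per physical core
```

Under this assumed numbering, four jobs land on logical CPUs 0 to 3, which are only two physical cores, so each of those cores gets two consecutive jobs while the other two cores stay idle.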
Do you have a sense of what additional setup fxmark does on top of mounting ArckFS and running the filebench binaries?
Two things I can think of are turning off address randomization and pinning CPUs with taskset. It might help to grep exec_cmd in run-fxmark.py and run-filebench.py.
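For anyone reproducing those two steps outside fxmark, a minimal sketch follows. It is not taken from the fxmark scripts: the helper name, the filebench command line, and the CPU list are placeholders, and it assumes the standard util-linux tools setarch (whose -R flag disables address randomization for the launched process) and taskset (whose -c flag takes a CPU list). The sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_pinned_cmd(cmd, cpus, arch="x86_64"):
    """Wrap cmd so it runs with ASLR disabled and pinned to the given CPUs."""
    cpu_list = ",".join(str(c) for c in cpus)
    # setarch ARCH -R : run the child with address randomization turned off
    # taskset -c LIST : restrict the child to the listed logical CPUs
    return ["setarch", arch, "-R", "taskset", "-c", cpu_list, *cmd]


# Placeholder workload file and CPU set; substitute your own.
argv = build_pinned_cmd(["filebench", "-f", "fileserver.f"], cpus=range(4))
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True). ASLR can also be disabled system-wide by writing 0 to /proc/sys/kernel/randomize_va_space as root, which affects every process rather than just the benchmark.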
Hi folks,
I am trying to run the fileserver workload from filebench with ArckFS on my local setup and I'm running into some problems. I'm using the default configuration of the workload rooted at /sufs, included below:
When I run this workload using filebench-sufs, the statfile flowops all fail with the output statfile flowop statfile1 failed. I used strace on filebench and it caught a lot of newfstatat calls to files under /sufs (which I believe it wouldn't do if ArckFS was intercepting them?). It looks like filebench uses the function fb_lfs_stat in fb_localfs.c to call stat, which in turn calls stat64. Does ArckFS intercept stat64 calls?
As a quick workaround, I tried just removing the statfile flowop. However, with this modification, the workload encounters errors opening files with more than 16 threads. The paper indicates that varmail and webproxy are known to only work properly with 16 threads -- is this also expected to be the case with fileserver?
I'm using the version of filebench included with the artifact, but not the fxmark tool, if it makes a difference.