Open dhowells opened 6 months ago
That segv can be fixed with this patch:
diff --git a/src/MainNFSD/9p_dispatcher.c b/src/MainNFSD/9p_dispatcher.c
index abf35a452310..8e0380d9dbd8 100644
--- a/src/MainNFSD/9p_dispatcher.c
+++ b/src/MainNFSD/9p_dispatcher.c
@@ -407,7 +407,7 @@ static void _9p_worker_run(struct fridgethr_context *ctx)
continue;
reqdata->_9prq_mutex = &_9pw_mutex;
- reqdata->_9prq_mutex = &_9pw_mutex;
+ reqdata->_9prq_cond = &_9pw_cond;
_9p_execute(reqdata);
_9p_free_reqdata(reqdata);
But even fixing that a simple read just hangs there, so there's obviously more problems waiting. I don't have time to dig right now, sorry...
Given that particular bug has been here since april already (introduced in a4e52ddf16ac565380bc8608c6de65229224cb23 ) I think it's safe to say no-one cares about 9p support in ganesha at this point: @phdeniel @patlucas are you still using it?
If not I'd be in favor of just removing the 9p code; I can eventually get around to fixing it but it's not reasonable to just expect it to keep working if nobody's running tests and looking at it in at least 8 months.
Ok so the hang is because we're not initializing read/write args and the fsal helpers loops on fsal_resume, a memset helps:
diff --git a/src/Protocols/9P/9p_read.c b/src/Protocols/9P/9p_read.c
index 284dc23910e2..b96dc91a2834 100644
--- a/src/Protocols/9P/9p_read.c
+++ b/src/Protocols/9P/9p_read.c
@@ -114,6 +114,7 @@ int _9p_read(struct _9p_request_data *req9p, u32 *plenout, char *preply)
struct async_process_data read_data;
struct fsal_io_arg *read_arg = alloca(sizeof(*read_arg) +
sizeof(struct iovec));
+ memset(read_arg, 0, sizeof(*read_arg));
read_arg->info = NULL;
read_arg->state = pfid->state;
diff --git a/src/Protocols/9P/9p_write.c b/src/Protocols/9P/9p_write.c
index 44b48d1d3dbf..9a447732e505 100644
--- a/src/Protocols/9P/9p_write.c
+++ b/src/Protocols/9P/9p_write.c
@@ -134,6 +134,7 @@ int _9p_write(struct _9p_request_data *req9p, u32 *plenout, char *preply)
struct async_process_data write_data;
struct fsal_io_arg *write_arg = alloca(sizeof(*write_arg) +
sizeof(struct iovec));
+ memset(write_arg, 0, sizeof(*write_arg));
write_arg->info = NULL;
write_arg->state = pfid->state;
After that, read/writes seem to work but creating a file also seg faults; I didn't try to understand aa00d4136865e5f8ec8296faa67fcd9084771bf2 (Sept 2022, we're now over a year broken) but I'd assume some confusion between ppentry and pentry? This works around the immediate problem even if it's not perfect:
diff --git a/src/Protocols/9P/9p_lcreate.c b/src/Protocols/9P/9p_lcreate.c
index 4c99a8de6e8b..eb03dccb0051 100644
--- a/src/Protocols/9P/9p_lcreate.c
+++ b/src/Protocols/9P/9p_lcreate.c
@@ -165,7 +165,7 @@ int _9p_lcreate(struct _9p_request_data *req9p, u32 *plenout, char *preply)
pfid->opens = 1;
/* Get a active reference */
- pfid->ppentry->obj_ops->get_ref(pfid->ppentry);
+ pfid->pentry->obj_ops->get_ref(pfid->pentry);
/* Build the reply */
_9p_setinitptr(cursor, preply, _9P_RLCREATE);
diff --git a/src/Protocols/9P/9p_proto_tools.c b/src/Protocols/9P/9p_proto_tools.c
index e6e9c019c85d..ecb939308515 100644
--- a/src/Protocols/9P/9p_proto_tools.c
+++ b/src/Protocols/9P/9p_proto_tools.c
@@ -408,7 +408,7 @@ int _9p_tools_clunk(struct _9p_fid *pfid)
*/
while (pfid->opens > 0) {
/* Release the active reference for each open */
- pfid->ppentry->obj_ops->put_ref(pfid->ppentry);
+ pfid->pentry->obj_ops->put_ref(pfid->pentry);
pfid->opens--;
}
At this point I can create files, but trying to read it back fails an assert check in mdcache:
ganesha.nfsd: /code/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1972: _mdcache_lru_unref: Assertion `!entry->fh_hk.inavl' failed.
Aborted (core dumped)
And I've given up here; this requires more than a few hours every couple of years; I don't have this kind of free time at the moment.
So unless someone else will actively maintain it, it's probably not worth fixing at this point...
I'm trying to test the Linux 9p filesystem, but when I try to read from a file, ganesha.nfsd dies with SEGV. Valgrind shows the problem:
==38960== Memcheck, a memory error detector ==38960== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. ==38960== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info ==38960== Command: ./ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -F ==38960== ==38960== Thread 138: ==38960== Invalid read of size 4 ==38960== at 0x4DC32D6: pthread_cond_signal@@GLIBC_2.3.2 (pthread_cond_signal.c:41) ==38960== by 0x489700C: sync_cb (fsal_helper.c:1837) ==38960== by 0x49D79DF: mdc_read_super_cb (mdcache_file.c:559) ==38960== by 0x49D7AC5: mdc_read_cb (mdcache_file.c:582) ==38960== by 0x7B4B81F: vfs_read2 (file.c:1317) ==38960== by 0x49D7BCF: mdcache_read2 (mdcache_file.c:617) ==38960== by 0x4897173: fsal_read (fsal_helper.c:1849) ==38960== by 0x4A10FD4: _9p_read (9p_read.c:134) ==38960== by 0x4A0A024: _9p_process_buffer (9p_interpreter.c:181) ==38960== by 0x4A09DCB: _9p_tcp_process_request (9p_interpreter.c:133) ==38960== by 0x48CE182: _9p_execute (9p_dispatcher.c:315) ==38960== by 0x48CE508: _9p_worker_run (9p_dispatcher.c:412) ==38960== Address 0x24 is not stack'd, malloc'd or (recently) free'd
This Ganesha was built from the nfs-ganesha-5.7 tarball and the two patches included in the Fedora 39 srpm. The configuration was:
cmake ~/nfs-ganesha-5.7/src/ -DUSE_SYSTEM_NTIRPC=ON -DUSE_FSAL_RGW=OFF
and the F39 system gcc-13.2.1 was used. In gdb, I see:
0 0x00007ffff797c2d6 in pthread_cond_signal@@GLIBC_2.3.2 () from /lib64/libc.so.6
1 0x00007ffff7c4100d in sync_cb (obj=0x7fff38000ec8, ret=..., args=0x7fffd67e9ee0, caller_data=0x7fffd67e9f50) at /root/nfs-ganesha-5.7/src/FSAL/fsal_helper.c:1837
2 0x00007ffff7d819e0 in mdc_read_super_cb (obj=0x7fff38000cd0, ret=..., obj_data=0x7fffd67e9ee0, caller_data=0x7fff28000b70)
3 0x00007ffff7d81ac6 in mdc_read_cb (obj=0x7fff38000cd0, ret=..., obj_data=0x7fffd67e9ee0, caller_data=0x7fff28000b70)
4 0x00007ffff76f7820 in vfs_read2 () from /usr/lib64/ganesha/libfsalvfs.so
5 0x00007ffff7d81bd0 in mdcache_read2 (obj_hdl=0x7fff38000ec8, bypass=true, done_cb=0x7ffff7c40e94, read_arg=0x7fffd67e9ee0, caller_arg=0x7fffd67e9f50)
6 0x00007ffff7c41174 in fsal_read (obj_hdl=0x7fff38000ec8, bypass=true, arg=0x7fffd67e9ee0, data=0x7fffd67e9f50) at /root/nfs-ganesha-5.7/src/FSAL/fsal_helper.c:1849
7 0x00007ffff7dbafd5 in _9p_read (req9p=0x7fff44011580, plenout=0x7fffd67fb1b8, preply=0x7fffd67ea040 "") at /root/nfs-ganesha-5.7/src/Protocols/9P/9p_read.c:134
8 0x00007ffff7db4025 in _9p_process_buffer (req9p=0x7fff44011580, replydata=0x7fffd67ea040 "", poutlen=0x7fffd67fb1b8)
9 0x00007ffff7db3dcc in _9p_tcp_process_request (req9p=0x7fff44011580) at /root/nfs-ganesha-5.7/src/Protocols/9P/9p_interpreter.c:133
10 0x00007ffff7c78183 in _9p_execute (req9p=0x7fff44011580) at /root/nfs-ganesha-5.7/src/MainNFSD/9p_dispatcher.c:315
11 0x00007ffff7c78509 in _9p_worker_run (ctx=0x32db2c0) at /root/nfs-ganesha-5.7/src/MainNFSD/9p_dispatcher.c:412
12 0x00007ffff7cfb98e in fridgethr_start_routine (arg=0x32db2c0) at /root/nfs-ganesha-5.7/src/support/fridgethr.c:486
13 0x00007ffff797d897 in start_thread () from /lib64/libc.so.6
14 0x00007ffff7a046fc in clone3 () from /lib64/libc.so.6
with line 1837 of sync_cb() being:
1837 pthread_cond_signal(data->fsa_cond);
(gdb) p *data $2 = {ret = {major = ERR_FSAL_NO_ERROR, minor = 0}, done = true, fsa_mutex = 0x7fffd67fb320, fsa_cond = 0x0}
so data->fsa_cond is NULL.