omar-polo / gmid

a Gemini server
https://gmid.omarpolo.com
ISC License

debugging sandbox issue on arm64 #16

Closed nikonakoneko closed 1 year ago

nikonakoneko commented 2 years ago

Hello there,

I'm trying to get gmid to work with the sandbox enabled on a Raspberry Pi 4 running Gentoo Linux. I uncommented the #define SC_DEBUG line as described in the FAQ, but I don't see any "unexpected syscall" line even though all the tests are failing. What could it be?

Without sandboxing it works fine.

test output:

make 'TESTS=' -C regress all
make[1]: Entering directory '/home/user/gmid/regress'
./regress 
OK:  foo => foo
OK:  h.n => h.n
OK:  xn-invalid => xn-invalid
OK:  naïve => naïve
OK:  xn--8ca => è
OK:  xn--caff-8oa => caffè
OK:  xn--nave-6pa => naïve
OK:  xn--e-0mbbc => τeστ
OK:  xn--8ca67lbac => τèστ
OK:  xn--28j2a3ar1p => こんにちは
OK:  xn--hello--ur7iy09x => hello-世界
OK:  xn--hi--hi-rr7iy09x => hi-世界-hi
OK:  xn--caf-8la.foo.org => cafè.foo.org
OK:  xn--j6h => ♨
OK:  xn--x73l => 𩸽
OK:  xn--x73laaa => 𩸽𩸽𩸽𩸽
test_punycode passed
=> http://omarpolo.com
=> omarpolo.com
=> gemini:/omarpolo.com
=> gemini//omarpolo.com
=> h!!p://omarpolo.com
=> GEMINI://omarpolo.com
=> gemini://omarpolo.com
=> gemini://omarpolo.com/
=> gemini://omarpolo.com:1965
=> gemini://omarpolo.com:1965/
=> gemini://omarpolo.com:196s
=> gemini://OmArPoLo.CoM
=> gemini://xn--nave-6pa.omarpolo.com
=> gemini://naïve.omarpolo.com
=> gemini://na%c3%afve.omarpolo.com
=> gemini://omarpolo.com/foo/bar/baz
=> gemini://omarpolo.com/foo//bar///baz
=> gemini://omarpolo.com/foo/./bar/./././baz
=> gemini://omarpolo.com/foo/bar/../bar/baz
=> gemini://omarpolo.com/foo/../foo/bar/../bar/baz/../baz
=> gemini://omarpolo.com/foo/..
=> gemini://omarpolo.com/foo/../
=> gemini://omarpolo.com/foo/../..
=> gemini://omarpolo.com/foo/../../
=> gemini://omarpolo.com/foo/../foo/../././/bar/baz/.././.././/
=> gemini://omarpolo.com//foo
=> gemini://omarpolo.com/////foo
=> http://a/b/c/../..
=> gemini://example.com/@f:b!(z$&)/baz
=> foo://example.com/foo/?gne
=> foo://example.com/foo/?gne&foo
=> foo://ex.com/robots.txt?name=foobar&url=https://foo.com
=> foo://ex.com/foo?email=foo@bar.com#quuz
=> foo://bar.co/#foo
=> foo://bar.com/caf%C3%A8.gmi
=> foo://bar.com/caff%C3%A8%20macchiato.gmi
=> foo://bar.com/caff%C3%A8+macchiato.gmi
=> foo://bar.com/foo%2F..%2F..
=> foo://bar.com/foo%00?baz
=> foo://bar.com/cafè.gmi
=> foo://bar.com/世界.gmi
=> foo://bar.com/😼.gmi
=> foo://bar.com/😼/𤭢.gmi
=> foo://bar.com/世界/��
test_iri passed
gg: connect: can't connect to localhost:10965: Connection refused
gg: connect: can't connect to localhost:10965: Connection refused
./tests.sh: line 19: kill: (11503) - No such process
Header mismatch
wants : 20 text/gemini
got   : 
test_configless_mode failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 text/gemini
got   : 
test_static_files failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 30 /dir/
got   : 
test_directory_redirect failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 application/octet-stream
got   : 
test_serve_big_files failed
gg: timer expired
Header mismatch
wants : 20 application/octet-stream
got   : 
test_dont_execute_scripts failed
gg: timer expired
Header mismatch
wants : 20 text/x-funny
got   : 
test_custom_mime failed
gg: timer expired
Header mismatch
wants : 20 application/x-foo
got   : 
test_default_type failed
gg: timer expired
Header mismatch
wants : 20 text/gemini;lang=it
got   : 
test_custom_lang failed
test_parse_custom_lang_per_location passed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 text/gemini
got   : 
test_cgi_scripts failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 application/octet-stream
got   : 
test_cgi_big_replies failed
gg: timer expired
Unexpected number of args
want : 1
got  : 0
test_cgi_split_query failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 text/gemini
got   : 
test_custom_index failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 text/plain
got   : 
test_custom_index_default_type_per_location failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 text/gemini
got   : 
test_auto_index failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 40 temporary failure
got   : 
test_block failed
gg: timer expired
Header mismatch
wants : 40 % /foo.gmi  10965 localhost test
got   : 
test_block_return_fmt failed
gg: timer expired
Header mismatch
wants : 20 text/plain; lang=en
got   : 
test_entrypoint failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 60 client certificate required
got   : 
test_require_client_ca failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 51 not found
got   : 
test_root_inside_location failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 31 /foo/
got   : 
test_root_inside_location_with_redirect failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 text/gemini
got   : 
test_fastcgi failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 text/gemini
got   : 
test_macro_expansion failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 51 not found
got   : 
test_174_bugfix failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 text/gemini
got   : 
test_proxy_relay_to failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 20 text/gemini
got   : 
test_proxy_with_certs failed
gg: timer expired
gg: timer expired
Header mismatch
wants : 59 Wrong/malformed host or missing SNI
got   : 
test_unknown_host failed
gg: timer expired
Header mismatch
wants : 20 text/gemini
got   : 
test_include_mime failed

failed tests: test_configless_mode test_static_files test_directory_redirect test_serve_big_files test_dont_execute_scripts test_custom_mime test_default_type test_custom_lang test_cgi_scripts test_cgi_big_replies test_cgi_split_query test_custom_index test_custom_index_default_type_per_location test_auto_index test_block test_block_return_fmt test_entrypoint test_require_client_ca test_root_inside_location test_root_inside_location_with_redirect test_fastcgi test_macro_expansion test_174_bugfix test_proxy_relay_to test_proxy_with_certs test_unknown_host test_include_mime
make[1]: *** [Makefile:12: all] Error 1
make[1]: Leaving directory '/home/user/gmid/regress'
make: *** [Makefile:165: regress] Error 2

omar-polo commented 2 years ago

Hello,

Can you please make sure gmid was actually rebuilt after changing sandbox.c? Just to be sure, please run

$ make clean
$ make
$ make TESTS=test_static_files regress

after applying the following patch:

diff /tmp/gmid
commit - 62a46b03c6f911f3674d6cb7b77a49bac8efad42
path + /tmp/gmid
blob - d22126081dad10afde8ce6fc233fa7f20a4602ec
file + sandbox.c
--- sandbox.c
+++ sandbox.c
@@ -90,7 +90,7 @@ sandbox_logger_process(void)
 #endif

 /* uncomment to enable debugging.  ONLY FOR DEVELOPMENT */
-/* #define SC_DEBUG */
+#define SC_DEBUG

 #ifdef SC_DEBUG
 # define SC_FAIL SECCOMP_RET_TRAP

Does gmid work at runtime? If not, can you try to attach a debugger to the server process and see what happens when you make a request (for example with gg)?

Are you running the regress tests from portage? I remember that there was some issue with the sandbox used by the Gentoo package building infrastructure (I can't be more precise, sorry, I don't have much experience with Gentoo) that broke the tests. (That's why @CyberTailor added a knob to skip runtime tests.)

Also, please make sure to build gmid from the latest release, i.e. 1.8.4. The master branch contains some relatively big changes for the upcoming 2.0 release.

Thanks!

nikonakoneko commented 2 years ago

Yes, I double-checked that SC_DEBUG is defined and I'm indeed using the 1.8.4 tag. I'll try ASAP with a debugger and tell you what I find.

omar-polo commented 2 years ago

Oh, another thing you can do is to use strace on the server process. I remember trying to debug a seccomp issue last year and spending a lot of time trying to understand what was going on; strace was really helpful in finding the culprit (it was rt_sigreturn).

So, you can start the server (gmid -fv or something like that), then (as root) do strace -p $pid (just do a pgrep -lf gmid and take the highest pid number). You should see something like

# strace -p XYZ
strace: Process XYZ attached
epoll_pwait(6,

Then make a request and see what happens. If it's a seccomp issue it should print (among other things) +++ killed by SIGSYS +++. Please attach the whole strace.
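
When the trace is long, the marker can be searched for mechanically. Below is a small sketch, assuming the strace output was saved to a file (for example with strace's -o option); the function name find_sigsys is hypothetical. It reports whether the "+++ killed by SIGSYS +++" line appears and keeps the line seen just before it, which may help locate the last syscall strace printed.

```c
#include <stdio.h>
#include <string.h>

#define SIGSYS_MARKER "+++ killed by SIGSYS +++"

/*
 * Hypothetical helper: scan a saved strace log line by line.
 * Returns 1 if the SIGSYS marker is found, 0 otherwise.
 * On return, prev holds the last line read before the marker
 * (or the last line of the log if the marker never appears).
 * Lines longer than the internal buffer are read in pieces.
 */
static int
find_sigsys(FILE *fp, char *prev, size_t prevlen)
{
	char line[1024];

	if (prevlen > 0)
		prev[0] = '\0';

	while (fgets(line, sizeof(line), fp) != NULL) {
		if (strstr(line, SIGSYS_MARKER) != NULL)
			return 1;
		/* Drop the trailing newline before remembering the line. */
		line[strcspn(line, "\n")] = '\0';
		if (prevlen > 0)
			snprintf(prev, prevlen, "%s", line);
	}
	return 0;
}
```

A return value of 0 on a complete trace suggests the process was not killed by the seccomp filter, which matches the "no output at all" case reported later in this thread.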

Thanks,

nikonakoneko commented 2 years ago

Then it must be something else. When I strace the child process I have no output at all when doing a request. Plus, if I try to start gmid without any config it exits directly with a return code of 1. Again, without the sandbox the issue doesn't happen.

So I guess I'll have to use the debugger. Any advice on where to set breakpoints or what to look for?

nikonakoneko commented 2 years ago

Oh, and I forgot to answer: I'm doing the test build without portage, all manually.

omar-polo commented 2 years ago

Well, it's starting to get interesting!

First off, make sure you're building with debug symbols. Run CFLAGS='-O0 -g' ./configure or edit Makefile.local and change the -O2 to -O0 -g. -O0 makes the code easier to debug and -g includes the debug symbols. Do a make clean afterwards.

The seccomp filter is only used in the server process, but if we're failing that early it's not straightforward to debug. What you can do is apply a diff like this:

diff /tmp/gmid
commit - b48eb0db52b88c5da0d0096c25dfd4ec823e3f4b
path + /tmp/gmid
blob - 2258e67a25485efa2c76dee0fd7d91da4546d78c
file + server.c
--- server.c
+++ server.c
@@ -1559,6 +1559,13 @@ loop(struct tls *ctx_, int sock4, int sock6, struct im
 void
 loop(struct tls *ctx_, int sock4, int sock6, struct imsgbuf *ibuf)
 {
+   static int attached = 0;
+
+   fprintf(stderr, "my pid is %d\n", getpid());
+   while (!attached) {
+       sleep(1);
+   }
+
    ctx = ctx_;

    SPLAY_INIT(&clients);

Then rebuild and start gmid as ./gmid .. It will print "my pid is XYZ" and hang there. Then you should attach a debugger (e.g. gdb) to that process, set attached to 1 and then let the daemon run. If it's seccomp you should at least be able to get a stack trace from the debugger.

$ gdb ./gmid $pid
(gdb) frame 3
(gdb) set var attached = 1
(gdb) continue
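
The wait-for-debugger trick in the diff above can also be written as a small standalone helper. This is only a sketch under a few assumptions: the function name wait_for_debugger and the max_tries bound are not part of gmid, and the flag is declared volatile so the loop still re-reads it if the code is built with optimizations (the original diff relies on -O0 instead).

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of gmid: print the pid, then sleep in
 * one-second steps until a debugger sets *attached to non-zero or
 * max_tries sleeps have elapsed.
 * Returns 1 if the flag is set on exit, 0 on timeout.
 */
static int
wait_for_debugger(volatile int *attached, int max_tries)
{
	int i;

	fprintf(stderr, "my pid is %d\n", (int)getpid());
	for (i = 0; !*attached && i < max_tries; i++)
		sleep(1);
	return *attached != 0;
}
```

A caller would keep a static volatile int flag, pass its address, and set it from gdb with set var, the same way as in the session above. The upper bound simply avoids hanging forever if no debugger is ever attached.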

I suspect we're missing some epoll syscall or some accept(2) variant and that's killing us.

What libc are you using? I do have a raspberry pi 3 so I may try to reproduce it too.

Thanks!

omar-polo commented 1 year ago

I think an update on this is needed. I decided to drop seccomp support completely, see 0b62f4842d7c65b8f64c5f676a0a05333fd7db6f, for the time being at least.

<rant>

Seccomp (and to some degree landlock) are so ill-designed that they're a nightmare for a small project to maintain. Most of the contributions (and private mails) I got for gmid were due to seccomp breaking on some libc on some architecture. It's tedious, and not due to my dear users, but due to the fact that it is incredibly cumbersome to maintain a seccomp policy. Landlock is saner, but due to the Linux ecosystem it is hard to use as well. For example, do DNS queries work under landlock, with glibc doing its things under the hood with dlopen(3)? I think not. (But I'd be happy to be proven wrong!)

(capsicum is better, but due to the feature set I wanted for gmid it's not compatible.)

In comparison, take pledge and unveil. How much breakage and maintenance burden did they add to the project? Zero. How many mails did I get from users experiencing random issues due to them? Zero as well.

To move forward with a better privsep I dropped the sandboxing (OpenBSD being the exception). capsicum could be added to some processes now, but since it can't be used on the server processes (due to fastcgi and reverse proxying) I don't see how much value it would add. Landlock could also be added, provided it doesn't break the feature set.

</rant>

So, in light of all of this, I'll close the issue. gmid 2.0 won't have issues with the sandbox on Linux (since there is none). No ETA yet. Sorry for the rant.