redneb / hs-linux-namespaces

haskell library to work with linux namespaces
https://hackage.haskell.org/package/linux-namespaces
BSD 3-Clause "New" or "Revised" License

RTS ticker thread can cause trouble #3

Open sorki opened 10 months ago

sorki commented 10 months ago

Recently I found that a testsuite using this package started failing with unshare: invalid argument, but I wasn't sure what was going on, as the command-line unshare worked just fine. Comparing both calls, I didn't see much difference, but then I stumbled on a clone3 call made by GHC to spawn a thread called ghc_ticker. Whether the ticker is used seems to depend on compile-time options and the availability of packages during the GHC build.

Some more info https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts/signals#the-rts-timer-signal

This started manifesting in CI, which used the latest Ubuntu, and on NixOS as well.

The fix is to disable the timer with

ghc-options:       -rtsopts "-with-rtsopts -V0"

From help:

hnix-store-remote-tests:   -V<secs>  Master tick interval in seconds (0 == disable timer).
hnix-store-remote-tests:             This sets the resolution for -C and the heap profile timer -i,
hnix-store-remote-tests:             and is the frequency of time profile samples.
hnix-store-remote-tests:             Default: 0.01 sec.
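
To confirm that the -V0 setting actually reached the runtime, the program can inspect its own RTS flags. This is a minimal sketch, assuming that tickInterval from GHC.RTS.Flags (in base) reads as 0 when the timer is disabled; tickerDisabled is a hypothetical helper name and not part of linux-namespaces.

```haskell
import GHC.RTS.Flags (MiscFlags (..), getMiscFlags)

-- | Hypothetical helper: True when the RTS master tick interval is 0,
-- which is what -V0 is expected to set (an assumption worth checking
-- against the GHC version in use).
tickerDisabled :: IO Bool
tickerDisabled = do
  flags <- getMiscFlags
  pure (tickInterval flags == 0)

main :: IO ()
main = do
  disabled <- tickerDisabled
  putStrLn $
    if disabled
      then "RTS timer disabled (tick interval is 0)"
      else "RTS timer enabled; a ticker thread or timer signal may be in use"
```

A testsuite could call such a check before unshare and fail with a clearer message than invalid argument.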

Should we add this to the comments (or README) that already mention issues with -threaded?

While debugging, I also extracted the example into a separate cabal executable. Want a PR? I can also PR a simple testsuite + CI if you want.

redneb commented 9 months ago

Thanks for this report and apologies for the late reply.

I have never run into this problem. Maybe it occurs when the ticker thread uses signals? I thought that this only happens on old kernels. I would be happy to add a note to the documentation, and the place where -threaded is mentioned is probably the best place for it. But I would be reluctant to suggest disabling the ticker as the default recommended practice, as this can have undesired consequences.

So a PR would be welcome.

sorki commented 9 months ago

> I have never run into this problem. Maybe it is caused in the scenario when the ticker thread uses signals? I thought that this only happens on old kernels.

I think it is the opposite: the signals are fine but less efficient, so newer configurations of GHC spawn a separate thread for the ticker. I wish I had the exact commit where this changed, but the test+CI where this started failing wasn't rolling with nixpkgs versions, so I would have to bisect a lot, and I guess I would arrive at some GHC bump.

It also depends on the build environment/configuration, so some GHCs might exhibit this depending on the distro, while the exact same version might not elsewhere.
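
Since the behaviour varies between GHC builds, a runtime check is probably more reliable than guessing from the GHC version. As far as I understand from unshare(2), creating a user namespace fails with EINVAL when the caller is multithreaded, so counting the process's OS threads before calling unshare shows whether an extra thread such as ghc_ticker is present. A minimal sketch, assuming Linux with /proc mounted and the directory package; countOsThreads is a hypothetical helper name.

```haskell
import System.Directory (listDirectory)

-- | Hypothetical helper: number of OS-level threads in the current
-- process. Linux lists one entry per thread under /proc/self/task.
countOsThreads :: IO Int
countOsThreads = length <$> listDirectory "/proc/self/task"

main :: IO ()
main = do
  n <- countOsThreads
  putStrLn ("OS threads in this process: " ++ show n)
  if n > 1
    then putStrLn "More than one thread: unshare with a user namespace may fail with EINVAL"
    else putStrLn "Single-threaded: no ticker thread is running"
```

Running this with and without -V0 on a given GHC build should show whether that build uses a ticker thread.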

I'm mostly using this package for testsuites (in hnix-store-remote to trick nix-daemon into thinking it's running as root, and before that in rtnetlink-hs so that its testsuite doesn't need to run as root).

Btw, if you have a more elaborate example of unshare and then a subprocess using unshare with multiple mappings, that would help me a lot in improving the test environment for nix-daemon, as it spawns more processes that are not happy with just a single mapping.