Lock to a thread on all API calls.

hallyn commented 9 years ago

Lxc now uses a per-thread (TLS) variable to store the lxc_conf of the container being acted upon. This appears to get messed up, presumably when golang moves goroutines between threads.

When running lxd testsuites with the current git HEAD of both lxc and lxd, I get frequent failures starting concurrent containers. With this patch, I don't get those at all.

I'm posting this patch mainly for discussion - it appears to work for me, but that doesn't mean it's right :)

Signed-off-by: Serge Hallyn serge.hallyn@ubuntu.com

niemeyer commented 9 years ago

Hi Serge,

Can you provide more details about how the TLS is being used, and what constraints the C side expects to be respected?

The patch appears to (at least in many cases I've seen) lock the current goroutine to the currently running thread for the duration of the C call. I cannot see yet how this is helping, given that Go must necessarily do that for the C layer to work at all. In other words, Go will not move the goroutine while it is executing a C call. The locking functions are mainly useful when there's a "golden" thread somehow (e.g. the "GUI thread") which once initialized with data in its TLS, must have all calls done with it. This means we cannot call unlock between the C calls, otherwise the next lock may be happening on an unrelated thread.

As a side note, can we change LXC so that it does not use TLS in the first place? Using such APIs from languages that explore concurrency properly tends to be very cumbersome.

hallyn commented 9 years ago

The lxc commit is https://github.com/lxc/lxc/commit/858377e4d968c8a7254e22dc7167acf76ac91a48

Each container can specify its own logfile for info/debug/error messages. Because of the way lxc is historically structured, passing this logfile to every place that uses ERROR/INFO/DEBUG would be problematic.

So instead, on the start of every lxc API call, we set a per-thread 'current_conf' variable which allows us to find the container's logfile (if any) from the environment at any point - basically like 'current' in the kernel.

caglar10ur commented 9 years ago

@niemeyer is right go runtime won't move the call while its in the C land but I think (correct me if I'm wrong) things still could get ugly if the goroutine gets moved while LXD is doing something in the Go world.

One way to do make sure that this won't happen might be calling LockOSThread in init and asking go runtime to make goroutine - os thread relationship 1:1. Alternatively moving away from TLS variables and storing them in thread safe structures in LXC could work.

niemeyer commented 9 years ago

There's no way to ask the Go runtime to keep the 1:1 relationship. Locking the goroutine that has run the initialization to the current thread works, but it means every follow up call must necessarily run from the same goroutine as well, which hinders first-class concurrency badly, and must be kept in mind at all times.

I would indeed try to avoid the TLS, and instead ensure thread-safety.

caglar10ur commented 9 years ago

You mean every follow up call coming from the same goroutine though, right? Asking to learn, if go-lxc calls LockOSThread in its init() and if LXD calls let's say NewContainer N times from N different goroutines then go runtime will end up creating N OS thread and putting those N goroutine on top of them and won't move them away until they call UnlockOSThread or exit. Have I got that right?

niemeyer commented 9 years ago

No, I mean that every follow up call that cares about the thread it is in must be in the same goroutine as the LockOSThread call was made in, otherwise it will land in a different and arbitrary thread.

As for the calls to NewContainer, no, it doesn't work that way. If you call LockOSThread, it will lock the current goroutine to the current thread. If you then call NewContainer from a different goroutine, that's unaffected by the initial call to LockOSThread inside init().

caglar10ur commented 9 years ago

Right, I meant the same goroutine as the LockOSThread call was made by saying "same goroutine" :)

Ah of course, init is called once so the other calls will be unaffected as you describe. Thanks for the explanation Gustavo!

hallyn commented 9 years ago

I see, I'd misunderstood @niemeyer's original comment - so you are saying that go already, definately, always locks a thread to CPU when calling out to C. That's what I would have originally expected.

The TLS variable only needs to be maintained during the C call, so that means that this patch should make no difference.

However, it absolutely makes the difference between failure and success of lxd's concurrency test. The question, then, is whether (a) go might be wrongly moving a goroutine anyway, or (b) otherwise, why this patch is making a diffference.

niemeyer commented 9 years ago

This changes the concurrency behavior, so it may simply be hiding something else. I would try to remove the calls until it explodes, to see where exactly is the tweak that makes a difference.

hallyn commented 9 years ago

Thanks - I'll close this request for now.

lxc / go-lxc

Lock to a thread on all API calls. #41