[clang] Diagnose misuse of block variable thread_locals with dynamic init or destruction.

jyknight commented 4 hours ago

The C++ standard makes it unfortunately easy to misuse a block-scoped variable with thread-local storage duration, and, therefore, to inadvertently/confusingly end up using an uninitialized value.

The issues arise with the difference in initialization semantics between a global marked thread_local and a block-scoped variable marked thread_local. For a global variable marked thread_local, the initializer is run the first time the value is accessed on a given thread -- which means every access verifies whether initialization is required. In contrast, a function-local thread_local is initialized the first time control-flow passes through the declaration on a given thread: uses never trigger initialization.

Thus, whenever a function-local thread_local has dynamic initialization, or has a type with non-trivial destruction, we have some pretty nasty footguns. I've used an int in these examples, but of course the symptoms will be worse when missing the initialization of a non-trivial object.

1. Jumping over the initializer

While C++ prohibits jumping into the scope of an automatic variable [stmt.decl] p2, it does not prohibit jumping into the scope of a function-local thread_local (or static) object. The object's initializer is simply skipped.

E.g.:

int f();
int test1() {
  goto skip;
  thread_local int x = f();
  skip:
  return x;
}

I can't think of a valid reason why any code should ever do this. Ideally, the language would outright prohibit it -- just like it does for automatic variables. I'm not sure why it does not. Until then, we could emit a warning. (I think it makes sense to diagnose for both thread and static variables, here).

2. Accessing the value in a nested function/lambda.

E.g.

int f();
void g(int (*fn)());

int test2() {
  thread_local int x = f();
  g([] { return x; });
}

Note that the lambda does not capture the address of x -- it can access it directly, because it's not an automatic variable. Thus, the address of x that the lambda acceses will be the address for the thread the lambda is called on. However, because the access does not flow through the declaration of x, the value may be uninitialized.

This case is a bit trickier, because this code perfectly safe as long as g only calls the lambda on the same thread. Nevertheless, I believe that it would also be an improvement to prohibit this -- and, in absence of such a prohibition, we should warn about it. If you do legitimately want to use a thread_local in a lambda that you know will only be executed on the same thread, you could capture the address (same as you'd need to do for an automatic variable). But, it's probably better to move the block-scoped thread_local either to a new function, or to a global.

3. Accessing the value from a lambda, without ever executing the parent

This is similar to the above, but it's also unsafe even within a single thread (and thus, also can be triggered for a static storage duration declaration). Since the lambda function is part of the type, it can be constructed and called without ever executing the parent function -- and thus never pass through the initializer for x.

int f();

auto GetLambda() {
    thread_local int x = f();
    return [] { return x; };
}

int test3() {
  return decltype(GetLambda()){}();
}

For thread_local (but not static), the diagnostic for the previous case would also cover this. This code seems much less likely to happen accidentally than the other cases -- so, I think it's probably not worth caring about the issue for static storage duration variables.

4. Accessing the value after coroutine suspension.

int f();
std::generator<int> test4() {
  thread_local int x = f();
  while(true)
    co_yield x;
}

A coroutine may be resumed on a different thread at each resumption. Therefore, upon every access to x, we fetch the thread-local address anew (and, if it was a global, we'd also ensure the initializer had been called). Unfortunately, for a block-scoped thread_local, the initializer only runs on the thread where test started execution, and not where it's resumed. This feels...unexpected.

Potentially we could diagnose any use of a block-scope thread-local with dynamic initialization or destruction in a coroutine. Or else, diagnose anywhere which might have a yield between the declaration and the use.

llvmbot commented 4 hours ago

@llvm/issue-subscribers-c-1

Author: James Y Knight (jyknight)

The C++ standard makes it unfortunately easy to misuse a block-scoped variable with thread-local storage duration, and, therefore, to inadvertently/confusingly end up using an uninitialized value. The issues arise with the difference in initialization semantics between a global marked `thread_local` and a block-scoped variable marked `thread_local`. For a global variable marked `thread_local`, the initializer is run the first time the value is accessed on a given thread -- which means every access verifies whether initialization is required. In contrast, a function-local thread_local is initialized the first time control-flow passes through the _declaration_ on a given thread: uses _never_ trigger initialization. Thus, whenever a function-local thread_local has dynamic initialization, or has a type with non-trivial destruction, we have some pretty nasty footguns. I've used an `int` in these examples, but of course the symptoms will be worse when missing the initialization of a non-trivial object. ### 1. Jumping over the initializer While C++ prohibits jumping into the scope of an automatic variable [[stmt.decl] p2](https://eel.is/c++draft/stmt.stmt#stmt.dcl-2), it does _not_ prohibit jumping into the scope of a function-local thread_local (or static) object. The object's initializer is simply skipped. E.g.: ``` int f(); int test1() { goto skip; thread_local int x = f(); skip: return x; } ``` I can't think of a valid reason why any code should ever do this. Ideally, the language would outright prohibit it -- just like it does for automatic variables. I'm not sure why it does not. Until then, we could emit a warning. (I think it makes sense to diagnose for both thread and static variables, here). ### 2. Accessing the value in a nested function/lambda. E.g. ``` int f(); void g(int (*fn)()); int test2() { thread_local int x = f(); g([] { return x; }); } ``` Note that the lambda does not capture the address of `x` -- it can access it directly, because it's not an automatic variable. Thus, the address of `x` that the lambda acceses will be the address for the thread the lambda is called on. However, because the access does not flow through the declaration of `x`, the value may be uninitialized. This case is a bit trickier, because this code perfectly safe as long as `g` only calls the lambda on the same thread. Nevertheless, I believe that it would also be an improvement to prohibit this -- and, in absence of such a prohibition, we should warn about it. If you do legitimately want to use a thread_local in a lambda that you know will only be executed on the same thread, you could capture the address (same as you'd need to do for an automatic variable). But, it's probably better to move the block-scoped thread_local either to a new function, or to a global. ### 3. Accessing the value from a lambda, without ever executing the parent This is similar to the above, but it's also unsafe even within a single thread (and thus, also can be triggered for a static storage duration declaration). Since the lambda function is part of the type, it can be constructed and called without ever executing the parent function -- and thus never pass through the initializer for `x`. ``` int f(); auto GetLambda() { thread_local int x = f(); return [] { return x; }; } int test3() { return decltype(GetLambda()){}(); } ``` For thread_local (but not static), the diagnostic for the previous case would also cover this. This code seems much less likely to happen accidentally than the other cases -- so, I think it's probably not worth caring about the issue for static storage duration variables. ### 4. Accessing the value after coroutine suspension. ``` int f(); std::generator<int> test4() { thread_local int x = f(); while(true) co_yield x; } ``` A coroutine may be resumed on a different thread at each resumption. Therefore, upon every access to `x`, we fetch the thread-local address anew (and, if it was a global, we'd also ensure the initializer had been called). Unfortunately, for a block-scoped thread_local, the initializer only runs on the thread where `test` started execution, and not where it's resumed. This feels...unexpected. Potentially we could diagnose any use of a block-scope thread-local with dynamic initialization or destruction in a coroutine. Or else, diagnose anywhere which might have a yield between the declaration and the use.

llvmbot commented 4 hours ago

@llvm/issue-subscribers-clang-frontend

Author: James Y Knight (jyknight)

The C++ standard makes it unfortunately easy to misuse a block-scoped variable with thread-local storage duration, and, therefore, to inadvertently/confusingly end up using an uninitialized value. The issues arise with the difference in initialization semantics between a global marked `thread_local` and a block-scoped variable marked `thread_local`. For a global variable marked `thread_local`, the initializer is run the first time the value is accessed on a given thread -- which means every access verifies whether initialization is required. In contrast, a function-local thread_local is initialized the first time control-flow passes through the _declaration_ on a given thread: uses _never_ trigger initialization. Thus, whenever a function-local thread_local has dynamic initialization, or has a type with non-trivial destruction, we have some pretty nasty footguns. I've used an `int` in these examples, but of course the symptoms will be worse when missing the initialization of a non-trivial object. ### 1. Jumping over the initializer While C++ prohibits jumping into the scope of an automatic variable [[stmt.decl] p2](https://eel.is/c++draft/stmt.stmt#stmt.dcl-2), it does _not_ prohibit jumping into the scope of a function-local thread_local (or static) object. The object's initializer is simply skipped. E.g.: ``` int f(); int test1() { goto skip; thread_local int x = f(); skip: return x; } ``` I can't think of a valid reason why any code should ever do this. Ideally, the language would outright prohibit it -- just like it does for automatic variables. I'm not sure why it does not. Until then, we could emit a warning. (I think it makes sense to diagnose for both thread and static variables, here). ### 2. Accessing the value in a nested function/lambda. E.g. ``` int f(); void g(int (*fn)()); int test2() { thread_local int x = f(); g([] { return x; }); } ``` Note that the lambda does not capture the address of `x` -- it can access it directly, because it's not an automatic variable. Thus, the address of `x` that the lambda acceses will be the address for the thread the lambda is called on. However, because the access does not flow through the declaration of `x`, the value may be uninitialized. This case is a bit trickier, because this code perfectly safe as long as `g` only calls the lambda on the same thread. Nevertheless, I believe that it would also be an improvement to prohibit this -- and, in absence of such a prohibition, we should warn about it. If you do legitimately want to use a thread_local in a lambda that you know will only be executed on the same thread, you could capture the address (same as you'd need to do for an automatic variable). But, it's probably better to move the block-scoped thread_local either to a new function, or to a global. ### 3. Accessing the value from a lambda, without ever executing the parent This is similar to the above, but it's also unsafe even within a single thread (and thus, also can be triggered for a static storage duration declaration). Since the lambda function is part of the type, it can be constructed and called without ever executing the parent function -- and thus never pass through the initializer for `x`. ``` int f(); auto GetLambda() { thread_local int x = f(); return [] { return x; }; } int test3() { return decltype(GetLambda()){}(); } ``` For thread_local (but not static), the diagnostic for the previous case would also cover this. This code seems much less likely to happen accidentally than the other cases -- so, I think it's probably not worth caring about the issue for static storage duration variables. ### 4. Accessing the value after coroutine suspension. ``` int f(); std::generator<int> test4() { thread_local int x = f(); while(true) co_yield x; } ``` A coroutine may be resumed on a different thread at each resumption. Therefore, upon every access to `x`, we fetch the thread-local address anew (and, if it was a global, we'd also ensure the initializer had been called). Unfortunately, for a block-scoped thread_local, the initializer only runs on the thread where `test` started execution, and not where it's resumed. This feels...unexpected. Potentially we could diagnose any use of a block-scope thread-local with dynamic initialization or destruction in a coroutine. Or else, diagnose anywhere which might have a yield between the declaration and the use.

llvm / llvm-project