habanero-rice / hclib

A C/C++ task-based programming model for shared memory and distributed parallel computing.
http://habanero-rice.github.io/hclib/
BSD 3-Clause "New" or "Revised" License
71 stars 35 forks source link

Use __thread for thread-local data rather than pthread_get/setspecific #70

Open srirajpaul opened 5 years ago

srirajpaul commented 5 years ago

pthread_get/getspecific is used to save and retrieve the state of the worker. We can use thread storage class specifier to do the same which is supported by all the major compilers. Also, thread is much more efficiently implemented by the compiler.

agrippa commented 5 years ago

@srirajpaul some preliminary bullet points on this:

So, we'd need to choose between relying on __thread and possibly losing some compiler flexibility, or going with the C11 standard and possibly losing some older compilers.

srirajpaul commented 5 years ago

@agrippa C11 support is not needed today, but the Makefile adds the c11 flag while compiling HClib. Therefore using _Thread_local storage class would not cause any disruption for whoever is using HClib. Since __thread is not a part of C standard, we can ignore it.

agrippa commented 5 years ago

Good point about c11, I hadn't realized that. I've added some text to the README explicitly calling std=c11 out as a requirement, and submitted a pull request. Could you take a quick look? It's a trivial change.

So... are you volunteering to make this change? :)

srirajpaul commented 5 years ago

I will make this change.

sbak5 commented 5 years ago

boost context routines don't guarantee the correctness of thread local variables. They mentioned in their document about this issue. When I tested for my previous work using boost context, thread local is OK in Linux but not on MacOS. As we can see, context switching routines are written in assembly for each OS and architecture. We can choose to use thread local for x86 / linux and pthread routines for others.

vsarkar commented 5 years ago

Seonmyeong,

You make a good point. On a related note, I think it's fine if we limit the number of platforms supported by HClib to a smaller number that we are confident of supporting well. While MacOS support is a convenience, it need not be a requirement. This issue has also come up when trying to make HClib builds cmake-compatible.

Best,

Vivek

Vivek Sarkar, http://vsarkar.cc.gatech.edu Professor, School of Computer Science, Klaus 2332, +1-‭404-385-5989‬ Stephen Fleming Chair for Telecommunications, College of Computing, Georgia Institute of Technology Co-Director, Center for Research into Novel Computing Hierarchies (CRNCH) Assistant: Kenya Payton, kenya.payton@cc.gatech.edumailto:kenya.payton@cc.gatech.edu, Klaus 2115, +1-404-385-6440‬

On Jun 5, 2019, at 1:23 AM, Seonmyeong Bak notifications@github.com<mailto:notifications@github.com> wrote:

boost context routines don't guarantee the correctness of thread local variables. They mentioned in their document about this issue. When I tested for my previous work using boost context, thread local is OK in Linux but not on MacOS. As we can see, context switching routines are written in assembly for each OS and architecture. We can choose to use thread local for x86 / linux and pthread routines for others.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/habanero-rice/hclib/issues/70?email_source=notifications&email_token=AAP3LYFSVN2OZPV72LQUIATPY5EUZA5CNFSM4GWN7UDKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODW6UHKQ#issuecomment-498942890, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAP3LYGQSDWANIZVI7JDBH3PY5EUZANCNFSM4GWN7UDA.

srirajpaul commented 5 years ago

@sbak5 That is a very good point. We will be careful during our implementation to make sure thread local variables/pthread thread specific data not lost during context switch/and back. Currently we save the required data from thread specific storage before boost context switch and restore them when it switches back. So if the concern with boost context is that the continuation might run on a different thread/worker and hence data stored in thread_local is not valid, that problem won't happen. Still we will keep both thread_local and pthread apis and allow to switch between them using preprocessor macros.

sbak5 commented 5 years ago

@srirajpaul Currently, I think HClib only supports task-level parallelism, where all the closures are executed once as OpenMP tasks. So, it's fine currently because the tasks doesn't keep its status and context when they're switched back. When we extend HClib to SPMD and user-level threads which have stack and do context switching as in pthread, this can be an issue. When M:N mapping is done (mapping ULT onto kernel level thread workers), User-level threads behave as kernel level threads while it is running on the certain kernel level thread. When they're switched back, all the values in the thread local variables should be also switched back.
I don't know the exact reason but when ULTs using boost context are switched onto kernel level threads, thread local variable values are not maintained even when they're managed explicitly. (weird garbage value is stored after some number of context switching happens.)

@vsarkar I'm not sure if it works on IBM Power with thread local variables. I guess it's better to use some wrappers where either pthread_set/getspecific and thread local variables are used depending on the underlying OS and architecture.