lpereira / lwan

Experimental, scalable, high performance HTTP server
https://lwan.ws
GNU General Public License v2.0
5.92k stars 549 forks source link

Several potential data races in lwan. #205

Open ITWOI opened 7 years ago

ITWOI commented 7 years ago

Hi all,

Our bug scanner has reported several data race issues. One is at lwan-status.c#L52 and lwan-status.c#L220

Followings are code snippets.

    void                                                    \
    lwan_status_##fn_name_##_debug(const char *file,        \
        const int line, const char *func,                   \
        const char *fmt, ...)                               \
    {                                                       \
      if (!quiet) {                                         \
         va_list values;                                    \
         va_start(values, fmt); 

and

void
lwan_status_init(struct lwan *l)
{
    #ifdef NDEBUG
      quiet = l->config.quiet;
    #else
      quiet = false;
      (void) l;

The possible call trace is as follows:

lwan_init_with_config->lwan_job_thread_init()->pthread_create(&self, NULL, job_thread, NULL)->lwan_status_critical("Could not lock job wait mutex");

and

lwan_init_with_config->lwan_status_init(l);

Additionally, the global variable use_colors is used at four positions: status_out_msg, get_color_start_for_type, get_color_end_for_type and lwan_status_init. The last access is write, which may result in three data races whose traces are similar to the memtioned possible call trace.

We also found a benign data race at lwan_job_thread_shutdown and job_thread, which may be intentional.

SourceBrella Inc., Yu

lpereira commented 7 years ago

Thanks for the report! I fail to see, however, where there's a race condition in the quiet variable; lwan_job_thread_init() will only be called after lwan_status_init() is called. As far as the use of use_colors, lwan_status_init() is called first before any of those functions are called, so there's no harm. In any case, both quiet and use_colors are static variables, so they're initialized to 0 on program start, which shouldn't cause any unexpected behavior.

What's the data race in the lwan_job_thread_shutdown() function? Is it the reading variable? If so, it should be fine, it's protected by a mutex. Or is there something wrong there that I'm missing?

ITWOI commented 7 years ago

Thank you for your reply.

I'm afraid there is no high-level race condition but only low-level data race in the report.

I update the call trace as follows:

lwan_init_with_config->lwan_job_thread_init()->pthread_create(&self, NULL, job_thread, NULL)->lwan_status_critical("Could not lock job wait mutex");

and

lwan_init_with_config->lwan_status_init(l);

There are two calls to lwan_job_thread_init at lwan.c#L564 and lwan.c#L547. Note that the call to lwan_job_thread_init() is at lwan.c#L552. Therefore, I think it's possible for them to happen in parallel.

The benign data race at lwan_job_thread_shutdown and job_thread is protected by queue_mutex and job_wait_mutex, which is not protected by the same mutex. queue_mutex is used at lwan-job.c#L117 and lwan-job.c#L74, it seems running is not at the critical section constructed by queue_mutex.