google / glog

C++ implementation of the Google logging module
http://google.github.io/glog/
BSD 3-Clause "New" or "Revised" License

EnableLogCleaner() interface is not thread safe? #753

Open kimi20071025 opened 2 years ago

kimi20071025 commented 2 years ago

I read the functions EnableLogCleaner, DisableLogCleaner, and LogCleaner::enabled().

I find that the private member enabled_ of LogCleaner is not guarded by a lock.

If one thread calls EnableLogCleaner while another thread is logging, may this lead to undefined behaviour?

sergiud commented 2 years ago

Could you please point out the exact location where the problem occurs?

kimi20071025 commented 2 years ago

Could you please point out the exact location where the problem occurs?

When one thread is logging to a file, it calls LogFileObject::Write(), which in turn calls log_cleaner.enabled(). That call reads enabled_ without any lock held.

Meanwhile, another thread may change enabled_ by calling EnableLogCleaner or DisableLogCleaner, which write enabled_ without any lock held either.

So maybe this leads to undefined behaviour? I am just in doubt about this; maybe I am wrong.

Thanks for response.

sergiud commented 2 years ago

Is it possible that you mean a race condition and not undefined behavior? The former can occur because EnableLogCleaner and DisableLogCleaner are not atomic. However, I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

cc @aesophor

kimi20071025 commented 2 years ago

Is it possible that you mean a race condition and not undefined behavior? The former can occur because EnableLogCleaner and DisableLogCleaner are not atomic. However, I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

cc @aesophor

Yes. One thread is reading enabled_ while another is writing it. Can the race condition lead to undefined behavior?

Or can it not lead to undefined behavior because enabled_ is a bool?

sergiud commented 2 years ago

I'm not sure what you are asking. Just because a boolean is not atomic does not mean it will immediately cause undefined behavior. Also, undefined behavior is generally not prescribed by a library but by the language standard.

kimi20071025 commented 2 years ago

I'm not sure what you are asking. Just because a boolean is not atomic does not mean it will immediately cause undefined behavior. Also, undefined behavior is generally not prescribed by a library but by the language standard.

OK. If enabled_ were a uint8_t, would the race condition probably cause undefined behavior?

aesophor commented 2 years ago

Quoting from @sergiud:

I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

@kimi20071025 We understand the possible race condition you've mentioned. Are you working on something that can trigger this race condition and is intolerable?

IMO, if we want to eliminate the race condition completely, just making enabled_ a std::atomic is probably not enough: imagine two threads calling LogCleaner::Enable(int days); the value of LogCleaner::overdue_days_ at the end can be nondeterministic. However, if we use a mutex to protect the entire body of both LogCleaner::Enable() and LogCleaner::enabled(), it will add overhead for those who use this API from a single thread.

In this case, is there a better option other than a mutex?

kimi20071025 commented 2 years ago

Quoting from @sergiud:

I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

@kimi20071025 We understand the possible race condition you've mentioned. Are you working on something that can trigger this race condition and is intolerable?

IMO, if we want to eliminate the race condition completely, just making enabled_ a std::atomic is probably not enough: imagine two threads calling LogCleaner::Enable(int days); the value of LogCleaner::overdue_days_ at the end can be nondeterministic. However, if we use a mutex to protect the entire body of both LogCleaner::Enable() and LogCleaner::enabled(), it will add overhead for those who use this API from a single thread.

In this case, is there a better option other than a mutex?

What I am worried about is not a race condition but a data race.

The glog library tries to clean overdue logs every time it logs. I think this may cost a lot of performance if the LogCleaner is enabled all the time. So once a day, at night, I enable the log cleaner (EnableLogCleaner), do an error logging, and disable the log cleaner (DisableLogCleaner).

When a thread is logging, it eventually calls LogFileObject::Write, which calls log_cleaner.enabled(); that is, the thread reads enabled_.

Meanwhile, another thread calls EnableLogCleaner or DisableLogCleaner; that thread writes enabled_.

So what I am worried about is that this situation may cause a data race.

sergiud commented 2 years ago

I had to actually look this up. But it seems that concurrent reading/writing of/to a variable is indeed undefined behavior in C++.

However, at this point I would rather claim that your use case is not supported, in the sense that you should not be calling EnableLogCleaner and DisableLogCleaner from different threads. As @aesophor already mentioned, a mutex is necessary to solve the data race, which will however incur an overhead at little or no gain. We can certainly reconsider once we have the means for proper synchronization (i.e., C++11 <thread> functionality). But at this point, the effort of making these functions thread safe is not worth it.

I think it may cost much performance

How did you determine the performance penalty, or is this only a guess? If it's the latter, you should measure quantitatively first and let us know how strong the performance degradation is if the cleaner has been enabled. This is something worth improving.

kimi20071025 commented 2 years ago

About the log cleaner performance, I share the same opinion as issue #702: there is a significant performance cost when the log cleaner is enabled.

And I also read the benchmark in #761. Could @aesophor help by making another benchmark comparing the logging performance with the log cleaner enabled versus disabled?

If there is a large performance penalty compared with the log cleaner disabled, it means the LogCleaner cannot be enabled all the time.

aesophor commented 2 years ago

Platform:

  • Arch Linux
  • CPU: Intel i7-10700 CPU @ 2.90GHz (16-core)
  • Memory: 16 GB ...

When there are 100 files in the log directories, the execution time of LogCleaner::Run() is 0.157 ms on average.

Upon each call to LogFileObject::Write(), if the log cleaner is enabled and there are 100 files to check, it only takes an extra 0.157 ms to run.

sergiud commented 2 years ago

About the log cleaner performance, I share the same opinion as issue #702: there is a significant performance cost when the log cleaner is enabled.

What's exactly "much performance" in absolute numbers? Issue #702 did not provide any details either. Therefore I don't see any factual arguments.

kimi20071025 commented 2 years ago

Platform

Test code

#include "glog/logging.h"
#include <unistd.h>
#include <chrono>
#include <iostream>

using namespace std;
using namespace chrono;

int main(int argc, char *argv[])
{
    google::InitGoogleLogging("");    
    FLAGS_logbufsecs = 0;
    FLAGS_max_log_size = 100;

    google::SetLogDestination(google::GLOG_ERROR, "/tmp/glog/ERROR_");
    google::SetLogDestination(google::GLOG_WARNING, "");
    google::SetLogDestination(google::GLOG_INFO, "");

    FLAGS_colorlogtostderr = false;
    FLAGS_alsologtostderr = false;
    google::SetStderrLogging(google::GLOG_FATAL);

    google::EnableLogCleaner(1);

    auto enableLogCleanStart = system_clock::now();

    for (uint16_t i = 0; i < 50000; i++)
    {
        LOG(ERROR)<<"glog error";
    }

    auto enableLogCleanEnd = system_clock::now();

    std::chrono::duration<double, std::milli> enableLogCleanDuration = enableLogCleanEnd - enableLogCleanStart;

    cout<<"cost: "<<enableLogCleanDuration.count()<<" ms when enable  log clean"<<endl;

    google::DisableLogCleaner();

    auto disableLogCleanStart = system_clock::now();

    for (uint16_t i = 0; i < 50000; i++)
    {
        LOG(ERROR)<<"glog error";
    }

    auto disableLogCleanEnd = system_clock::now();

    std::chrono::duration<double, std::milli> disableLogCleanDuration = disableLogCleanEnd - disableLogCleanStart;

    cout<<"cost: "<<disableLogCleanDuration.count()<<" ms when disable log clean"<<endl;

    return 0;
}

Result

cost: 518.541 ms when enable  log clean
cost: 225.845 ms when disable log clean

conclusion

When there is 1 file in the log directory and the log cleaner is enabled, logging takes roughly 2.3x as long (518.5 ms vs. 225.8 ms) as when the log cleaner is disabled.

aesophor commented 2 years ago

I believe that the overhead (518.541 ms) is because:

If your application needs to write so frequently but it's performance critical at the same time, then you can use another thread to occasionally perform cleanup. You are also welcome to submit a PR and improve it.

aesophor commented 2 years ago

A possible optimization strategy: use CycleClock_Now() to maintain an additional timer such as next_cleanup_time_.

kimi20071025 commented 2 years ago

you can use another thread to occasionally perform cleanup

This is what I already do. But the data race on enabled_ should be avoided, as I said at the beginning of this issue.