zalando / expan

Open-source Python library for statistical analysis of randomised control trials (A/B tests)
MIT License
331 stars 50 forks source link

Remove some logging. Perhaps make it optional? #235

Closed aaron-mcdaid-zalando closed 6 years ago

aaron-mcdaid-zalando commented 6 years ago

These messages seem to be the most repetitive. Also, they don't seem very useful to me

So this PR proposes to remove them. If they are valuable, we could consider making them optional

Also, even if somebody does want to record the number of NaNs, the current message is kinda of useless and is not concise. I would prefer to log a simple, concise, report of the number of NaNs is each column

aaron-mcdaid-zalando commented 6 years ago

In this PR, I've disabled the functionality to check if the variances are too different from each other. I'm not sure anybody uses this, and it creates a lot of log messages for some datasets

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.2%) to 92.249% when pulling 7f7388530fb9c557a025c9cbd8d3d0b6777db0d0 on less.logging into 20dc15ff1891fb22ee2142817ba5c6a415595469 on master.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.2%) to 92.249% when pulling 7f7388530fb9c557a025c9cbd8d3d0b6777db0d0 on less.logging into 20dc15ff1891fb22ee2142817ba5c6a415595469 on master.

igusher commented 6 years ago

If somebody find these messages useful, keep in mind that there is an alternative: to change a log level for such messages to DEBUG or TRACE and set a limit to INFO in the host component (one which uses a lib)..I'm not completely sure how to do a later part in Python, but I believe there must be a way.

aaron-mcdaid-zalando commented 6 years ago

I have to ask a stupid question. I hear a lot about logging around Zalando, but very little clarity. (I guess I just need to ask much tougher questions!). Does scalyr care about DEBUG/INFO/TRACE? I mean, will we save money if the messages we send to scalyr have a different level?

My guess is that it won't save anything. My guess is that, instead, we need to stop sending info to scalyr in the first place?

... and set a limit to INFO in the host component ...

I guess this is the interesting bit. Are you saying that, in Python, we should configure the logger such that certain messages are simply not printed in the first place? Configure the logger object in Python to ignore all DEBUG messages for example?

igusher commented 6 years ago

@aaron-mcdaid-zalando scalyr doesn't care about level. neither scalyr client and logging system in AWS instances care. Decision is being made in application code which messages to log and which not, according to logging level configs. It is not solely about cost cut, but also about usability of those logs. I've sent you a PM how we do it.

aaron-mcdaid-zalando commented 6 years ago

Closing this for now, unless somebody else is still interested? Users of expan can use the following code to suppress most logging from expan:

    for logger_name_to_suppress in [
            'expan.core.early_stopping',
            'expan.core.experiment',
            'expan.core.statistics',
            ]:
        logging.getLogger(logger_name_to_suppress).setLevel(logging.ERROR)
         # Only 'ERROR' and 'CRITICAL' survive this
gbordyugov commented 6 years ago

oh, it hasn't been merged -- great!

aaron-mcdaid-zalando commented 6 years ago

Indeed. This is somewhat obsoleted by other logging changes we've made