matomo-org / matomo


Do not log SQL statements by default #7640

Closed tsteur closed 9 years ago

tsteur commented 9 years ago

E.g. in Tracker mode, 14% of my wall time is spent in Db::logSql(), even though it doesn't actually log anything because of the log level. Also, when archiving, thousands of queries are triggered and quite a lot of time is spent there in total.

We discussed this about 2 weeks ago in the chat, and an idea is to introduce a new config setting [Debug] log_sql_queries.

Maybe in the tracker we should always use the NullLogger with no handlers for best performance, unless [Tracker] debug = 1 is enabled? Not sure!
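For illustration only, the new setting might look like this in config (the option name comes from the idea above; the default value is an assumption):

[Debug]
log_sql_queries = 0

And here is a minimal sketch of the guard inside Db::logSql(), assuming the option is read via Config::getInstance() and cached in a static so the common case is a cheap early return; the signature and the Config/Log calls are approximations of Matomo's code, not verified:

// Hypothetical sketch: option name, Config access and Log::debug usage are assumptions.
private static function logSql($functionName, $sql, $parameters = array())
{
    static $logQueries = null;
    if ($logQueries === null) {
        $logQueries = !empty(Config::getInstance()->Debug['log_sql_queries']);
    }
    if (!$logQueries) {
        return; // no logger lookup, no message formatting
    }
    Log::debug("%s() executing SQL: %s, parameters: %s", $functionName, $sql, json_encode($parameters));
}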

diosmosis commented 9 years ago

Do you remember how many times the method was called, the total wall time, and whether any specific methods called by Db::logSql took up most of the time? Also, isn't the tracker Db different from \Piwik\Db? I guess some parts call Db::, but I was under the impression most tracker code didn't.

tsteur commented 9 years ago

Another run shows 120ms total. There's a method Piwik\Tracker\Model::getIdsAction that does a fetchAll, which triggered the logSql. It still uses some parts of Db:: as the tracker Db extends this one. Anyway, even in the normal UI or when archiving it makes things slower.

diosmosis commented 9 years ago

120ms out of what total? And that's just for logSql? Did any functions called somewhere within logSql's execution take a lot of time or is it just the if statement in Logger?

I don't really care if logSql is run or not (it was a PRO developer request from the last meetup), but it seems strange that what should essentially be just if (!shouldLog($level)) { return; } results in noticeably slower execution. If there is an issue here, it bears investigation.
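To make that concrete, a cheap level check would look roughly like the illustrative PHP below (not Matomo's actual classes); a slowdown only appears if expensive work, such as resolving the logger or formatting the message, happens before the check:

// Illustrative sketch only: DEBUG messages are discarded with a single
// comparison when the minimum level is higher than DEBUG.
class CheapSqlLogger
{
    const DEBUG = 100;
    const INFO  = 200;

    private $minimumLevel;

    public function __construct($minimumLevel = self::INFO)
    {
        $this->minimumLevel = $minimumLevel;
    }

    public function logSql($sql, array $parameters)
    {
        if (self::DEBUG < $this->minimumLevel) {
            return; // early exit: nothing expensive has happened yet
        }
        // only pay for formatting once we know the message will be kept
        error_log(sprintf('SQL: %s, parameters: %s', $sql, json_encode($parameters)));
    }
}

With the default INFO level, every call is just one comparison and a return, which is why seeing it show up in a profile at all is surprising.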

Did you test w/o xhprof/xdebug while doing a large bulk track?

tsteur commented 9 years ago

It takes about 14% of 120ms total time.

I tested with xhprof but without xdebug, no bulk. I do not have the run here anymore to see what exactly was slow.

diosmosis commented 9 years ago

For posterity/future work: Tracker\Db.php doesn't inherit from Piwik\Db.php; it's possible something is using Piwik\Db in the tracker incorrectly.

diosmosis commented 9 years ago

I did a bulk tracking test (300 actions / 100 visits), and Db::logSql is only called 20 times and took 0.07% of wall time. It's only called when autoloading Option values in Cache::getCacheGeneral. Making sure it uses the correct Db will fix this.

tsteur commented 9 years ago

Why bulk tracking? Can you try with a normal tracking request? With xdebug disabled + development mode enabled and caches warmed up, it should have a bigger effect. Anyhow, if it is fixed by using Tracker\Db then it is fine. It also had an effect when archiving and using the normal Db (see the issue description), but I don't remember how much it was.

diosmosis commented 9 years ago

I used bulk tracking to see if it would have a bigger effect. I ran it with xdebug disabled + development mode enabled. What do you mean by caches warmed up?

FYI, I used blackfire, so I think I can send you a link.

It makes sense that archiving would have an issue, since it doesn't use the Tracker\Db class.

diosmosis commented 9 years ago

With one tracking request, it is still only called by Option::autoload and only called 20 times. It represents a bigger portion of the time, but still only 1.3%.

diosmosis commented 9 years ago

On further examination, Option::autoload is only called by the tracker when running scheduled tasks. So there's nothing to do on the tracker side.

I'm looking at core:archive profiles.

diosmosis commented 9 years ago

During a core:archive command, logSql takes up 0.07% of the total time. My related config is:

[Development]
enabled = 1
disable_merged_assets = 1

[log]
log_writers[] = file
log_level = INFO

Can you confirm the results you last found, and perhaps share the profile?

tsteur commented 9 years ago

You should really disable development mode when profiling anything as it gives you wrong results. It will spend more time in things that are usually cached etc. By warming up the cache I mean making sure to perform one or two requests before actually profiling, so that everything is cached.

I don't use blackfire but used it to quickly upload one run from xhprof https://blackfire.io/profiles/b17f20df-2ec4-4c4f-b930-636768eefc6e/graph

(screenshot: tracker profile)

When you test the archiver, do you have many sites, with something like 1 million actual URLs, many different actions, and lots of other data? Which period did you profile? Sometimes archiving one period for one site takes 50 minutes. Even 1% can actually be quite a bit then, although it is surely not much.

diosmosis commented 9 years ago

You should really disable development mode when profiling anything as it gives you wrong results.

I didn't know that; I assumed from your comment above that I should run it with development mode enabled.

I don't use blackfire but used it to quickly upload one run from xhprof https://blackfire.io/profiles/b17f20df-2ec4-4c4f-b930-636768eefc6e/graph

From your profile, I see that logSql is called when autoloading Options, and only when remembering to invalidate reports. I also see that most of the time comes not from logSql but from Log::getInstance(). So going through DI to set up the logger is the problem here, at least for the tracker. I'll look into why the normal Db is being used when loading options.
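For what it's worth, the usual way around that per-call cost is to resolve the logger once and memoize it. The sketch below only illustrates the idea; the class and the use of Psr\Log are placeholders (assuming the psr/log package is available), not Matomo's actual wiring:

use Psr\Log\LoggerInterface;
use Psr\Log\NullLogger;

class SqlLoggerHolder
{
    /** @var LoggerInterface|null */
    private static $logger = null;

    // Resolve the logger once per process instead of going through
    // Log::getInstance() / the DI container for every query.
    public static function get()
    {
        if (self::$logger === null) {
            // placeholder: in Matomo this would come from the container;
            // a NullLogger keeps the tracker path as cheap as possible
            self::$logger = new NullLogger();
        }
        return self::$logger;
    }
}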

When you test the archiver, do you have many sites, with something like 1 million actual URLs, many different actions, and lots of other data? Which period did you profile?

I had 20 sites and was using demo2db's data, but only doing archiving for one month of it, so it wasn't a lot. I think the profile you did would be rather complicated to reproduce; do you still have it? In the tracker the issue is setting up the logger, but this should already be done for archiving, so I'm not sure why an if (logLevel > levelToLog) check would be even a slightly expensive operation in this case. The profile would help in figuring this out.

In the future, it would be very useful for these profiles to be uploaded and linked in issues like these.

tsteur commented 9 years ago

Sorry, I meant development mode disabled and caches warmed up.

I don't have the other profile from archiving anymore, but I should still have this morning's tracker profile, if that's what you meant? I can send it on Slack.

I can upload profiles in the future; in this case I thought it wouldn't be a big deal, as it shouldn't be a problem to add the requested option. In most cases the logged SQL statements are not needed at all, and when they are needed, the option could simply be enabled, I thought. It would of course help when the wrong DB is used. I didn't think it would use the wrong one.

diosmosis commented 9 years ago

I don't have the other profile from archiving anymore, but I should still have this morning's tracker profile, if that's what you meant? I can send it on Slack.

The tracker profile is easy to get; the archiving one would be good to see. I'll try to replicate it.

I can upload profiles in the future; in this case I thought it wouldn't be a big deal, as it shouldn't be a problem to add the requested option. In most cases the logged SQL statements are not needed at all, and when they are needed, the option could simply be enabled, I thought.

I'm not sure I see the reason for adding the option, since those statements should only be logged at the DEBUG level. Adding the option might help us when debugging (to clear out a lot of the noise), but putting it in right now might hide the real problem we haven't found yet (at least for archiving).

I'll try to run a profile on EC2 for archiving. Did you use a specific data dump (maybe the data on testing-big?)?

tsteur commented 9 years ago

Yeah, testing-big. Warning: it can take some hours to archive :)

If adding this option is a problem, I suggest just moving this issue to mid-term or so. In my mind it was OK to just enable that option when needed, but if that's not the case, the time spent on this issue is probably not worth it and we could improve somewhere else even more. I mean re the archiver.

The tracker should be fixed though.

diosmosis commented 9 years ago

Adding the INI option is fine, but I want to find the underlying performance issue first, just so we don't accidentally cover it up.

mnapoli commented 9 years ago

By the way, remember also to optimize the autoloader (composer dump-autoload -o); it makes a very big difference in performance too.
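For reference, the same optimization can be made permanent in composer.json so every composer install/update builds the optimized classmap; this mirrors the -o flag and is shown only as a reminder, not as Matomo's actual composer.json:

{
    "config": {
        "optimize-autoloader": true
    }
}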

diosmosis commented 9 years ago

Added some optimizations in #7711, feel free to review/merge.