mailwatch / MailWatch

MailWatch for MailScanner is a web-based front-end to MailScanner
http://mailwatch.org/
GNU General Public License v2.0
117 stars 66 forks source link

MailWatch very slow during query #646

Closed stefaweb closed 4 years ago

stefaweb commented 7 years ago

When we connect as a admin-domain or user, MailWatch is very very slow when tens of thousand of records in maillog. I've to wait 4-10 seconds to have page refresh (quarantine, reports). Currently, I have 150 000 records in maillog.

Do we have a way to have a faster database query?

(the server is a 2 sockets with 24 cores and plenty of RAM).

Skywalker-11 commented 7 years ago

Is it on every page or just quarantine and reports?

stefaweb commented 7 years ago

Only this two.

When I click on the Quarantine and Reports menu, I need to wait around 5 secondes or more to go to the next page inside the menu.

capture d ecran 2017-03-18 a 14 48 17

Inside reports.php, its also slow when we click on the reports. More than 10 seconds to build a graph.

Skywalker-11 commented 7 years ago

The report page uses all mail data and generates the graph at the server what probably is the reason why its so slow. I did a rebase of pr #477 so can you try to use that and check if the performance is better/worse or even working?

stefaweb commented 7 years ago

Sorry, what is the git command to "merge" an unpublished PR on my local git directory?

Skywalker-11 commented 7 years ago
git remote add upstream https://github.com/mailwatch/1.2.0.git
git checkout upstream pull/477/head:pr477

This adds the mailwatch repo as remote and checks out PR 477 to local branch pr477. After that you can merge it like a normal branch

stefaweb commented 7 years ago

Problem!

# git checkout upstream pull/477/head:pr477
error: pathspec 'upstream' did not match any file(s) known to git.
error: pathspec 'pull/477/head:pr477' did not match any file(s) known to git.
# git remote -v
origin  https://github.com/mailwatch/1.2.0.git (fetch)
origin  https://github.com/mailwatch/1.2.0.git (push)
upstream    https://github.com/mailwatch/1.2.0.git (fetch)
upstream    https://github.com/mailwatch/1.2.0.git (push)
asuweb commented 7 years ago

@stefaweb - have you optimised mysql/mariadb to make use of the available ram?

stefaweb commented 7 years ago

I'll will check with tuning-primer.sh later if problem from this side.

Skywalker-11 commented 7 years ago

sorry should be git fetch upstream pull/477/head:pr477

stefaweb commented 7 years ago

Works. Thanks.

I checked mariadDB with tuning-primer.sh. No problem of setup or memory.

stefaweb commented 7 years ago

sorry should be git fetch upstream pull/477/head:pr477

remote: Counting objects: 22, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 22 (delta 11), reused 6 (delta 6), pack-reused 2
Dépaquetage des objets: 100% (22/22), fait.
Depuis https://github.com/mailwatch/1.2.0
 * [nouvelle référence] refs/pull/477/head -> pr477

I can't find the new code in the directory?

Skywalker-11 commented 7 years ago

Did you do a checkout of the branch after fetch?

stefaweb commented 7 years ago

Yes, better with. Thanks.

stefaweb commented 7 years ago

No difference with new graph code. Its still very slow when I click on Quarantine and Reports menu.

endelwar commented 7 years ago

This should be better now with performance feature merged

endelwar commented 7 years ago

Still we can't do more on db query time.

stefaweb commented 7 years ago

I'm waiting 1.2.1 final to upgrade my prod server and see. On the dev server, I've not enough records.

Skywalker-11 commented 7 years ago

Why not dump the prod db and import to the dev server? You could check if the upgrade works 😄

stefaweb commented 7 years ago

The dev is not compltely a dev. Its more a test server with 2 internal "real" domains.

stefaweb commented 7 years ago

Hello!

I updated my prod server to 1.2.3-dev. The query for Quarantine and reports is still very very slow. Not usable.

For rep_total_mail_by_date.php, it take 56 secondes to print the graph.

I tried to launch rep_top_senders_by_quantity.php and stoped mysql after 10 minutes without result printed on GUI.

I've 394 145 records in maillog table.

Slow query log:

# Time: 170416  9:32:02
# User@Host: mailwatch[mailwatch] @ localhost []
# Thread_id: 114  Schema: mailscanner  QC_hit: No
# Query_time: 56.451353  Lock_time: 0.000170  Rows_sent: 61  Rows_examined: 394145
# Full_scan: Yes  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: No
SET timestamp=1492327922;
SELECT
  DATE_FORMAT(date, '%d/%m/%y') AS xaxis,
  COUNT(*) AS total_mail,
  SUM(CASE WHEN virusinfected>0 THEN 1 ELSE 0 END) AS total_virus,

  SUM(CASE WHEN (
    isspam>0
    AND (virusinfected=0 OR virusinfected IS NULL)
    AND (nameinfected=0 OR nameinfected IS NULL)
    AND (otherinfected=0 OR otherinfected IS NULL)
    ) THEN 1 ELSE 0 END
  ) AS total_spam,

  SUM(CASE WHEN (
    isspam>0
    AND (virusinfected=0 OR virusinfected IS NULL)
    AND (nameinfected=0 OR nameinfected IS NULL)
    AND (otherinfected=0 OR otherinfected IS NULL)
    AND (ishighspam=0 OR ishighspam IS NULL)
    ) THEN 1 ELSE 0 END
  ) AS total_lowspam,

  SUM(CASE WHEN (
    ishighspam>0
    AND (virusinfected=0 OR virusinfected IS NULL)
    AND (nameinfected=0 OR nameinfected IS NULL)
    AND (otherinfected=0 OR otherinfected IS NULL)
    ) THEN 1 ELSE 0 END
  ) AS total_highspam,

  SUM(CASE WHEN (
    ismcp>0
    AND (virusinfected=0 OR virusinfected IS NULL)
    AND (nameinfected=0 OR nameinfected IS NULL)
    AND (otherinfected=0 OR otherinfected IS NULL)
    AND (isspam=0 OR isspam IS NULL)
    AND (ishighspam=0 OR ishighspam IS NULL)
    ) THEN 1 ELSE 0 END
  ) AS total_mcp,

  SUM(CASE WHEN (
    nameinfected>0
    AND (virusinfected=0 OR virusinfected IS NULL)
    AND (otherinfected=0 OR otherinfected IS NULL)
    ) THEN 1 ELSE 0 END
  ) AS total_blocked,

  SUM(CASE WHEN (
    (virusinfected=0 OR virusinfected IS NULL)
    AND (nameinfected=0 OR nameinfected IS NULL)
    AND (otherinfected=0 OR otherinfected IS NULL)
    AND (isspam=0 OR isspam IS NULL)
    AND (ishighspam=0 OR ishighspam IS NULL)
    AND (ismcp=0 OR ismcp IS NULL)
    AND (ishighmcp=0 OR ishighmcp IS NULL)
    ) THEN 1 ELSE 0 END
  ) as total_clean,

  SUM(size) AS total_size
 FROM
  maillog
 WHERE
  1=1
AND (1=1)

 GROUP BY
  xaxis
 ORDER BY
  date;
asuweb commented 7 years ago

@stefaweb - my message count is 614,079 and it takes less than 2 seconds to produce these 2 reports on my system.

stefaweb commented 7 years ago

What is your config?

For me : Dell PowerEdge with SAS 15K disk Raid5, KVM under Proxmox, 6Go of RAM, 12 proc (2 sockets * 6 cores), Jessie, MariaDB 10, PHP 5.6.

I got 35 secondes for rep_total_mail_by_date.php just after a server restart.

I've enclosed the mysql variables. If you can have a look? I've not found something special with the config.

mysql_variables.zip

stefaweb commented 7 years ago

Hello! More test on this.

I increased innodb_buffer_pool_size to 2048M on the server. I now have a query duration of 24 seconds (using phpmyadmin to run the query).

I tried the same mailwatch database on MariadB 10.0.30 but on a wheezy OpenVZ CT (on the same hardware server). I used phpmyadmin to run the query. Same time (around 24 seconds).

But this is special. On OpenVZ server, if I run the query a second time, I got 0.0005 sec. On KVM, if I run the query a second time, I got around 2 seconds.

shawniverson commented 7 years ago

You don't have any sort of debug or tracing enabled by chance, do you?

On Mon, Apr 17, 2017 at 4:30 AM, Stéphane notifications@github.com wrote:

Hello! More test on this.

I increased innodb_buffer_pool_size to 2048M on the server. I now have a query duration of 24 seconds (using phpmyadmin to run the query).

I tried the same mailwatch database on MariadB 10.0.30 but on a wheezy OpenVZ CT (on the same hardware server). I used phpmyadmin to run the query. Same time (around 24 seconds).

But this is special. On OpenVZ server, when in phpmyadmin, if I run the query a second time, I got 0.0005 sec. On KVM, if I run the query a second time, I got around 2 seconds.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mailwatch/MailWatch/issues/646#issuecomment-294431417, or mute the thread https://github.com/notifications/unsubscribe-auth/AF9yS6sDvToZyd7v7AR6ITdZzycgh_pnks5rwyMcgaJpZM4MhV2P .

stefaweb commented 7 years ago

You don't have any sort of debug or tracing enabled by chance, do you?

Only slow query log to, to track slow query.

asuweb commented 7 years ago

What is your config?

On my proxmox VM (KVM) 4GB ram, CEPH (SSD) storage, 2.6 seconds. (Mariadb 10.1, centos 6)

On my other VM (OVH public cloud) 8GB ram, 4.43s (MySQL 5.1, Centos 6)

Same dataset, loaded via the mailwatch gui (rep_total_mail_by_date.php)

Skywalker-11 commented 7 years ago

@stefaweb Can you try disabling the slow query log and try again?

stefaweb commented 7 years ago

Could you try to do a query just after the restart of mariadb?

Its seems that if we run more than one time a query, the time change. The more query is done, the faster it is.

weird

Skywalker-11 commented 7 years ago

Mysql is caching the result. If you use SELECT SQL_NO_CACHE DATE_FORMAT(date, '%d/%m/%y') AS xaxis, ....

you can disable the caching

stefaweb commented 7 years ago

Ok, I tried without slow_query_log. Did a mariadb restart. No change. 23 sec.

stefaweb commented 7 years ago

CEPH (SSD) storage

Cheater! 😀

Just to be sure.

# time dd if=/dev/zero of=/tmp/test oflag=direct bs=64k count=16000
16000+0 enregistrements lus
16000+0 enregistrements écrits
1048576000 octets (1,0 GB) copiés, 1,64382 s, 638 MB/s
stefaweb commented 7 years ago

I tried with another server.

I real hardware server with Jessie + MariadDB 10.0.30 + php 5.6 fresh out of the box.

Modèle: DELL® PowerEdge R320
Processeur: 1 x Xeon® E5 1410 v2
Architecture: 4 cœurs 8 threads @2.8 Ghz cache L3 10MB, x64, VT
RAM: 16 Go DDR3 ECC
Disque dur: 4 x 3 To SAS
RAID: Matériel, Dell PERC H710, (Raid 0/1/5)

I loaded the database with 395 957 records in maillog and executed the query.

32.7492 sec

Je donne ma langue au chat !

stefaweb commented 7 years ago

I'm back on this. Following the conversation in #770.

The speed problem is may be that you use Postfix (mtalog/mtalog_ids) and me Exim (maillog).

Skywalker-11 commented 7 years ago

I think it heavily depends on the number of mails you have in your db and what type they are (virus, spam, ham, etc.).

I tried to generate around 1.2 million test mails once and it took 20-30 seconds to finish. When using real mail database (which contained more spam) with around the same number of mails it took >3 min.

ghost commented 7 years ago

We encounter the same 'problem' it looks like an index is missing. We are regularly do an search on a specific to address and it takes quite a while to finish.. we have two servers with each around 900.000 real mail of which 20% is spam. The Debian installed version of mailwatch (old) was quite a lot faster.

asuweb commented 7 years ago

@oxymabv - what version of mailwatch are you currently running?

Searches are pretty quick for me at the moment, I've just done a search for "from domain" and it returned 95,000 results in less than 1sec.

ghost commented 7 years ago

Hi I have just optimized some mysql settings en implemented the cleanup scripts... it is now down to 350.000 mail per node and the search is significantly faster.

Skywalker-11 commented 7 years ago

@oxymabv can you share the changed mysql settings?

ghost commented 7 years ago

I have changed as suggested by mysqltuner: query_cache_limit = 4M innodb_buffer_pool_size=3G

optimize table won't do any tricks because our engine (innodb) doesn't support it.. Also the mailwatch cronjobs were not enabled...

stefaweb commented 5 years ago

Hi!

It's still very low with big database (I've 150 000 messages in the database). It took minutes to got result when I click on quarantine or report menu when I'm logged using something else than admin.

Is it possible to do something to improve speed?

Cheers,

asuweb commented 5 years ago

@stefaweb - I've over 3.6 million rows in my maillog table, and performance is pretty quick. Have you tuned your DB?

Admittedly the DB does run on a separate instance.

stefaweb commented 5 years ago

H @asuweb!

The server is not loaded and have all needed resource (KVM on Proxmox + 8Mo + 8 cores). I'm using Debian stretch with Apache 2.4, mariadb 10.1.38 and default php7.0. Server disks are SAS 10K.

When I'm connected in admin no problem of speed at all. Access to reports or quarantine menu is immediate. If I'm connected as an admin-domain (which select a part of the maillog table), all is slow. When I click on reports or quarantine menu, I've to wait 5 to 10 seconds to have the page refreshed. When I select something in reports, its worst.

I have been dragging this problem for months and I can not find where the worry comes from.

Skywalker-11 commented 5 years ago

I think the problem is with the current datastructure and the way the filtering works: This https://github.com/mailwatch/MailWatch/blob/b07d6181eab4b3d06bfc8130f9441ddecf40e0a9/mailscanner/functions.php#L2907-L2922 is applied for each sql query that is executed for the domain admin. The to_address and from_address are not indexed and it is using a regex expression (wildcard) to filter the correct mails the admin should have access to which in itself is very efficient. An Mysql server also can hardly optimize such a query due to the nature of a regex search. When fixing #1084 this also introduced even more overhead because the previous search only listed mails where a user was listed as the first recipient. The patch now searches all recpients that match. If you have many mails with lots of recipients this gets even worse.

shawniverson commented 5 years ago

I agree, changes to the database structure are needed to resolve this, needs to be a design consideration for version 2x. Probably not a good idea to try to fix in the current version.

stefaweb commented 5 years ago

many mails with lots of recipients this gets even worse

Hi!

Right @asuweb. I manage around 40 domains in the server and in input thousands of domains.

@shawniverson: Is it still possible to find an intermediate solution to improve speed without waiting for version 2? Because currently, this situation makes MailWatch unusable in this case. Some of my user have to wait two or three minutes if they try to use reports features.

shawniverson commented 5 years ago

@stefaweb I was thinking about this stopgap...

For domain admins and users, it may make sense, when no filter is set, to automatically set a default filter that limits the number of days in the past to query the database in order to speed things up. Let's say something like 3 days, just as an example. Then if a user or domain admin wants to override the default filter, they can. I would think that most folks are interested in the most recent email transactions most of the time.

This would at least work for the Reports section.

Thoughts @asuweb and @Skywalker-11 ?

stefaweb commented 5 years ago

If this can help.

With mysqltuner, I got:

Reduce your SELECT DISTINCT queries which have no LIMIT clause

With a out of the box MariaDB, the first time I click on "report" menu after a mysql restart, I got this:

Mar 17 09:40:41 antispam mysqld[13439]: 2019-03-17  9:40:41 140339624807168 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 0 failed attempts to flush a page!
Mar 17 09:40:41 antispam mysqld[13439]: 2019-03-17  9:40:41 140339624807168 [Note] InnoDB: Consider increasing the buffer pool size.
Mar 17 09:40:41 antispam mysqld[13439]: 2019-03-17  9:40:41 140339624807168 [Note] InnoDB: Pending flushes (fsync) log: 0 buffer pool: 0 OS file reads: 16420 OS file writes: 1720 OS fsyncs: 173

With a greater value for innodb_buffer_pool_size (3G for me), I don't have the error message.

Skywalker-11 commented 5 years ago

From https://mariadb.com/kb/en/library/xtradbinnodb-buffer-pool/

The most important server system variable is innodb_buffer_pool_size, which you can set from 70-80% of the total available memory on a dedicated database server with only or primarily XtraDB/InnoDB tables.

stefaweb commented 5 years ago

Thanks. Currently, I've these values. I tried more on some values, clicking on report menu, no change. But, if you see more to add, welcome. ;)

[mysqld]
query_cache_limit = 0
query_cache_size = 0
tmp_table_size = 128M
max_heap_table_size = 64M
performance_schema = ON
innodb_buffer_pool_size = 4G
innodb_log_file_size = 512M
stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.