Closed stefaweb closed 4 years ago
Is it on every page or just quarantine and reports?
Only this two.
When I click on the Quarantine and Reports menu, I need to wait around 5 secondes or more to go to the next page inside the menu.
Inside reports.php, its also slow when we click on the reports. More than 10 seconds to build a graph.
The report page uses all mail data and generates the graph at the server what probably is the reason why its so slow. I did a rebase of pr #477 so can you try to use that and check if the performance is better/worse or even working?
Sorry, what is the git command to "merge" an unpublished PR on my local git directory?
git remote add upstream https://github.com/mailwatch/1.2.0.git
git checkout upstream pull/477/head:pr477
This adds the mailwatch repo as remote and checks out PR 477 to local branch pr477. After that you can merge it like a normal branch
Problem!
# git checkout upstream pull/477/head:pr477
error: pathspec 'upstream' did not match any file(s) known to git.
error: pathspec 'pull/477/head:pr477' did not match any file(s) known to git.
# git remote -v
origin https://github.com/mailwatch/1.2.0.git (fetch)
origin https://github.com/mailwatch/1.2.0.git (push)
upstream https://github.com/mailwatch/1.2.0.git (fetch)
upstream https://github.com/mailwatch/1.2.0.git (push)
@stefaweb - have you optimised mysql/mariadb to make use of the available ram?
I'll will check with tuning-primer.sh later if problem from this side.
sorry should be git fetch upstream pull/477/head:pr477
Works. Thanks.
I checked mariadDB with tuning-primer.sh. No problem of setup or memory.
sorry should be git fetch upstream pull/477/head:pr477
remote: Counting objects: 22, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 22 (delta 11), reused 6 (delta 6), pack-reused 2
Dépaquetage des objets: 100% (22/22), fait.
Depuis https://github.com/mailwatch/1.2.0
* [nouvelle référence] refs/pull/477/head -> pr477
I can't find the new code in the directory?
Did you do a checkout of the branch after fetch?
Yes, better with. Thanks.
No difference with new graph code. Its still very slow when I click on Quarantine and Reports menu.
This should be better now with performance feature merged
Still we can't do more on db query time.
I'm waiting 1.2.1 final to upgrade my prod server and see. On the dev server, I've not enough records.
Why not dump the prod db and import to the dev server? You could check if the upgrade works 😄
The dev is not compltely a dev. Its more a test server with 2 internal "real" domains.
Hello!
I updated my prod server to 1.2.3-dev. The query for Quarantine and reports is still very very slow. Not usable.
For rep_total_mail_by_date.php, it take 56 secondes to print the graph.
I tried to launch rep_top_senders_by_quantity.php and stoped mysql after 10 minutes without result printed on GUI.
I've 394 145 records in maillog table.
Slow query log:
# Time: 170416 9:32:02
# User@Host: mailwatch[mailwatch] @ localhost []
# Thread_id: 114 Schema: mailscanner QC_hit: No
# Query_time: 56.451353 Lock_time: 0.000170 Rows_sent: 61 Rows_examined: 394145
# Full_scan: Yes Full_join: No Tmp_table: Yes Tmp_table_on_disk: No
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
SET timestamp=1492327922;
SELECT
DATE_FORMAT(date, '%d/%m/%y') AS xaxis,
COUNT(*) AS total_mail,
SUM(CASE WHEN virusinfected>0 THEN 1 ELSE 0 END) AS total_virus,
SUM(CASE WHEN (
isspam>0
AND (virusinfected=0 OR virusinfected IS NULL)
AND (nameinfected=0 OR nameinfected IS NULL)
AND (otherinfected=0 OR otherinfected IS NULL)
) THEN 1 ELSE 0 END
) AS total_spam,
SUM(CASE WHEN (
isspam>0
AND (virusinfected=0 OR virusinfected IS NULL)
AND (nameinfected=0 OR nameinfected IS NULL)
AND (otherinfected=0 OR otherinfected IS NULL)
AND (ishighspam=0 OR ishighspam IS NULL)
) THEN 1 ELSE 0 END
) AS total_lowspam,
SUM(CASE WHEN (
ishighspam>0
AND (virusinfected=0 OR virusinfected IS NULL)
AND (nameinfected=0 OR nameinfected IS NULL)
AND (otherinfected=0 OR otherinfected IS NULL)
) THEN 1 ELSE 0 END
) AS total_highspam,
SUM(CASE WHEN (
ismcp>0
AND (virusinfected=0 OR virusinfected IS NULL)
AND (nameinfected=0 OR nameinfected IS NULL)
AND (otherinfected=0 OR otherinfected IS NULL)
AND (isspam=0 OR isspam IS NULL)
AND (ishighspam=0 OR ishighspam IS NULL)
) THEN 1 ELSE 0 END
) AS total_mcp,
SUM(CASE WHEN (
nameinfected>0
AND (virusinfected=0 OR virusinfected IS NULL)
AND (otherinfected=0 OR otherinfected IS NULL)
) THEN 1 ELSE 0 END
) AS total_blocked,
SUM(CASE WHEN (
(virusinfected=0 OR virusinfected IS NULL)
AND (nameinfected=0 OR nameinfected IS NULL)
AND (otherinfected=0 OR otherinfected IS NULL)
AND (isspam=0 OR isspam IS NULL)
AND (ishighspam=0 OR ishighspam IS NULL)
AND (ismcp=0 OR ismcp IS NULL)
AND (ishighmcp=0 OR ishighmcp IS NULL)
) THEN 1 ELSE 0 END
) as total_clean,
SUM(size) AS total_size
FROM
maillog
WHERE
1=1
AND (1=1)
GROUP BY
xaxis
ORDER BY
date;
@stefaweb - my message count is 614,079 and it takes less than 2 seconds to produce these 2 reports on my system.
What is your config?
For me : Dell PowerEdge with SAS 15K disk Raid5, KVM under Proxmox, 6Go of RAM, 12 proc (2 sockets * 6 cores), Jessie, MariaDB 10, PHP 5.6.
I got 35 secondes for rep_total_mail_by_date.php just after a server restart.
I've enclosed the mysql variables. If you can have a look? I've not found something special with the config.
Hello! More test on this.
I increased innodb_buffer_pool_size to 2048M on the server. I now have a query duration of 24 seconds (using phpmyadmin to run the query).
I tried the same mailwatch database on MariadB 10.0.30 but on a wheezy OpenVZ CT (on the same hardware server). I used phpmyadmin to run the query. Same time (around 24 seconds).
But this is special. On OpenVZ server, if I run the query a second time, I got 0.0005 sec. On KVM, if I run the query a second time, I got around 2 seconds.
You don't have any sort of debug or tracing enabled by chance, do you?
On Mon, Apr 17, 2017 at 4:30 AM, Stéphane notifications@github.com wrote:
Hello! More test on this.
I increased innodb_buffer_pool_size to 2048M on the server. I now have a query duration of 24 seconds (using phpmyadmin to run the query).
I tried the same mailwatch database on MariadB 10.0.30 but on a wheezy OpenVZ CT (on the same hardware server). I used phpmyadmin to run the query. Same time (around 24 seconds).
But this is special. On OpenVZ server, when in phpmyadmin, if I run the query a second time, I got 0.0005 sec. On KVM, if I run the query a second time, I got around 2 seconds.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mailwatch/MailWatch/issues/646#issuecomment-294431417, or mute the thread https://github.com/notifications/unsubscribe-auth/AF9yS6sDvToZyd7v7AR6ITdZzycgh_pnks5rwyMcgaJpZM4MhV2P .
You don't have any sort of debug or tracing enabled by chance, do you?
Only slow query log to, to track slow query.
What is your config?
On my proxmox VM (KVM) 4GB ram, CEPH (SSD) storage, 2.6 seconds. (Mariadb 10.1, centos 6)
On my other VM (OVH public cloud) 8GB ram, 4.43s (MySQL 5.1, Centos 6)
Same dataset, loaded via the mailwatch gui (rep_total_mail_by_date.php)
@stefaweb Can you try disabling the slow query log and try again?
Could you try to do a query just after the restart of mariadb?
Its seems that if we run more than one time a query, the time change. The more query is done, the faster it is.
weird
Mysql is caching the result. If you use
SELECT SQL_NO_CACHE DATE_FORMAT(date, '%d/%m/%y') AS xaxis, ....
you can disable the caching
Ok, I tried without slow_query_log. Did a mariadb restart. No change. 23 sec.
CEPH (SSD) storage
Cheater! 😀
Just to be sure.
# time dd if=/dev/zero of=/tmp/test oflag=direct bs=64k count=16000
16000+0 enregistrements lus
16000+0 enregistrements écrits
1048576000 octets (1,0 GB) copiés, 1,64382 s, 638 MB/s
I tried with another server.
I real hardware server with Jessie + MariadDB 10.0.30 + php 5.6 fresh out of the box.
Modèle: DELL® PowerEdge R320
Processeur: 1 x Xeon® E5 1410 v2
Architecture: 4 cœurs 8 threads @2.8 Ghz cache L3 10MB, x64, VT
RAM: 16 Go DDR3 ECC
Disque dur: 4 x 3 To SAS
RAID: Matériel, Dell PERC H710, (Raid 0/1/5)
I loaded the database with 395 957 records in maillog and executed the query.
32.7492 sec
Je donne ma langue au chat !
I'm back on this. Following the conversation in #770.
The speed problem is may be that you use Postfix (mtalog/mtalog_ids) and me Exim (maillog).
I think it heavily depends on the number of mails you have in your db and what type they are (virus, spam, ham, etc.).
I tried to generate around 1.2 million test mails once and it took 20-30 seconds to finish. When using real mail database (which contained more spam) with around the same number of mails it took >3 min.
We encounter the same 'problem' it looks like an index is missing. We are regularly do an search on a specific to address and it takes quite a while to finish.. we have two servers with each around 900.000 real mail of which 20% is spam. The Debian installed version of mailwatch (old) was quite a lot faster.
@oxymabv - what version of mailwatch are you currently running?
Searches are pretty quick for me at the moment, I've just done a search for "from domain" and it returned 95,000 results in less than 1sec.
Hi I have just optimized some mysql settings en implemented the cleanup scripts... it is now down to 350.000 mail per node and the search is significantly faster.
@oxymabv can you share the changed mysql settings?
I have changed as suggested by mysqltuner: query_cache_limit = 4M innodb_buffer_pool_size=3G
optimize table won't do any tricks because our engine (innodb) doesn't support it.. Also the mailwatch cronjobs were not enabled...
Hi!
It's still very low with big database (I've 150 000 messages in the database). It took minutes to got result when I click on quarantine or report menu when I'm logged using something else than admin.
Is it possible to do something to improve speed?
Cheers,
@stefaweb - I've over 3.6 million rows in my maillog table, and performance is pretty quick. Have you tuned your DB?
Admittedly the DB does run on a separate instance.
H @asuweb!
The server is not loaded and have all needed resource (KVM on Proxmox + 8Mo + 8 cores). I'm using Debian stretch with Apache 2.4, mariadb 10.1.38 and default php7.0. Server disks are SAS 10K.
When I'm connected in admin no problem of speed at all. Access to reports or quarantine menu is immediate. If I'm connected as an admin-domain (which select a part of the maillog table), all is slow. When I click on reports or quarantine menu, I've to wait 5 to 10 seconds to have the page refreshed. When I select something in reports, its worst.
I have been dragging this problem for months and I can not find where the worry comes from.
I think the problem is with the current datastructure and the way the filtering works:
This https://github.com/mailwatch/MailWatch/blob/b07d6181eab4b3d06bfc8130f9441ddecf40e0a9/mailscanner/functions.php#L2907-L2922 is applied for each sql query that is executed for the domain admin. The to_address
and from_address
are not indexed and it is using a regex expression (wildcard) to filter the correct mails the admin should have access to which in itself is very efficient. An Mysql server also can hardly optimize such a query due to the nature of a regex search.
When fixing #1084 this also introduced even more overhead because the previous search only listed mails where a user was listed as the first recipient. The patch now searches all recpients that match. If you have many mails with lots of recipients this gets even worse.
I agree, changes to the database structure are needed to resolve this, needs to be a design consideration for version 2x. Probably not a good idea to try to fix in the current version.
many mails with lots of recipients this gets even worse
Hi!
Right @asuweb. I manage around 40 domains in the server and in input thousands of domains.
@shawniverson: Is it still possible to find an intermediate solution to improve speed without waiting for version 2? Because currently, this situation makes MailWatch unusable in this case. Some of my user have to wait two or three minutes if they try to use reports features.
@stefaweb I was thinking about this stopgap...
For domain admins and users, it may make sense, when no filter is set, to automatically set a default filter that limits the number of days in the past to query the database in order to speed things up. Let's say something like 3 days, just as an example. Then if a user or domain admin wants to override the default filter, they can. I would think that most folks are interested in the most recent email transactions most of the time.
This would at least work for the Reports section.
Thoughts @asuweb and @Skywalker-11 ?
If this can help.
With mysqltuner, I got:
Reduce your SELECT DISTINCT queries which have no LIMIT clause
With a out of the box MariaDB, the first time I click on "report" menu after a mysql restart, I got this:
Mar 17 09:40:41 antispam mysqld[13439]: 2019-03-17 9:40:41 140339624807168 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 0 failed attempts to flush a page!
Mar 17 09:40:41 antispam mysqld[13439]: 2019-03-17 9:40:41 140339624807168 [Note] InnoDB: Consider increasing the buffer pool size.
Mar 17 09:40:41 antispam mysqld[13439]: 2019-03-17 9:40:41 140339624807168 [Note] InnoDB: Pending flushes (fsync) log: 0 buffer pool: 0 OS file reads: 16420 OS file writes: 1720 OS fsyncs: 173
With a greater value for innodb_buffer_pool_size (3G for me), I don't have the error message.
From https://mariadb.com/kb/en/library/xtradbinnodb-buffer-pool/
The most important server system variable is innodb_buffer_pool_size, which you can set from 70-80% of the total available memory on a dedicated database server with only or primarily XtraDB/InnoDB tables.
Thanks. Currently, I've these values. I tried more on some values, clicking on report menu, no change. But, if you see more to add, welcome. ;)
[mysqld]
query_cache_limit = 0
query_cache_size = 0
tmp_table_size = 128M
max_heap_table_size = 64M
performance_schema = ON
innodb_buffer_pool_size = 4G
innodb_log_file_size = 512M
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
When we connect as a admin-domain or user, MailWatch is very very slow when tens of thousand of records in maillog. I've to wait 4-10 seconds to have page refresh (quarantine, reports). Currently, I have 150 000 records in maillog.
Do we have a way to have a faster database query?
(the server is a 2 sockets with 24 cores and plenty of RAM).