mdsr-book / mdsr

Complement to CRC Press book *Modern Data Science with R*
39 stars 25 forks source link

mdsr database performance #74

Closed homerhanumat closed 5 months ago

homerhanumat commented 5 months ago

Lately the host mdsr.cdc7tgkkqd0n.us-east-1.rds.amazonaws.com has been very slow. For example, the query shown in the terminal session below has been running for ten minutes with no results:

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 100579
Server version: 10.11.6-MariaDB-log managed by https://aws.amazon.com/rds/

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use airlines;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [airlines]> select a.name as Destination,
    ->        sum(1) as Number_of_Flights
    -> from flights as f
    -> join airports as a
    -> on f.dest = a.faa
    -> where year = 2015 and origin = 'ORD'
    -> group by f.dest
    -> order by Number_of_Flights desc
    -> limit 0, 5;
beanumber commented 5 months ago

Hmm...thanks for reporting this. I had to switch the RDS instance from MySQL to MariaDB a few weeks back. MariaDB is purportedly faster, but perhaps there are some optimizations that I haven't implemented.

I'll investigate further. Do you have any intuition as to which queries are slower than you expect them to be?

beanumber commented 5 months ago
  The free storage capacity for DB Instance: mdsr is low at 3% of the provisioned storage [Provisioned Storage: 19.27 GB, Free Storage: 546.88 MB]. You may want to increase the provisioned storage to address this issue.

 

  The gp2 burst balance credits for the RDS database instance are low. To resolve this issue, reduce IOPS usage or modify your storage settings to enable higher performance.

 

  DB Instance mdsr contains MyISAM tables that have not been migrated to InnoDB. These tables can impact your ability to perform point-in-time restores. Consider converting these tables to InnoDB. Please refer to http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.BackingUpAndRestoringAmazonRDSInstances.html#Overview.BackupDeviceRestrictions

 

beanumber commented 5 months ago

There were two rogue queries that I killed. The query above ran in 6m25s. While this is far from ideal, it did work. I'll see if I can change some of the settings.

homerhanumat commented 5 months ago

Everything seems slow. Tor example, this one with no grouping too more than nine seconds:

MariaDB [airlines]> SELECT sum(1)
    -> FROM flights f
    -> JOIN airports a ON f.origin = a.faa
    -> WHERE f.dest = 'DFW' AND f.year = 2015 AND f.month = 5 
    ->   AND f.day = 14;
+--------+
| sum(1) |
+--------+
|    736 |
+--------+
1 row in set (9.890 sec)
homerhanumat commented 5 months ago

Sorry, the previous comment is probably outdated now that you have killed rogue queries.

beanumber commented 5 months ago

I rebuilt the tables using MyISAM instead of InnoDB and I'm seeing much better performance now.

MariaDB [airlines]> select a.name as Destination, sum(1) as Number_of_Flights from flights as f join airports as a on f.dest = a.faa where year = 2015 and origin = 'ORD' group by f.dest order by Number_of_Flights desc limit 0, 5;
+------------------------------------+-------------------+
| Destination                        | Number_of_Flights |
+------------------------------------+-------------------+
| La Guardia                         |             10492 |
| Los Angeles Intl                   |              8720 |
| Dallas Fort Worth Intl             |              8384 |
| San Francisco Intl                 |              8156 |
| General Edward Lawrence Logan Intl |              7240 |
+------------------------------------+-------------------+
5 rows in set (4.104 sec)

MariaDB [airlines]> SELECT sum(1) FROM flights f JOIN airports a ON f.origin = a.faa WHERE f.dest = 'DFW' AND f.year = 2015 AND f.month = 5 AND f.day = 14;
+--------+
| sum(1) |
+--------+
|    736 |
+--------+
1 row in set (0.215 sec)

Can you confirm that all looks good on your end @homerhanumat ?

homerhanumat commented 5 months ago

Yes, it looks better for me, too. Thanks!