mdsr-book / mdsr

Complement to CRC Press book *Modern Data Science with R*
39 stars 25 forks source link

MDSR public database can be slow #58

Closed homerhanumat closed 3 years ago

homerhanumat commented 3 years ago

The MDSR public database seems to work rather slowly, for example:

mysql> select m.nameFirst as fname,
    ->        m.nameLast as lname,
    ->        sum(HR) as hr,
    ->        sum(sb) as sb,
    ->        sum(H) / sum(AB) as career_ba
    -> from Batting b
    -> inner join Master m on b.playerID = m.playerID
    -> group by b.playerID
    -> having hr >= 300 and sb >= 300
    -> order by career_ba desc;
+--------+-----------+------+------+-----------+
| fname  | lname     | hr   | sb   | career_ba |
+--------+-----------+------+------+-----------+
| Willie | Mays      |  660 |  338 |    0.3017 |
| Barry  | Bonds     |  762 |  514 |    0.2981 |
| Alex   | Rodriguez |  696 |  329 |    0.2948 |
| Carlos | Beltran   |  421 |  312 |    0.2814 |
| Andre  | Dawson    |  438 |  314 |    0.2794 |
| Steve  | Finley    |  304 |  320 |    0.2712 |
| Bobby  | Bonds     |  332 |  461 |    0.2678 |
| Reggie | Sanders   |  305 |  304 |    0.2669 |
+--------+-----------+------+------+-----------+
8 rows in set (2 min 48.69 sec)

The same query directed at the scidb.smith.edu database (used in earlier versions of mdsr) finished in 4.39 seconds.

homerhanumat commented 3 years ago

But maybe this is explained by #36:

I don't want our AWS instance to be large, or something we have to manage.

beanumber commented 3 years ago

Hmm...it shouldn't be THAT slow. I don't see any rogue queries on the server. I can rebooting it...

nicholasjhorton commented 3 years ago

I've seen similar throughput, though I was never able to characterize it. (It felt like the connection was being intentionally throttled but from what I understand from @beanumber the I/O on the AWS instance was not limited.)

homerhanumat commented 3 years ago

My queries to lahman have been slow, even fairly simple ones. Queries to airlines turn around pretty quickly. Could lahman be less indexed?

beanumber commented 3 years ago

Blah -- there are no indexes!!

Not sure how that happened, but I will fix.

beanumber commented 3 years ago

Fixed. Remind me to use this as an example of the value of indexes!

nicholasjhorton commented 3 years ago

This was a huge issue for me this summer and fall. Thanks for the fix!