rmar3a closed this 4 years ago
Update 2: It looks like everything is fine with our current code, and I found no performance regression. Here is a breakdown of a run on Visual Genome (times are in ms):
SELECT SUM(cascadeFS), SUM(lattice), SUM(populateMQ), SUM(populateMQRChain), SUM(IFNULL(buildPVarsCounts, 0)), SUM(IFNULL(buildRChainCounts,0)), SUM(IFNULL(buildRNodeCounts, 0)), SUM(IFNULL(buildFlatStarCT, 0)) FROM CallLogs\G
*************************** 1. row ***************************
SUM(cascadeFS): 2493
SUM(lattice): 5802
SUM(populateMQ): 2436
SUM(populateMQRChain): 16249
SUM(IFNULL(buildPVarsCounts, 0)): 10328
SUM(IFNULL(buildRChainCounts,0)): 3157164
SUM(IFNULL(buildRNodeCounts, 0)): 27
SUM(IFNULL(buildFlatStarCT, 0)): 173089
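To make the breakdown easier to read, here is a small sketch that turns the summed times above into each phase's share of their combined total. It assumes the logged phases do not overlap, so their sum is only an approximation of the run's total time; the helper name `phase_shares` is hypothetical and not part of our code.

```python
# Per-phase totals (ms) copied from the SUM(...) query output above.
phase_ms = {
    "cascadeFS": 2493,
    "lattice": 5802,
    "populateMQ": 2436,
    "populateMQRChain": 16249,
    "buildPVarsCounts": 10328,
    "buildRChainCounts": 3157164,
    "buildRNodeCounts": 27,
    "buildFlatStarCT": 173089,
}


def phase_shares(totals):
    """Return (phase, fraction of combined time) pairs, largest first.

    Assumes the phases are logged independently (no overlap), so their
    sum is used as the denominator.
    """
    grand_total = sum(totals.values())
    return sorted(
        ((name, ms / grand_total) for name, ms in totals.items()),
        key=lambda item: item[1],
        reverse=True,
    )


for name, share in phase_shares(phase_ms):
    print(f"{name:>18}: {share:7.2%}")
```

With these numbers, buildRChainCounts dominates at roughly 94% of the logged time, with buildFlatStarCT a distant second.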
Note: Almost all of the time spent building the RChain counts comes from creating the global counts tables (3132305 of the 3157164 ms, about 99%). Here is the RChain counts runtime for the global counts tables:
SELECT * FROM CallLogs LIMIT 1;
+------------+-----------+---------+------------+------------------+------------------+------------------+-------------------+------------------------+-----------------+
| CallNumber | cascadeFS | lattice | populateMQ | populateMQRChain | buildPVarsCounts | buildRNodeCounts | buildRChainCounts | createJoinTableQueries | buildFlatStarCT |
+------------+-----------+---------+------------+------------------+------------------+------------------+-------------------+------------------------+-----------------+
| 1 | 8 | 50 | 46 | 612 | NULL | NULL | 3132305 | NULL | NULL |
+------------+-----------+---------+------------+------------------+------------------+------------------+-------------------+------------------------+-----------------+
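As a quick check on the note above, this sketch computes how much of the total RChain counting time goes to the global counts tables. It assumes, as the note says, that CallNumber 1 is the call that builds the global counts tables; the constant and function names are illustrative only.

```python
# buildRChainCounts values (ms) taken from the two query outputs above.
TOTAL_RCHAIN_MS = 3157164   # SUM(IFNULL(buildRChainCounts, 0)) over all calls
GLOBAL_RCHAIN_MS = 3132305  # CallNumber 1, assumed to be the global counts tables


def global_fraction(global_ms, total_ms):
    """Fraction of the RChain counting time spent on the global counts tables."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    return global_ms / total_ms


frac = global_fraction(GLOBAL_RCHAIN_MS, TOTAL_RCHAIN_MS)
remaining_ms = TOTAL_RCHAIN_MS - GLOBAL_RCHAIN_MS
print(f"global counts tables: {frac:.2%} of RChain counting time")
print(f"all other calls combined: {remaining_ms} ms")
```

Every other call together accounts for under 25 seconds of RChain counting, so any further optimization effort is probably best spent on the global counts tables.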
Note: I am checking whether a similar change can be applied to the "_counts" tables for RNodes/RChains.
Update: Applying a similar change to the "_counts" tables for RNodes also seems to help (on both small and large datasets), so I've added a commit for that change as well. Looking at the runtimes I've collected so far, I'm not certain, but a recent change may have increased the runtime for building the global counts tables. I'm running some tests to confirm this; hopefully it's nothing.