This query took roughly 4.5 hours to chew through ~26GB of data:
SELECT *
FROM [CAT_TICK.TRADE] as NBBO
JOIN EACH [bt_poc_orders.bt_poc_7_201504240736] AS ORDERS
ON NBBO.symbol = ORDERS.symbol
WHERE NBBO.symbol = 'IBM'
AND ORDERS.timestamp > '20150424-15:54:21'
AND ORDERS.timestamp < '20150424-16:54:21'
AND ORDERS.CATReporterId = 'ROQC'
AND NBBO.price > 150
AND ORDERS.avgPrice < '$200'
LIMIT 1000;
Whereas this query gets it done in under 5 seconds:
SELECT SEC_TO_TIMESTAMP(NBBO.timestamp) as UTC, ORDERS.timestamp,
cast(SUBSTR(ORDERS.avgPrice, 2) as float) as ORDERS_avgPrice,
NBBO.price AS NBBO_Price, ORDERS.symbol, ORDERS.CATReporterId
FROM [CAT_TICK.TRADE] as NBBO
JOIN EACH [bt_poc_orders.bt_poc_7_201504240736] AS ORDERS
ON NBBO.symbol = ORDERS.symbol
WHERE NBBO.symbol = 'IBM'
AND NBBO.price < 200
AND cast(SUBSTR(ORDERS.avgPrice, 2) as float) > 190
AND ORDERS.CATReporterId = 'ROQC'
AND ORDERS.timestamp > '20150424-00:00:00'
AND ORDERS.timestamp < '20150425-00:00:00'
LIMIT 1000;
I think it has something to do with the string comparison,
but shouldn't an error be thrown if something will take this long?
Original issue reported on code.google.com by adam.naj...@sungard.com on 13 May 2015 at 3:10
Original issue reported on code.google.com by
adam.naj...@sungard.com
on 13 May 2015 at 3:10