sijocherian / google-bigquery

Automatically exported from code.google.com/p/google-bigquery
0 stars 0 forks source link

String comparison making much longer than expected #245

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
This query took roughly 4.5 hours to chew through ~26GB of data: 

SELECT *
FROM [CAT_TICK.TRADE] as NBBO
JOIN EACH [bt_poc_orders.bt_poc_7_201504240736] AS ORDERS
ON NBBO.symbol = ORDERS.symbol
WHERE NBBO.symbol = 'IBM'
AND ORDERS.timestamp > '20150424-15:54:21'
AND ORDERS.timestamp < '20150424-16:54:21'
AND ORDERS.CATReporterId = 'ROQC'
AND NBBO.price > 150
AND ORDERS.avgPrice < '$200'
LIMIT 1000;

Whereas this query gets it done in under 5 seconds:

SELECT SEC_TO_TIMESTAMP(NBBO.timestamp) as UTC, ORDERS.timestamp, 
       cast(SUBSTR(ORDERS.avgPrice, 2) as float) as ORDERS_avgPrice, 
       NBBO.price AS NBBO_Price, ORDERS.symbol, ORDERS.CATReporterId
FROM [CAT_TICK.TRADE] as NBBO
JOIN EACH [bt_poc_orders.bt_poc_7_201504240736] AS ORDERS
ON NBBO.symbol = ORDERS.symbol
WHERE NBBO.symbol = 'IBM'
AND NBBO.price < 200
AND cast(SUBSTR(ORDERS.avgPrice, 2) as float) > 190
AND ORDERS.CATReporterId = 'ROQC'
AND ORDERS.timestamp > '20150424-00:00:00'
AND ORDERS.timestamp < '20150425-00:00:00'
LIMIT 1000;

I think it has something to do with the string comparison, 
but shouldn't an error be thrown if something will take this long?

Original issue reported on code.google.com by adam.naj...@sungard.com on 13 May 2015 at 3:10