Open fleapapa opened 7 years ago
You can use select count(distinct cast(trip_distance as integer)) from trips;
or select approx_count_distinct(floor(trip_distance)) from trips;
if a good (within 4%) approximation is ok. The original query would also run with the watchdog disabled, but it'd use a slow, std::set
based implementation.
In order to make your query run unchanged, we just need to add range information for the FLOOR
function. Not hard at all, I might do it next week.
Thanks. Because query "select count(distinct CEIL(trip_distance)) from trips;" doesn't work for fast path and cast seems not a substitute for CEIL(), CEIL function also need range information. Actually MOD and most other functions need the information, too.
BTW, another minor issue is that ROUND() seems not behaving as documented:
mapdql> select count(distinct ROUND(trip_distance)) from trips; Exception: Exception occurred: org.apache.calcite.runtime.CalciteContextException: From line 1, column 23 to line 1, column 44: Invalid number of arguments to function 'ROUND'. Was expecting 2 arguments
It seems treated as ROUND_TO_DIGIT (x, y):
mapdql> select count(distinct ROUND(trip_distance,0)) from trips; Exception: Cannot use a fast path for COUNT distinct
Not sure what version you're using, we've removed ROUND
altogether in 6f2730e5a5027b5d37eedbbf65c244e700c76436.
The version is MapD Server 3.2.1-20170826-327e7ee; it's older than 5 days ago :)
Ok, that's before we've removed it. Yes, all functions you've mentioned need range information for precise count distinct using bitmap implementation.
This query generates an error as title.
The query works with other databases. What should i change to it to make it work with MapD?
Schema of trips table:
CREATE TABLE trips ( medallion TEXT ENCODING DICT, hack_license TEXT ENCODING DICT, vendor_id TEXT ENCODING DICT, rate_code_id SMALLINT, store_and_fwd_flag TEXT ENCODING DICT, pickup_datetime TIMESTAMP, dropoff_datetime TIMESTAMP,
passenger_count SMALLINT, trip_time_in_secs INTEGER, trip_distance DECIMAL(14,2), pickup_longitude DECIMAL(14,2), pickup_latitude DECIMAL(14,2), dropoff_longitude DECIMAL(14,2), dropoff_latitude DECIMAL(14,2) ) WITH (FRAGMENT_SIZE=75000000);