Currently, we have no capability of querying these columns with our queries, as all data types cast to varchar or bigint in Outer GroupBy queries.
Even when queried, this column type may not be filtered unless it is casted via UDF to String or ThetaSketch (not part of this feature)
Resolution:
Allows querying as a flag in StrType which is disabled by default.
Should generate any inner or outer casting as either the direct column type itself, or as a cast to binary type.
Should not prevent the column from being queried directly. Though the column won't be human-readable, we don't restrict this sort of raw data querying.
Prevents any direct filtering. This column type may only be filtered indirectly through other columns which derive it into new properties.
This is directly enforced in PublicFact, as we don't want users to be able to even specify Binary columns as filterable.
Below queries show behavior change in PrestoV1/HiveV2. This will make for simpler checking of when this PR is meant to "do"
Example existing query, using start_time as the Fact decoded column:
SELECT CAST(advertiser_id as VARCHAR) AS advertiser_id, CAST(mang_decoded_binary_col2 as VARCHAR) AS mang_decoded_binary_col2, CAST(mang_campaign_name as VARCHAR) AS mang_campaign_name, CAST(mang_impressions as VARCHAR) AS mang_impressions
FROM(
SELECT advertiser_id AS advertiser_id, decodeUDF(ad_group_id, 1, start_time, null) AS mang_decoded_binary_col2, mang_campaign_name AS mang_campaign_name, impressions AS mang_impressions
FROM(
SELECT COALESCE(CAST(account_id as BIGINT), 0) advertiser_id, getCsvEscapedString(CAST(COALESCE(c1.mang_campaign_name, '') AS VARCHAR)) mang_campaign_name, SUM(impressions) AS impressions, COALESCE(CAST(start_time as BIGINT), 0) start_time
FROM(SELECT account_id, campaign_id, SUM(impressions) impressions, start_time
FROM s_stats_fact_underlying
WHERE (account_id = 12345) AND (stats_date >= '2023-06-14' AND stats_date <= '2023-06-21')
GROUP BY account_id, campaign_id, start_time
)
ssfu0
LEFT OUTER JOIN (
SELECT campaign_name AS mang_campaign_name, id c1_id
FROM campaign_presto_underlying
WHERE ((load_time = '%DEFAULT_DIM_PARTITION_PREDICTATE%' ) AND (shard = 'all' )) AND (advertiser_id = 12345)
)
c1
ON
CAST(ssfu0.campaign_id AS VARCHAR) = CAST(c1.c1_id AS VARCHAR)
GROUP BY COALESCE(CAST(account_id as BIGINT), 0), getCsvEscapedString(CAST(COALESCE(c1.mang_campaign_name, '') AS VARCHAR)), COALESCE(CAST(start_time as BIGINT), 0)
ORDER BY impressions DESC) OgbQueryAlias
)
queryAlias LIMIT 200
Example new query, now decoding on the OGB Binary column:
SELECT CAST(advertiser_id as VARCHAR) AS advertiser_id, CAST(mang_decoded_binary_col as VARCHAR) AS mang_decoded_binary_col, CAST(mang_campaign_name as VARCHAR) AS mang_campaign_name, CAST(mang_impressions as VARCHAR) AS mang_impressions
FROM(
SELECT advertiser_id AS advertiser_id, decodeUDF(ad_group_id, 1, binarycol, null) AS mang_decoded_binary_col, mang_campaign_name AS mang_campaign_name, impressions AS mang_impressions
FROM(
SELECT COALESCE(CAST(account_id as BIGINT), 0) advertiser_id, getCsvEscapedString(CAST(COALESCE(c1.mang_campaign_name, '') AS VARCHAR)) mang_campaign_name, SUM(impressions) AS impressions, CAST(binarycol as BINARY) binarycol
FROM(SELECT account_id, campaign_id, SUM(impressions) impressions, binarycol
FROM s_stats_fact_underlying
WHERE (account_id = 12345) AND (stats_date >= '2023-06-14' AND stats_date <= '2023-06-21')
GROUP BY account_id, campaign_id, binarycol
)
ssfu0
LEFT OUTER JOIN (
SELECT campaign_name AS mang_campaign_name, id c1_id
FROM campaign_presto_underlying
WHERE ((load_time = '%DEFAULT_DIM_PARTITION_PREDICTATE%' ) AND (shard = 'all' )) AND (advertiser_id = 12345)
)
c1
ON
CAST(ssfu0.campaign_id AS VARCHAR) = CAST(c1.c1_id AS VARCHAR)
GROUP BY COALESCE(CAST(account_id as BIGINT), 0), getCsvEscapedString(CAST(COALESCE(c1.mang_campaign_name, '') AS VARCHAR)), CAST(binarycol as BINARY)
ORDER BY impressions DESC) OgbQueryAlias
)
queryAlias LIMIT 200
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
Issue:
Resolution:
Below queries show behavior change in PrestoV1/HiveV2. This will make for simpler checking of when this PR is meant to "do"
Example existing query, using start_time as the Fact decoded column:
Example new query, now decoding on the OGB Binary column:
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.