yahoo / maha

A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Apache License 2.0

Add Druid Lookup with user-modifiable missing value override #1025

Closed ryankwagner closed 1 year ago

ryankwagner commented 1 year ago

Currently, column additions always return MAHA_LOOKUP_EMPTY when the inner decoded column is missing.
This change lets users define their own override so that requests can filter on expected values such as nulls or empty strings.
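
To illustrate the intended behavior, here is a minimal Python sketch of a lookup decode with a caller-supplied missing-value override. It is not the Maha implementation: the function `decode_lookup`, its parameters, and the toy `classes` table are hypothetical names used only for illustration. Only the `MAHA_LOOKUP_EMPTY` sentinel comes from this PR.

```python
from typing import Mapping, Optional

# Sentinel named in this PR; everything else below is a hypothetical illustration.
MAHA_LOOKUP_EMPTY = "MAHA_LOOKUP_EMPTY"


def decode_lookup(
    lookup: Mapping[str, Mapping[str, str]],
    key: str,
    column: str,
    missing_value_override: Optional[str] = MAHA_LOOKUP_EMPTY,
) -> Optional[str]:
    """Return lookup[key][column], or the override when the value is absent."""
    row = lookup.get(key)
    if row is None or column not in row:
        # Without an explicit override this falls back to the fixed sentinel,
        # which mirrors the behavior described as "current" in this PR.
        return missing_value_override
    return row[column]


# Toy in-memory table standing in for a Druid lookup.
classes = {"42": {"name": "Algebra"}}

print(decode_lookup(classes, "42", "name"))        # key and column present
print(decode_lookup(classes, "-1", "name"))        # falls back to the sentinel
print(decode_lookup(classes, "-1", "name", None))  # falls back to the caller's override
```

Defaulting the override to the existing sentinel keeps current callers unaffected, while callers that need to filter on nulls or empty strings can opt in explicitly.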

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

mcjyang commented 1 year ago

Just trying to understand the use case here: with this new function, your lookup values will get decoded to either columnIfMatched or columnIfNotMatched, and if the chosen column is null, we want to replace it with some user-desired value?

Also, if this is the case, have you tried hitting Druid with the generated Druid query to make sure the behavior is as expected?

ryankwagner commented 1 year ago

Yes. The current query returns MAHA_LOOKUP_EMPTY in the returned rows, e.g.:

{
    "version": "v1",
    "timestamp": "2023-03-15T00:00:00.000Z",
    "event": {
        "School ID": "1474388",
        "Class ID": "-1",
        "Class Name": "MAHA_LOOKUP_EMPTY",
        "School Name": "Lake",
        "Impressions": 10,
        "Day": "20230315"
    }
}

With the fix, the same query returns:

{
    "version": "v1",
    "timestamp": "2023-03-15T00:00:00.000Z",
    "event": {
        "School ID": "1474388",
        "Class ID": "-1",
        "Class Name": null,
        "School Name": "Lake",
        "Impressions": 10,
        "Day": "20230315"
    }
}

Here, the replacement value specified was the empty string ("").
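
As a rough illustration of why this matters for filtering, the Python sketch below applies a "Class Name is missing" check to the two result rows shown in this comment. It is a client-side stand-in only (the PR itself concerns filters in the request), and the helper name `rows_missing` is hypothetical.

```python
import json

# The two result rows from this comment: before the fix, then after it.
rows = json.loads("""
[
  {"version": "v1", "timestamp": "2023-03-15T00:00:00.000Z",
   "event": {"School ID": "1474388", "Class ID": "-1",
             "Class Name": "MAHA_LOOKUP_EMPTY", "School Name": "Lake",
             "Impressions": 10, "Day": "20230315"}},
  {"version": "v1", "timestamp": "2023-03-15T00:00:00.000Z",
   "event": {"School ID": "1474388", "Class ID": "-1",
             "Class Name": null, "School Name": "Lake",
             "Impressions": 10, "Day": "20230315"}}
]
""")


def rows_missing(result_rows, column):
    """Keep rows whose event value for `column` is null or an empty string."""
    return [r for r in result_rows if r["event"].get(column) in (None, "")]


missing = rows_missing(rows, "Class Name")
print(len(missing))  # only the post-fix row has a null Class Name
```

The pre-fix row carries the sentinel string, so a null or empty-string check skips it; only the post-fix row matches.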