trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.34k stars 2.98k forks source link

redis connector table name to match exact one key #12464

Open DeoLeung opened 2 years ago

DeoLeung commented 2 years ago

Hi,

I have table_names redis.table-names=a:app,a:sys where in redis I have keys a:app:1, a:app:2, a:sys, by enabling redis.key-prefix-schema-table=true, a:sys can't match any key, but a:app do.

select count(1) from "a:app"; -- 2
select count(1) from "a:sys"; -- 0
polaris6 commented 2 years ago

In your case, redis.key-prefix-schema-table=true is set, so Trino scans Redis data via the a:sys:* wildcard, causes a:sys not to be scanned.

I think this requirement is reasonable, this situation is actually a key as a table, and we can do some work to support this requirement. I thought of two solutions:

The first solution

Similar to Iceberg's metadata file, we can support properties in the Redis table definition file. The property in the properties can be similar to tableIsAKey=true, which is not required by default and can be added by the user as needed. In this way, different scanning rules can be adapted according to the property tableIsAKey. Example of the table definition file:

{
    "tableName": "...",
    "schemaName": "...",
    "key": {
        "dataFormat": "...",
        "fields": [
        ]
    },
    "value": {
        "dataFormat": "...",
        "fields": [
       ]
    },
    "properties": {
        "tableIsAKey": "true"
    }
}

Example of the Iceberg metadata file:

{
  "format-version" : 1,
  "schemas" : [ {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ {
      "id" : 1,
      "name" : "ip",
      "required" : false,
      "type" : "string"
    }, {
      "id" : 2,
      "name" : "report_time",
      "required" : false,
      "type" : "string"
    }]
  } ],
  "properties" : {
    "preserve.snapshot.days" : "3",
    "preserve.snapshot.nums" : "10",
    "write.format.default" : "parquet",
    "write.metadata.delete-after-commit.enabled" : "true",
    "...": "..."
  },
  "...": "..."
}

The second solution

It is necessary to combine the pushdown of the key in this PR: https://github.com/trinodb/trino/pull/12219

Users can write the following SQL to achieve this requirement (of course the PR needs some minor modifications).

select count(1) from "a:sys" where redis_key = "a:sys";

However, I think adding where filter conditions is a bit redundant, because the essence of this requirement is that a key is used as a table, so I prefer the first solution, adapting different scanning rules according to different property.

Hi @ebyhr, I would like to hear your suggestions, if possible, I will complete the corresponding development according to the first solution.