housepower / ClickHouse-Native-JDBC

ClickHouse Native Protocol JDBC implementation
https://housepower.github.io/ClickHouse-Native-JDBC/
Apache License 2.0
523 stars 145 forks source link

Support LowCardinality series datatype in getMetadata().getColumns() function #442

Closed geniusjoe closed 11 months ago

geniusjoe commented 1 year ago

Environment

Error logs

I encountered below exception when my code execute connection.getMetaData().getColumns jdbc function.

Exception in thread "main" java.sql.SQLException: Unknown data type: LowCardinality
    at com.github.housepower.misc.Validate.isTrue(Validate.java:41)
    at com.github.housepower.data.DataTypeFactory.get(DataTypeFactory.java:104)
    at com.github.housepower.data.DataTypeFactory.get(DataTypeFactory.java:66)
    at com.github.housepower.jdbc.ClickHouseDatabaseMetadata.getColumns(ClickHouseDatabaseMetadata.java:912)
    at examples.SimpleQuery.main(SimpleQuery.java:30)

Steps to reproduce

Create clickhouse table

Execute below ddl statement in default database to create a table with a LowCardinality(String) datatype.

CREATE TABLE test1 
( 
    user_id UInt32, 
    message LowCardinality(String)
) 
ENGINE = MergeTree() 
order by tuple()

Insert data

Insert some rows into this table

INSERT INTO test1 VALUES (1, 'a') ;
INSERT INTO test1 VALUES (2, 'b') ;
INSERT INTO test1 VALUES (3, 'c') ;

Create java demo

Create a java demo to test getColumn() function, and above exception occurs when running the demo.

public class SimpleQuery {

    public static void main(String[] args) throws Exception {
        try (Connection connection = DriverManager.getConnection("jdbc:clickhouse://<ip>:<port>?client_name=ck-example")) {
            ResultSet columnsMetadata = connection.getMetaData().getColumns(null, "default", "test1", null);
            while (columnsMetadata.next()) {
                System.out.println("DATA_TYPE" + "\t" + JDBCType.valueOf(columnsMetadata.getInt("DATA_TYPE")).getName());
                System.out.println("COLUMN_NAME" + "\t" + columnsMetadata.getString("COLUMN_NAME"));
                System.out.println("DECIMAL_DIGITS" + "\t" + columnsMetadata.getString("DECIMAL_DIGITS"));
                System.out.println("IS_NULLABLE" + "\t" + columnsMetadata.getString("IS_NULLABLE"));
                System.out.println("IS_AUTOINCREMENT" + "\t" + columnsMetadata.getString("IS_AUTOINCREMENT"));
                System.out.println("COLUMN_DEF" + "\t" + columnsMetadata.getString("COLUMN_DEF"));
                System.out.println("COLUMN_SIZE" + "\t" + columnsMetadata.getString("COLUMN_SIZE"));
                System.out.println();
            }
        }
    }
}

Other descriptions

I think this exception is because SQLLexer.bareWord() function will parse first alphabetic word as datatype. When given a data type LowCardinality(String), then for-loop iteration in the bareWord function will encounter first left bracket and return. So that LowCardinality will be regarded as dataTypeName in IDataType<?, ?> get function, and cannot be parsed as known data type.

214e194d9020 :) SELECT database, table, name, type, default_kind as default_type, default_expression FROM system.columns where database LIKE '%default%' and table like '%test1%'; 

SELECT
    database,
    table,
    name,
    type,
    default_kind AS default_type,
    default_expression
FROM system.columns
WHERE (database LIKE '%default%') AND (table LIKE '%test1%')

Query id: 38741dd5-06d0-42aa-8199-970d498a0f92

┌─database─┬─table─┬─name────┬─type───────────────────┬─default_type─┬─default_expression─┐
│ default  │ test1 │ user_id │ UInt32                 │              │                    │
│ default  │ test1 │ message │ LowCardinality(String) │              │                    │
└──────────┴───────┴─────────┴────────────────────────┴──────────────┴────────────────────┘

2 rows in set. Elapsed: 0.002 sec. 

I think we can add a LowCardinality param check in public static IDataType<?, ?> get function. We can check if current parsed dataTypeName is LowCardinality. If so, we can make pos cursor forward and return inner data type recursively. Examples below:

    public static IDataType<?, ?> get(SQLLexer lexer, NativeContext.ServerContext serverContext) throws SQLException {
        String dataTypeName = String.valueOf(lexer.bareWord());

        if (dataTypeName.equalsIgnoreCase("Tuple")) {
           ...
        } else if (dataTypeName.equalsIgnoreCase("Array")) {
           ...
        }  ...
        else if (dataTypeName.equalsIgnoreCase("LowCardinality")){
            Validate.isTrue(lexer.character() == '(');
            IDataType<?, ?> nestedDataType = DataTypeFactory.get(lexer, serverContext);
            Validate.isTrue(lexer.character() == ')');
            return nestedDataType;
        } else {
            IDataType<?, ?> dataType = dataTypes.get(dataTypeName.toLowerCase(Locale.ROOT));
            Validate.isTrue(dataType != null, "Unknown data type: " + dataTypeName);
            return dataType;
        }
    }

Are there any suggestions? And if this issue is reasonable, I can also make a PR.

geniusjoe commented 1 year ago

@pan3793 Hello pan. Although most of this metadata related code was written by sundy-li, I feel that he is currently quite busy. So, could you please help me review this portion of the code?