housepower / spark-clickhouse-connector

Spark ClickHouse Connector built on DataSourceV2 API
https://housepower.github.io/spark-clickhouse-connector
Apache License 2.0

Column names with spaces cause error #307

Open paf91 opened 1 month ago

paf91 commented 1 month ago

Hi. I'm facing an issue when column names contain spaces and we try to filter the table:

from pyspark.sql.functions import col
tt = spark.table("test.table1")
tt.select(col("some ID")).where(col("Some ID") == '123').count()

This translates into: SELECT COUNT(*) FROM test.table1 WHERE (1=1) AND ((1=1) AND (``some ID`` = '123') AND (1=1)), which causes an error in ClickHouse because the column name gets wrapped in backticks again even though it is already quoted:

2024.04.10 19:17:22.946555 [ 1411349 ] {d9d75824-74a9-476f-8f70-a7fe7beda392} <Error> DynamicQueryHandler: Code: 62. DB::Exception: Syntax error: failed at position 83 ('``') (line 3, col 29): ``some ID`` = '123') AND (1=1))

LIMIT 21
. Expected one of: literal, NULL, number, Bool, true, false, string literal, SELECT query, possibly with UNION, list of union elements, SELECT query, subquery, possibly with UNION, SELECT subquery, SELECT query, WITH, FROM, SELECT, EXPLAIN, token, Comma, ClosingRoundBracket, CAST operator, NOT, INTERVAL, CASE, DATE, TIMESTAMP, tuple, collection of literals, array, asterisk, qualified asterisk, compound identifier, list of elements, identifier, COLUMNS matcher, COLUMNS, qualified COLUMNS matcher, substitution, MySQL-style global variable. (SYNTAX_ERROR), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c6d5d7b in /usr/bin/clickhouse
1. DB::Exception::createDeprecated(String const&, int, bool) @ 0x000000000c72de4d in /usr/bin/clickhouse
2. DB::parseQueryAndMovePosition(DB::IParser&, char const*&, char const*, String const&, bool, unsigned long, unsigned long) @ 0x0000000012ebc03c in /usr/bin/clickhouse
3. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x00000000117221c5 in /usr/bin/clickhouse
4. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, std::shared_ptr<DB::Context>, std::function<void (DB::QueryResultDetails const&)>, DB::QueryFlags, std::optional<DB::FormatSettings> const&, std::function<void (DB::IOutputFormat&)>) @ 0x0000000011729bea in /usr/bin/clickhouse
5. DB::HTTPHandler::processQuery(DB::HTTPServerRequest&, DB::HTMLForm&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&, std::optional<DB::CurrentThread::QueryScope>&) @ 0x0000000012616f8d in /usr/bin/clickhouse
6. DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&) @ 0x000000001261bdb6 in /usr/bin/clickhouse
7. DB::HTTPServerConnection::run() @ 0x0000000012696c12 in /usr/bin/clickhouse
8. Poco::Net::TCPServerConnection::start() @ 0x00000000150f4e52 in /usr/bin/clickhouse
9. Poco::Net::TCPServerDispatcher::run() @ 0x00000000150f5c51 in /usr/bin/clickhouse
10. Poco::PooledThread::run() @ 0x00000000151ece67 in /usr/bin/clickhouse
11. Poco::ThreadImpl::runnableEntry(void*) @ 0x00000000151eb45c in /usr/bin/clickhouse
12. ? @ 0x00007f2c94dbcac3 in ?
13. ? @ 0x00007f2c94e4e850 in ?
 (version 23.12.2.59 (official build))

Meanwhile, tt.show() works fine.

Environment: PySpark 3.3.4, Spark ClickHouse Connector from main branch (0.8), ClickHouse JDBC Driver 0.6

paf91 commented 1 month ago

This looks like a problem with runtime filters, or even with the JDBC driver itself. I tried the pure JDBC driver and it has the same issue: https://github.com/ClickHouse/clickhouse-java/issues/1608
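
To illustrate the suspected failure mode, here is a minimal Python sketch of how quoting an already-quoted identifier produces the ``some ID`` form seen in the failing query. The helpers quote_identifier and quote_once are hypothetical names made up for this example. They are not the connector's or the JDBC driver's actual code, and the sketch assumes the column name itself contains no backticks.

```python
def quote_identifier(name: str) -> str:
    # Hypothetical naive quoting: always wrap the name in backticks.
    # Assumes the name itself contains no backticks.
    return f"`{name}`"


def quote_once(name: str) -> str:
    # Hypothetical guard: leave the name alone if it already looks quoted.
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return name
    return quote_identifier(name)


raw = "some ID"
quoted = quote_identifier(raw)             # quoted once
quoted_twice = quote_identifier(quoted)    # quoted again by a second layer
guarded = quote_once(quoted)               # second layer skips re-quoting

print(quoted)
print(quoted_twice)
print(guarded)
```

If two layers each apply the naive quoting, the second layer produces the doubled backticks that ClickHouse rejects. The guard in quote_once is only a heuristic, since a raw name could in principle start and end with a backtick.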