housepower / spark-clickhouse-connector

Spark ClickHouse Connector build on DataSourceV2 API
https://housepower.github.io/spark-clickhouse-connector
Apache License 2.0
171 stars 59 forks source link

problems with table creation #308

Closed Smalch closed 1 week ago

Smalch commented 1 month ago

https://github.com/housepower/spark-clickhouse-connector/blob/8f970f52cd29b349210c859c11c462863a440d3c/spark-3.5/clickhouse-spark/src/main/scala/xenon/clickhouse/ClickHouseCatalog.scala#L120

when i try to create a table, for instance like this(but i've tried many different versions)

spark.sql("create table geo.nearest_towers33 as select * from test2")

it returns that it can't create table, because the table doesn't exist. I think that the error is in that part of code.

Py4JJavaError                             Traceback (most recent call last)
Cell In[28], [line 93](vscode-notebook-cell:?execution_count=28&line=93)
     [89](vscode-notebook-cell:?execution_count=28&line=89) df = spark.sql(query)
     [90](vscode-notebook-cell:?execution_count=28&line=90) df.createOrReplaceTempView("test2")
---> [93](vscode-notebook-cell:?execution_count=28&line=93) spark.sql("create table geo.nearest_towers33 as select * from test2")

File /opt/spark/python/pyspark/sql/session.py:1440, in SparkSession.sql(self, sqlQuery, args, **kwargs)
   [1438](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/sql/session.py:1438) try:
   [1439](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/sql/session.py:1439)     litArgs = {k: _to_java_column(lit(v)) for k, v in (args or {}).items()}
-> [1440](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/sql/session.py:1440)     return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
   [1441](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/sql/session.py:1441) finally:
   [1442](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/sql/session.py:1442)     if len(kwargs) > 0:

File /opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   [1316](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1316) command = proto.CALL_COMMAND_NAME +\
   [1317](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1317)     self.command_header +\
   [1318](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1318)     args_command +\
   [1319](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1319)     proto.END_COMMAND_PART
   [1321](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1321) answer = self.gateway_client.send_command(command)
-> [1322](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322) return_value = get_return_value(
   [1323](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1323)     answer, self.gateway_client, self.target_id, self.name)
   [1325](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1325) for temp_arg in temp_args:
   [1326](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1326)     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
    [167](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/errors/exceptions/captured.py:167) def deco(*a: Any, **kw: Any) -> Any:
    [168](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/errors/exceptions/captured.py:168)     try:
--> [169](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/errors/exceptions/captured.py:169)         return f(*a, **kw)
    [170](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/errors/exceptions/captured.py:170)     except Py4JJavaError as e:
    [171](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/pyspark/errors/exceptions/captured.py:171)         converted = convert_exception(e.java_exception)

File /opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    [324](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:324) value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    [325](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:325) if answer[1] == REFERENCE_TYPE:
--> [326](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326)     raise Py4JJavaError(
    [327](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:327)         "An error occurred while calling {0}{1}{2}.\n".
    [328](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:328)         format(target_id, ".", name), value)
    [329](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:329) else:
    [330](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:330)     raise Py4JError(
    [331](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:331)         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    [332](https://vscode-remote+ssh-002dremote-002bairflow.vscode-resource.vscode-cdn.net/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:332)         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o60.sql.
: xenon.clickhouse.exception.CHServerException: [HTTP]default@35.224.242.161:8123}/default [1002] {"exception": "Code: 60. DB::Exception: Unknown table expression identifier 'geo.nearest_towers33' in scope SELECT * FROM geo.nearest_towers33 WHERE 1 = 0. (UNKNOWN_TABLE) (version 24.3.2.23 (official build))"}
, server ClickHouseNode [uri=http://35.224.242.161:8123/default]@-1160813478
    at xenon.clickhouse.client.NodeClient.syncQuery(NodeClient.scala:141)
    at xenon.clickhouse.client.NodeClient.syncQueryOutputJSONEachRow(NodeClient.scala:62)
    at xenon.clickhouse.ClickHouseCatalog.loadTable(ClickHouseCatalog.scala:123)
    at xenon.clickhouse.ClickHouseCatalog.loadTable(ClickHouseCatalog.scala:36)
    at org.apache.spark.sql.connector.catalog.TableCatalog.tableExists(TableCatalog.java:163)
    at xenon.clickhouse.ClickHouseCatalog.tableExists(ClickHouseCatalog.scala:36)
    at org.apache.spark.sql.execution.datasources.v2.CreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:81)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:640)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:662)
    at jdk.internal.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:52)
    at java.base/java.lang.reflect.Method.invoke(Method.java:578)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: com.clickhouse.client.ClickHouseException: {"exception": "Code: 60. DB::Exception: Unknown table expression identifier 'geo.nearest_towers33' in scope SELECT * FROM geo.nearest_towers33 WHERE 1 = 0. (UNKNOWN_TABLE) (version 24.3.2.23 (official build))"}
, server ClickHouseNode [uri=http://35.224.242.161:8123/default]@-1160813478
    at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:168)
    at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:275)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    ... 1 more
Caused by: java.io.IOException: {"exception": "Code: 60. DB::Exception: Unknown table expression identifier 'geo.nearest_towers33' in scope SELECT * FROM geo.nearest_towers33 WHERE 1 = 0. (UNKNOWN_TABLE) (version 24.3.2.23 (official build))"}

    at com.clickhouse.client.http.HttpUrlConnectionImpl.checkResponse(HttpUrlConnectionImpl.java:184)
    at com.clickhouse.client.http.HttpUrlConnectionImpl.post(HttpUrlConnectionImpl.java:227)
    at com.clickhouse.client.http.ClickHouseHttpClient.send(ClickHouseHttpClient.java:124)
    at com.clickhouse.client.AbstractClient.sendAsync(AbstractClient.java:161)
    at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:273)
    ... 4 more
dolfinus commented 2 weeks ago

Probably duplicate of #286

pan3793 commented 1 week ago

close as duplicated