Closed: swarupsarangi113 closed this issue 2 years ago
Hi @swarupsarangi113, https://github.com/housepower/spark-clickhouse-connector is recommended for Spark 3.3 users.

Also, your error message indicates you are trying to insert NULL into a String column that does not accept NULL values:

> type[String] doesn't support null value
Hi @pan3793, I am using Spark 3.3.0 in my system. Could you explain a little bit about your resolution? I couldn't understand it since I am a little new to this.
ClickHouse has its own type system: `String` means `STRING NOT NULL`, and `Nullable(String)` means `STRING`, which accepts NULL values.
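For illustration, a minimal sketch of the distinction using the clickhouse-driver Python package (the table name and host are placeholders, not from this thread):

```python
from clickhouse_driver import Client

client = Client(host="localhost")

# One NOT NULL column and one Nullable column side by side.
client.execute("""
    CREATE TABLE IF NOT EXISTS default.null_demo
    (
        s  String,            -- rejects NULL: "type[String] doesn't support null value"
        ns Nullable(String)   -- accepts NULL
    )
    ENGINE = MergeTree
    ORDER BY s
""")

client.execute("INSERT INTO default.null_demo (s, ns) VALUES", [("ok", None)])    # fine
# client.execute("INSERT INTO default.null_demo (s, ns) VALUES", [(None, "x")])   # fails: s is NOT NULL
```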
OK, I missed your mention that the table is

> dynamically created while writing dataframe

It's a limitation of the Spark API: it does not expose the DataFrame's nullability to the JDBC parts, so the connector developer can only create either an all-nullable or an all-NOT NULL schema. Please pre-create the ClickHouse table before writing.
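A sketch of that workflow, assuming the official ClickHouse JDBC driver and the clickhouse-driver Python package; the host, ports, database, and `df` are placeholders. The table is created once with the desired `Nullable` columns, then Spark appends into the existing schema instead of generating one:

```python
from clickhouse_driver import Client

# Pre-create the table with explicit Nullable columns (native TCP port 9000 by default).
Client(host="localhost").execute("""
    CREATE TABLE IF NOT EXISTS clickhousedb.Prospect_Base
    (
        ProspectID String,
        FirstName  Nullable(String),
        LastName   Nullable(String)
    )
    ENGINE = MergeTree
    ORDER BY ProspectID
""")

# Append into the pre-created table; Spark will not create a schema of its own.
(df.write
   .format("jdbc")
   .option("url", "jdbc:clickhouse://localhost:8123/clickhousedb")
   .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
   .option("dbtable", "Prospect_Base")
   .mode("append")
   .save())
```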
@pan3793 How do I create an empty table with my custom DDL programmatically? I tried something like this:
```python
def create_table(self):
    self.spark.sql(
        """
        CREATE TABLE clickhousedb.Prospect_Base
        (
            ProspectID CHAR(50) NOT NULL,
            FirstName  CHAR(50) NULL,
            LastName   CHAR(30) NULL
        )
        ENGINE = MergeTree()
        PRIMARY KEY ProspectID
        ORDER BY ProspectID
        SETTINGS index_granularity = 8192;
        """
    )
```
But it is throwing this error:
```
java.lang.NoSuchMethodError: 'java.lang.Object org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(org.antlr.v4.runtime.ParserRuleContext, scala.Function0)'
    at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:244)
    at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:146)
    at io.delta.sql.parser.DeltaSqlBaseParser$SingleStatementContext.accept(DeltaSqlBaseParser.java:165)
    at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
    at io.delta.sql.parser.DeltaSqlParser.$anonfun$parsePlan$1(DeltaSqlParser.scala:74)
    at io.delta.sql.parser.DeltaSqlParser.parse(DeltaSqlParser.scala:103)
    at io.delta.sql.parser.DeltaSqlParser.parsePlan(DeltaSqlParser.scala:73)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:577)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:833)
```
Is this due to a driver incompatibility with the Delta jar packages?
I don't think the Spark JDBC datasource provides the ability to run a native query. The spark-clickhouse-connector has that ability.
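For reference, a sketch of running native DDL through the spark-clickhouse-connector's catalog; the catalog class and option names here follow the connector's 0.x documentation and may differ in your version, and the host and credentials are placeholders:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         # Register ClickHouse as a Spark SQL catalog named "clickhouse".
         .config("spark.sql.catalog.clickhouse", "xenon.clickhouse.ClickHouseCatalog")
         .config("spark.sql.catalog.clickhouse.host", "localhost")
         .config("spark.sql.catalog.clickhouse.protocol", "http")
         .config("spark.sql.catalog.clickhouse.http_port", "8123")
         .config("spark.sql.catalog.clickhouse.user", "default")
         .config("spark.sql.catalog.clickhouse.password", "")
         .config("spark.sql.catalog.clickhouse.database", "default")
         .getOrCreate())

# The DDL is executed against ClickHouse through the catalog rather than
# Spark's own parser; NOT NULL vs. nullable columns map to String vs.
# Nullable(String) on the ClickHouse side.
spark.sql("""
    CREATE TABLE clickhouse.clickhousedb.Prospect_Base (
        ProspectID STRING NOT NULL,
        FirstName  STRING,
        LastName   STRING
    ) USING ClickHouse
    TBLPROPERTIES (
        engine = 'MergeTree()',
        order_by = 'ProspectID'
    )
""")
```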
I was having trouble connecting to ClickHouse from the above connector, probably something on the backend side which I have no control over.

Meanwhile, I identified the column that had null values and replaced them with empty strings using `df.na.fill("", ["source"])`.

@pan3793, do you think this approach can remove the need for the clickhouse-connector for now, or will it have other disadvantages?
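For reference, that workaround generalized to every string column rather than just `source` (a sketch; `df` is the DataFrame being written):

```python
# Replace NULLs with empty strings in all string-typed columns so the
# NOT NULL String columns on the ClickHouse side accept the rows.
string_cols = [f.name for f in df.schema.fields if f.dataType.simpleString() == "string"]
df_clean = df.na.fill("", subset=string_cols)
```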
> I was having trouble with connecting to clickhouse from the above connector.
The connector is actively maintained now; you can open issues for specific questions, and you are free to choose the tech stack based on your own judgment.
> ... will it have other disadvantages?
If it meets your requirements, it's the best option.
### Environment

### Issue

I am trying to load a DataFrame into a ClickHouse table using the below code snippet:

I believe the ClickHouse table that is dynamically created while writing the DataFrame does not accept columns that have null values. Is there any way to resolve this issue?

### Error logs