memsql / singlestore-spark-connector

A connector for SingleStore and Spark
Apache License 2.0

Truncate Column Rename Step #93

Open yannistze opened 1 week ago

yannistze commented 1 week ago

Hello,

If I understand the order of operations in SQLPushdownRule.scala correctly, the first step after attaching the shared context to all the relations is to rename every column in each relation to a unique name, following the normalizedExprIdMap logic: https://github.com/memsql/singlestore-spark-connector/blob/8e70dffe5a9dfd5b2bf6c8fb856e035b5a8c9825/src/main/scala/com/singlestore/spark/SQLPushdownRule.scala#L41-L45
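To make sure I am reading this right, here is a minimal sketch of what I understand the renaming step to do (illustrative only; the alias scheme and the expression IDs below are made up, not the connector's actual code):

    // Sketch: every column of every relation gets a globally unique alias
    // keyed by its expression ID.
    import scala.collection.mutable

    val exprIdToAlias = mutable.Map.empty[Long, String]

    def normalizedAlias(exprId: Long): String =
      exprIdToAlias.getOrElseUpdate(exprId, s"c_${exprIdToAlias.size}")

    // A relation exposing (user_id -> exprId 7, name -> exprId 9) would then
    // be rendered as:  SELECT user_id AS c_0, name AS c_1 FROM users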

This makes sense as a way to avoid duplicate-name issues later on, for example in joins, since those are wrapped in a selectAll (SELECT *) statement rather than a select statement with explicit aliases:
https://github.com/memsql/singlestore-spark-connector/blob/8e70dffe5a9dfd5b2bf6c8fb856e035b5a8c9825/src/main/scala/com/singlestore/spark/SQLGen.scala#L470
https://github.com/memsql/singlestore-spark-connector/blob/8e70dffe5a9dfd5b2bf6c8fb856e035b5a8c9825/src/main/scala/com/singlestore/spark/SQLGen.scala#L488
https://github.com/memsql/singlestore-spark-connector/blob/8e70dffe5a9dfd5b2bf6c8fb856e035b5a8c9825/src/main/scala/com/singlestore/spark/SQLGen.scala#L502
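For concreteness, the generated join SQL I have in mind looks roughly like this (an assumed shape, not copied from SQLGen); without the earlier renaming, both subqueries could expose a column literally named id, and the outer SELECT * would be ambiguous:

    // Assumed shape of the generated join SQL, for illustration only.
    val left  = "SELECT id AS c_0, name AS c_1 FROM users"
    val right = "SELECT id AS c_2, total AS c_3 FROM orders"
    val join  = s"SELECT * FROM ($left) AS q_0 JOIN ($right) AS q_1 ON q_0.c_0 = q_1.c_2"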

The issue I am facing with this approach is that the query string becomes too long, which causes the schema fetch and the subsequent PreparedStatement code to fail.

Tried:

I was wondering if there is any guidance for scenarios like the one I am facing?
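(For reference, the only blunt mitigation I can think of is turning pushdown off for the affected read, sketched below assuming an existing SparkSession named spark; the disablePushdown option is documented by the connector, but this of course gives up pushdown entirely.)

    // Blunt mitigation sketch: read without SQL pushdown so the connector
    // never generates the long rewritten query. Database/table names are
    // illustrative.
    val df = spark.read
      .format("singlestore")
      .option("disablePushdown", "true")
      .load("mydb.mytable")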

Thanks

AdalbertMemSQL commented 3 days ago

Hello, Could you clarify what error you are encountering?

yannistze commented 3 days ago

> Hello, Could you clarify what error you are encountering?

Hello, sure. The error I get is the following generic one:

java.sql.SQLTransientConnectionException: Driver has reconnect connection after a communications link failure with address=(host=10.133.121.176)(port=3306)(type=primary)
  at com.singlestore.jdbc.client.impl.MultiPrimaryClient.replayIfPossible(MultiPrimaryClient.java:212)
  at com.singlestore.jdbc.client.impl.MultiPrimaryClient.execute(MultiPrimaryClient.java:345)
  at com.singlestore.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:69)
  at com.singlestore.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:251)
  at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
  at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
  at com.singlestore.spark.JdbcHelpers$.loadSchema(JdbcHelpers.scala:137)
  at com.singlestore.spark.SinglestoreReader.schema$lzycompute(SinglestoreReader.scala:84)
  at com.singlestore.spark.SinglestoreReader.schema(SinglestoreReader.scala:84)
...
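For context, my understanding is that the schema fetch already has to send the entire generated query text just to read metadata, so an over-long query fails at this step before any rows are fetched. A hedged sketch of that shape (assumed, not the connector's actual loadSchema code):

    import java.sql.{Connection, ResultSetMetaData}

    // Assumed shape of a schema probe: prepare the generated query wrapped in
    // a LIMIT 0 subquery and read only the ResultSetMetaData.
    // (Statement cleanup omitted for brevity.)
    def probeSchema(conn: Connection, generatedSql: String): ResultSetMetaData = {
      val stmt = conn.prepareStatement(s"SELECT * FROM ($generatedSql) AS q LIMIT 0")
      stmt.executeQuery().getMetaData
    }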

If I am not mistaken, the exception itself comes from this code path in the JDBC driver:

        // no transaction, but connection is now up again.
        // changing exception to SQLTransientConnectionException
        throw new SQLTransientConnectionException(
            String.format(
                "Driver has reconnect connection after a communications link failure with %s",
                oldClient.getHostAddress()),
            "25S03");

and "masks" the root cause 😞