ytsaurus / ytsaurus-spyt

YTsaurus SPYT provides an integration with Apache Spark
Apache License 2.0

Error in PySpark operation while reading an arbitrary file from Cypress #25

Closed: ypodlesov closed this issue 1 week ago

ypodlesov commented 1 month ago

Occurs with Spark 3.3.x.

Error:

py4j.protocol.Py4JJavaError: An error occurred while calling o132.showString.
: java.lang.NoSuchMethodError: 'void org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(org.apache.spark.sql.catalyst.InternalRow, java.lang.String, long, long, java.lang.String[])'
    at org.apache.spark.sql.v2.YtFilePartition$.splitFiles(YtFilePartition.scala:144)
    at org.apache.spark.sql.execution.PartitionedFileUtil$.splitFiles(PartitionedFileUtil.scala:23)
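A java.lang.NoSuchMethodError is raised at run time when code compiled against one version of a class calls a method or constructor that the class actually loaded does not have. Here SPYT's YtFilePartition calls a PartitionedFile constructor taking (InternalRow, String, long, long, String[]), which the Spark 3.3.x runtime evidently does not provide. When triaging reports like this, it can help to pull the owner class, method name, and parameter types out of the error text. The sketch below is hypothetical (parse_no_such_method and MissingMethod are names introduced here) and assumes the quoted-descriptor message format shown above.

```python
import re
from typing import List, NamedTuple, Optional


class MissingMethod(NamedTuple):
    return_type: str
    owner: str        # fully qualified class name
    name: str         # "<init>" denotes a constructor
    params: List[str]


# Assumes the JVM message format: NoSuchMethodError: '<ret> <owner>.<name>(<params>)'
_PATTERN = re.compile(
    r"NoSuchMethodError: '(?P<ret>\S+) "
    r"(?P<owner>[\w.$]+)\.(?P<name><init>|[\w$]+)"
    r"\((?P<params>[^)]*)\)'"
)


def parse_no_such_method(message: str) -> Optional[MissingMethod]:
    """Extract the missing method's signature from an error message, or None."""
    m = _PATTERN.search(message)
    if m is None:
        return None
    # Split the comma-separated parameter list; an empty list means no parameters.
    params = [p.strip() for p in m.group("params").split(",") if p.strip()]
    return MissingMethod(m.group("ret"), m.group("owner"), m.group("name"), params)
```

The extracted parameter list can then be compared with the constructors of PartitionedFile in the Spark jars actually on the classpath, for example with `javap`.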

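PySpark operation:

# `spark` is an existing SparkSession connected to the YTsaurus cluster.
# A DataFrame has no .save(); writes go through DataFrameWriter (df.write).
spark.read.format("csv")\
        .option("header", "false")\
        .option("multiLine", "true")\
        .option("charset", "LATIN1")\
        .option("inferSchema", "true")\
        .option("escape", '"')\
        .option("recursiveFileLookup", "false")\
        .option("lineSep", "\n")\
        .load('yt:///any.csv')\
        .write\
        .save('yt:///file.out')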
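The same read can be expressed more compactly by collecting the reported options in a dict and passing them through DataFrameReader.options(**options). This is a sketch, not a fix: read_csv_from_yt and CSV_OPTIONS are hypothetical names, and `spark` is assumed to be an existing SparkSession with access to the yt:// paths.

```python
from typing import Any, Dict

# Options copied from the report above; DataFrameReader accepts string values.
CSV_OPTIONS: Dict[str, str] = {
    "header": "false",
    "multiLine": "true",
    "charset": "LATIN1",
    "inferSchema": "true",
    "escape": '"',
    "recursiveFileLookup": "false",
    "lineSep": "\n",
}


def read_csv_from_yt(spark: Any, path: str) -> Any:
    """Read a CSV file with the reported options; `spark` is an existing SparkSession."""
    return spark.read.format("csv").options(**CSV_OPTIONS).load(path)
```

For example, `read_csv_from_yt(spark, "yt:///any.csv").show()` triggers the same `showString` call that appears in the stack trace.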
alextokarew commented 1 week ago

Fixed in https://github.com/ytsaurus/ytsaurus-spyt/commit/0a8c584252444b69bf58e26c74e856583f60dfec; the fix will be available in the SPYT 2.4.0 release.
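Going only by this thread (the failure was reported with Spark 3.3.x and the fix is slated for SPYT 2.4.0), a deployment could be screened with a simple version check. This is a hypothetical helper, not part of SPYT: version_tuple and likely_affected are names introduced here, and the rule encodes an inference from this issue rather than an official compatibility matrix.

```python
import re
from typing import Tuple


def version_tuple(version: str, width: int = 3) -> Tuple[int, ...]:
    """Turn "3.3.2" into (3, 3, 2), keeping the leading digits of each part
    (so "0rc1" -> 0) and padding with zeros to `width` components."""
    nums = []
    for part in version.split(".")[:width]:
        m = re.match(r"\d+", part)
        nums.append(int(m.group()) if m else 0)
    nums += [0] * (width - len(nums))
    return tuple(nums)


def likely_affected(spark_version: str, spyt_version: str) -> bool:
    """Assumption drawn from this issue only: Spark 3.3.x with SPYT older than 2.4.0."""
    return (version_tuple(spark_version)[:2] == (3, 3)
            and version_tuple(spyt_version) < (2, 4, 0))
```

The Spark version is available as `spark.version` on a SparkSession; how to obtain the SPYT version depends on the deployment, so it is passed in as a plain string here.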