nightscape / spark-excel

A Spark plugin for reading and writing Excel files
Apache License 2.0

[BUG] partitionBy not working as expected #615

Closed kumar2901 closed 2 years ago

kumar2901 commented 2 years ago

Is there an existing issue for this?

Current Behavior

I am trying to write an Excel file:

```java
processedData.repartition(1)
    .write()
    .partitionBy(getPartitions())
    .format("excel")
    .mode(SaveMode.Append)
    .option("dataAddress", sheetName)
    .option("useHeader", "true")
    .save(getSinkPath() + "/report.xlsx");
```

I expected it to create a subdirectory for each partition value, the same as csv does, and write a file into each one. Instead, everything is written to the same file.

Expected Behavior

It should behave the same as the csv format.
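For reference, with csv, `partitionBy` writes one nested subdirectory per partition column, named `column=value`, and Spark puts rows with a null partition value under `__HIVE_DEFAULT_PARTITION__`. The sketch below uses a hypothetical helper (`partitionDir`, not part of Spark or spark-excel) to compute the directory a row would land in, assuming the values contain no characters Spark would need to escape in a path.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper illustrating the Hive-style layout that
 * DataFrameWriter.partitionBy produces for file formats such as csv.
 * Simplification: values are assumed not to need path escaping.
 */
public class PartitionLayout {
    // Directory name Spark uses for null partition values.
    static final String DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    /** Builds basePath/col1=v1/col2=v2/... in the map's iteration order. */
    static String partitionDir(String basePath, Map<String, String> partitionValues) {
        StringJoiner path = new StringJoiner("/");
        path.add(basePath);
        for (Map.Entry<String, String> e : partitionValues.entrySet()) {
            String value = e.getValue() == null ? DEFAULT_PARTITION : e.getValue();
            path.add(e.getKey() + "=" + value);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // "colName" matches the column returned by getPartitions() in this issue.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("colName", "A");
        System.out.println(partitionDir("/sink/report", values));
    }
}
```

With a single partition column, each distinct value of `colName` gets its own directory, which is the layout the reporter expected instead of one combined file.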

Steps To Reproduce

```java
processedData.repartition(1)
    .write()
    .partitionBy(getPartitions())
    .format("excel")
    .mode(SaveMode.Append)
    .option("dataAddress", sheetName)
    .option("useHeader", "true")
    .save(getSinkPath() + "/report.xlsx");
```

Environment

- Spark version: 2.4.8
- Spark-Excel version: 2.4.8_0.17.1
- OS: macOS
- Apache POI: 5.2.2
- poi-ooxml: 5.2.2
- Cluster environment: Google Cloud
- Java: 8

Anything else?

No response

github-actions[bot] commented 2 years ago

Please check these potential duplicates:

kumar2901 commented 2 years ago

This is not a duplicate.

Could someone please look into it?

nightscape commented 2 years ago

@quanghgx, should partitioning work in v2?

christianknoepfle commented 2 years ago

Hi, writing partitioned output should work. There is a unit test that checks it (https://github.com/crealytics/spark-excel/blob/main/src/test/scala/com/crealytics/spark/v2/excel/DataFrameWriterApiComplianceSuite.scala). The actual Spark integration is pretty much the same as for csv, so I am wondering why it doesn't work for you.

What does the function getPartitions() do? What data type does it return, and what value does it return in your specific case? What is your DataFrame schema, and what does the data look like? Can you provide a code snippet or test that shows the behaviour?

Thanks

Christian

kumar2901 commented 2 years ago

Here is the definition of getPartitions():

```java
private String[] getPartitions() {
    return new String[]{"colName"};
}
```

Can you check whether partitionBy is supported in the version below:

christianknoepfle commented 2 years ago

Ah, I see. This is not supported for Spark 2.4.8; see the docs:

"Because folders are supported you can read/write from/to a "partitioned" folder structure, just the same way as csv or parquet. Note that writing partitioned structures is only available for spark >=3.0.1"
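Given the README note quoted above, a job that has to run on both Spark 2.4.x and 3.x could check the runtime version before calling `partitionBy` with the excel format. This is a minimal sketch; the helper name `supportsPartitionedExcelWrite` is hypothetical, and it assumes the version string has the usual `major.minor.patch` form (for example the value returned by `SparkSession.version()`), optionally followed by a `-suffix`.

```java
/**
 * Hypothetical guard based on the spark-excel README note that writing
 * partitioned structures requires Spark >= 3.0.1.
 */
public class PartitionSupport {
    private static final int[] MIN_VERSION = {3, 0, 1};

    static boolean supportsPartitionedExcelWrite(String sparkVersion) {
        // Drop any suffix such as "-SNAPSHOT", then split "major.minor.patch".
        String[] parts = sparkVersion.split("-", 2)[0].split("\\.");
        for (int i = 0; i < MIN_VERSION.length; i++) {
            // Missing components (e.g. "3.0") are treated as 0.
            int actual = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (actual != MIN_VERSION[i]) {
                return actual > MIN_VERSION[i];
            }
        }
        return true; // exactly 3.0.1
    }

    public static void main(String[] args) {
        System.out.println(supportsPartitionedExcelWrite("2.4.8")); // false
        System.out.println(supportsPartitionedExcelWrite("3.0.1")); // true
    }
}
```

On Spark 2.4.8 a check like this would let the job fail fast (or take some other code path you choose) instead of silently writing every partition into one file.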

pjfanning commented 2 years ago

Thanks @christianknoepfle. That quote comes from https://github.com/crealytics/spark-excel#excel-api-based-on-datasourcev2

I'm going to close this because it works as expected. Feel free to reopen if there is anything else to discuss.