Zulpitch closed this issue 5 years ago
I guess one would have to write Python code which wraps the Scala code. Here, for example, is MLlib's Python code: https://github.com/apache/spark/tree/master/python/pyspark/mllib I don't have time to invest in this myself, but I'd review a PR if you or somebody else writes one.
First, thank you for your quick answer.
So I guess we can't use Python with your library.
I'm going to test it in a Scala cell in Databricks.
I'd love to have the skills needed to write a PR for your library :(
Do you think I can find any other solution to import an Excel file without using your library or the pandas.read_excel() method?
You could have a look at https://github.com/ZuInnoTe/hadoopoffice It's more general than spark-excel but I have no idea how to use it with Python.
OK, I'm going to check it! I think we can set this thread to resolved, since this option is the only one I have left. Thank you again for your quick answers.
Aware this has already been closed, but for future reference you can read from Python at least (haven't tried the writer yet) with something like:
path = "a path"
df = spark.read.format("com.crealytics.spark.excel").option("useHeader", True).load(path)
No import required, simply installing the library to make it available to spark in Databricks is enough!
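For future readers, here is a minimal sketch of how the reader options can be assembled. `useHeader` matches the spark-excel version discussed in this thread (newer releases renamed it to `header`), and `sheetName` is an assumed option name, so check the README for the version you installed:

```python
# Sketch only: builds the option map for spark-excel's DataFrame reader.
# "useHeader" is the option name in the version discussed here; "sheetName"
# is an assumed option name -- verify against the library's README.
def excel_reader_options(use_header=True, sheet_name=None):
    opts = {"useHeader": str(use_header).lower()}
    if sheet_name is not None:
        opts["sheetName"] = sheet_name  # assumed option name
    return opts

# On a Databricks cluster with the spark-excel JAR attached, usage would be:
# df = (spark.read.format("com.crealytics.spark.excel")
#           .options(**excel_reader_options(sheet_name="Sheet1"))
#           .load(path))
```

As the comment above notes, no Python import is needed; the format string `"com.crealytics.spark.excel"` is resolved by Spark once the JAR is attached to the cluster.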
Really? Going to try that this instant!
Just tried.
testDf = spark.read.format("com.crealytics.spark.excel").option("useHeader", True).load(fileTest)
display(testDf)
gives me the following exception:
No value for dfs.adls.oauth2.access.token.provider found in conf file.
So I guess I don't have the rights to access the file, but if I manage to create the config to access it, I guess it should work?
By the way, if I do:
testDf = spark.read.csv(fileTest)
display(testDf)
It can get my file from my Azure Data Lake (though not really well, because it's not a CSV), and I don't need more rights, because I've already done the configuration (but can't show it to you, for obvious security reasons).
Strange, isn't it?
Strange that it works for one and not the other; I can't comment beyond that, though, as I'm on AWS rather than Azure.
Azure Databricks probably doesn't have the same engine / work system, so it doesn't get the same information from the library, I don't know... Or maybe it works and I just need additional configuration. Thanks a lot for your answer though! It's always good to be helped :)
Problem resolved !
The problem was with the credentials. I was doing it like this:
#Configuration to read from Azure Data Lake Storage
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "****")
spark.conf.set("dfs.adls.oauth2.credential", "****")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/****/oauth2/token")
#Get the path to the folder of the Azure Data Lake Storage
adlsAccountName = "****"
projectPath = "****"
projectFullPath = "adl://" + adlsAccountName + ".azuredatalakestore.net/" + projectPath
It was working for spark.read.csv() or things like that.
How I made it work:
# Mount point
udbRoot = "****"
configs = {
"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
"dfs.adls.oauth2.client.id": "****",
"dfs.adls.oauth2.credential": "****",
"dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/****/oauth2/token"
}
# unmount
#dbutils.fs.unmount(udbRoot)
# Mounting
dbutils.fs.mount(
source = "adl://****",
mount_point = udbRoot,
extra_configs = configs
)
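The mount setup above can be captured in a small helper. This is just a sketch of the same key/value pairs for ADLS Gen1; the function name and the tenant-id parameter are mine, and the placeholder values stay placeholders:

```python
# Sketch: the OAuth config map that dbutils.fs.mount expects for
# Azure Data Lake Storage Gen1, using the same keys as the snippet above.
def adls_oauth_configs(client_id, credential, tenant_id):
    return {
        "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "dfs.adls.oauth2.client.id": client_id,
        "dfs.adls.oauth2.credential": credential,
        "dfs.adls.oauth2.refresh.url":
            "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token",
    }

# In a Databricks notebook this would then be used as:
# dbutils.fs.mount(
#     source="adl://<account>.azuredatalakestore.net/<path>",
#     mount_point=udbRoot,
#     extra_configs=adls_oauth_configs(client_id, secret, tenant_id),
# )
```

The point of the mount approach is that the credentials travel with the mount itself, so they are visible to every reader that goes through the mount point, not only to `spark.read.csv`.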
This way it worked, and the line of code you gave me
testDf = spark.read.format("com.crealytics.spark.excel").option("useHeader", True).load(fileTest)
display(testDf)
worked successfully.
Thank you again, we can definitely close the thread this time :)
Did you try: from xyz import abc? I've seen that work in cases where the names have weird characters, etc.
I see that a feature to get the sheet names dynamically from an Excel file was added to this library. Is there a way to use that feature in Python (on Azure Databricks)?
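Not an answer for the library's Scala API, but one Python-only workaround, assuming the workbook is reachable from the driver (e.g. under `/dbfs/...` on a mount point): an `.xlsx` file is a ZIP archive whose sheet names live in `xl/workbook.xml`, so the standard library can list them without any Scala code:

```python
import zipfile
import xml.etree.ElementTree as ET

# Sketch: list sheet names by reading xl/workbook.xml inside the .xlsx
# archive. This works only for .xlsx (ZIP-based), not the old binary .xls.
def xlsx_sheet_names(path):
    ns = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(ns + "sheet")]
```

Each returned name could then be fed back to the reader one sheet at a time, via whatever per-sheet option your spark-excel version supports.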
Hello Everybody,
I'm using Azure Databricks with Python, and I'm trying to use your library, spark-excel.
I have added your library to my cluster:
Problem: I don't know which import I have to write to use it. I tried several things:
import spark-excel
import spark.excel
import spark_excel
import spark_excel_2_11_0_12_0
import com.crealytics.spark.excel
Nothing worked, and I can't use the methods linked to this library.
I haven't found a Python example showing me the right line of code.
Hope you can help me with this question.
Thank you a lot!