satendrakumar opened this issue 5 years ago
I am also having issues when using Spark Structured Streaming. I noticed the error @satendrakumar was experiencing above, so I modified my code to supply a private key via the privateKey option. It returned the following error:
ERROR IllegalArgumentException: "A snowflake passsword or private key path must be provided with 'sfpassword or pem_private_key' parameter, e.g. 'password'"
When trying to also include the pem_private_key option, I get the following exception, despite following the code examples in the Snowflake docs:
IllegalArgumentException: 'Input PEM private key is invalid'
The streaming mode does not currently support streaming data directly from Databricks or Qubole. However, the connector still works in non-streaming mode with both Qubole and Databricks.
To be clear @rkesh-singh, you are currently using the Spark-Snowflake connector for batch writes? I am as well... but looking to use the structured streaming SnowflakeSink published here for streaming jobs. No documentation exists 😩
@andregoode Streaming support is still in preview. You can contact Snowflake to have it enabled in your account.
@rkesh-singh Is there any update on this? Or is it still in preview mode?
I used the method below, and it worked.
I tried it with PySpark; it should also work with minor modifications in Spark with Scala.
Prerequisite: the public key must be assigned to the user in Snowflake.
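If you need the public key string to register, here is a minimal sketch, assuming a placeholder key path, passphrase, and user name (none of these come from this thread):

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization

# Load the encrypted private key and print its public counterpart; paste the
# body (without the BEGIN/END lines) into:
#   ALTER USER <user> SET RSA_PUBLIC_KEY='<key body>';
with open("rsa_key.p8", "rb") as f:  # placeholder path
    pk = serialization.load_pem_private_key(
        f.read(), password=b"<passphrase>", backend=default_backend())
public_pem = pk.public_key().public_bytes(
    serialization.Encoding.PEM,
    serialization.PublicFormat.SubjectPublicKeyInfo,
).decode("utf-8")
print(public_pem)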
Additional Libraries imported:
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization
Used the following code to obtain the decrypted private key without its header and trailer:

# Read the encrypted PEM private key from disk
with open(private_key_path, "rb") as key_file:
    key = key_file.read()

# Decrypt the key using its passphrase
p_key = serialization.load_pem_private_key(
    key, password=passphrase.encode(), backend=default_backend())

# Re-serialize as unencrypted PKCS8 and strip the PEM header and trailer,
# leaving only the base64 body that the connector expects
pkb = p_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption()
).replace(b"-----BEGIN PRIVATE KEY-----\n", b"") \
 .replace(b"\n-----END PRIVATE KEY-----", b"") \
 .decode("utf-8")
In the connector options, I set 'pem_private_key' to pkb.
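For reference, a minimal sketch of what the options dict might look like; the sf* values are placeholders I am assuming, and only pem_private_key comes from the steps above:

options = {
    "sfURL": "<account>.snowflakecomputing.com",  # placeholder account URL
    "sfUser": "<user>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    "pem_private_key": pkb,  # decrypted key body from the snippet above
}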
Added some additional parameters to the writeStream() call:

(rawstream.writeStream
    .outputMode("append")
    # Checkpoint location is required for structured streaming recovery
    .option("checkpointLocation", <checkpoint location>)
    # Target table in Snowflake
    .option("dbtable", <target table name>)
    # Connection options, including pem_private_key
    .options(**options)
    # Temporary Snowflake stage used by the streaming sink
    .option("streaming_stage", <temp stage name>)
    .format("snowflake")
    .start()
    .awaitTermination())
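For completeness, a hypothetical definition of rawstream, which is not shown in this thread; any streaming DataFrame should work, e.g. the built-in rate source for a quick local test:

# Assumed example source only; replace with your real stream (Kafka, files, etc.)
rawstream = (spark.readStream
    .format("rate")
    .option("rowsPerSecond", 1)
    .load()
    .selectExpr("CAST(value AS STRING) AS VALUE"))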
Is streaming read still unsupported?
We are using Databricks Spark to load data into Snowflake. It works perfectly with batch jobs but fails with streaming. Here is the code:
Error:
Not sure if this is an issue. Is it possible to load streaming data using a username and password?