sodadata / soda-core

:zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
https://go.soda.io/core-docs
Apache License 2.0

Error: You have not configured Soda Library to work with Soda Cloud #2181

Closed edkry closed 4 weeks ago

edkry commented 1 month ago
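from soda.scan import Scan
# `spark` is an existing SparkSession and `df` is the DataFrame to validate.
# Register the DataFrame as a temporary view so the checks can refer to it by name.
df.createOrReplaceTempView("my_df")
scan = Scan()
scan.set_verbose(True)
scan.add_spark_session(spark_session=spark, data_source_name="my_df")
scan.set_data_source_name("my_df")
scan.set_scan_definition_name("YOUR_SCHEDULE_NAME")
# SodaCL check: the view must contain at least one row.
check = """
checks for my_df:
  - row_count > 0
"""
scan.add_sodacl_yaml_str(check)
scan.execute()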

When I run this code snippet, I get the error: You have not configured Soda Library to work with Soda Cloud

I want to use only soda-core, but I can't find a way to do this with Spark DataFrames. This is the closest I have gotten by following the documentation, but it fails with an error saying that I have to use Soda Cloud.

Is there a way to validate a Spark DataFrame using Soda Core?

tools-soda commented 1 month ago

CLOUD-8806

benjamin-pirotte commented 1 month ago

Hi, have you installed soda-core-spark-df or soda-spark-df? This can be useful: https://github.com/sodadata/soda-core/blob/main/docs/installation.md
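
The error text mentions "Soda Library" rather than Soda Core, so it may be worth confirming which of the two packages named above is actually present in the environment. Below is a minimal sketch using only the Python standard library; the helper name `installed_soda_packages` is made up for this example, and the idea that a leftover soda-spark-df install causes the error is an assumption, not something confirmed in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_soda_packages(names=("soda-core-spark-df", "soda-spark-df")):
    """Map each distribution name to its installed version, or None if absent.

    `installed_soda_packages` is a hypothetical helper for this example,
    not part of Soda Core or Soda Library.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so the environment can be inspected at a glance.
for name, ver in installed_soda_packages().items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

If both packages are reported as installed, uninstalling soda-spark-df and keeping only soda-core-spark-df would be one thing to try, assuming that both distributions provide the same `soda` module.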

edkry commented 4 weeks ago

Yes, I did install it. However, I have not found any documentation in the Git repository on how to use Soda with Spark. There is some documentation at https://docs.soda.io, but it seems outdated.