An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Description
The PySpark profiling example notebook uses the `collect_column_profile_views` method as its first example. That method returns a dictionary, which is not easy to then write to WhyLabs. We should start with the most commonly applicable method instead, since this one is a more advanced use case that requires an integration to manage column profiles rather than using the dataset profile or result set as a wrapper.

PySpark Integration
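
As a rough illustration of what the notebook could lead with, here is a minimal sketch. It assumes that the "most commonly applicable method" is `collect_dataset_profile_view` from `whylogs.api.pyspark.experimental` (the issue does not name it), and that the returned profile view can be sent with `writer("whylabs")`. The helper names `profile_spark_dataframe` and `upload_to_whylabs` are hypothetical and exist only for this example.

```python
def profile_spark_dataframe(spark_df):
    """Profile a Spark DataFrame into a single dataset-level profile view.

    Assumption: collect_dataset_profile_view is the entry point the issue
    has in mind. It returns one DatasetProfileView for the whole DataFrame,
    whereas collect_column_profile_views returns a dict keyed by column name.
    """
    # Imported lazily so this module still loads where whylogs[spark]
    # is not installed.
    from whylogs.api.pyspark.experimental import collect_dataset_profile_view

    return collect_dataset_profile_view(input_df=spark_df)


def upload_to_whylabs(profile_view):
    """Send a profile view to WhyLabs (hypothetical helper).

    Assumption: WHYLABS_API_KEY, WHYLABS_DEFAULT_ORG_ID and
    WHYLABS_DEFAULT_DATASET_ID are already set in the environment.
    """
    profile_view.writer("whylabs").write()
```

In a notebook this would be used as `upload_to_whylabs(profile_spark_dataframe(spark_dataframe))`, where `spark_dataframe` is an existing PySpark DataFrame. The point of the ordering is that the dataset-level view acts as the wrapper, so no extra code is needed to manage individual column profiles before writing.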