great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

Incorporating Six Sigma Methodology for Data Quality Control in Great Expectations #9674

Status: Closed (vlasvlasvlas closed 2 months ago)

vlasvlasvlas commented 7 months ago

Is your feature request related to a problem? Please describe.
Currently, there is no explicit support or mention of using Six Sigma methodology within Great Expectations for quality assurance purposes. This makes it challenging for users who wish to apply Six Sigma principles to their data quality control processes.

Describe the solution you'd like
I would like to see built-in support or documentation in Great Expectations for implementing Six Sigma methodology to assess and monitor data quality. This could include guidance on defining expectations, calculating defect rates, and interpreting results in terms of Six Sigma levels.

Describe alternatives you've considered
One alternative would be to manually implement Six Sigma calculations outside of Great Expectations, but this would be less integrated and less automated.

Additional context
By incorporating Six Sigma support into Great Expectations, users would have a comprehensive toolset for managing data quality, aligned with industry-standard quality control practices. This would enhance the utility and versatility of Great Expectations for a wider range of users and use cases.

Example
For instance, let's say we have a dataset representing customer orders in an e-commerce platform. We define expectations within Great Expectations to ensure that order timestamps are within a reasonable range, order amounts are non-negative, and customer addresses are valid. After running these expectations, we calculate a Six Sigma value based on the defect rates found in the data.

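Suppose the resulting Six Sigma value is 3.5. Read as an unshifted, one-sided normal tail, this indicates that our data quality is reasonably good, with a defect rate of approximately 233 defects per million opportunities (DPMO). Note that conventional Six Sigma tables apply a 1.5-sigma shift, under which a sigma level of 3.5 corresponds to roughly 22,750 DPMO and 233 DPMO corresponds to a sigma level of 5, so the convention should be stated alongside the value. Over time, as we continue to refine our data pipelines and improve data quality, we aim to see the Six Sigma value increase, indicating fewer defects and higher data quality. By monitoring this value regularly, we can track the effectiveness of our data quality improvement efforts and ensure that our data processes are meeting the desired quality standards.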
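To make the calculation in this example concrete, here is a minimal sketch of how a sigma level could be derived from per-expectation defect counts. It is not part of Great Expectations: the function names (`total_dpmo`, `sigma_level`) and the sample counts are hypothetical, and it assumes each expectation's result can be reduced to an `element_count` and an `unexpected_count`. Only the Python standard library is used, and the `shift` parameter makes the 1.5-sigma convention explicit.

```python
from statistics import NormalDist


def total_dpmo(results):
    """Defects per million opportunities across all expectation results.

    `results` is a list of dicts with `element_count` (rows checked) and
    `unexpected_count` (rows that failed). Every checked row counts as
    one opportunity for a defect.
    """
    opportunities = sum(r["element_count"] for r in results)
    defects = sum(r["unexpected_count"] for r in results)
    if opportunities <= 0:
        raise ValueError("at least one opportunity is required")
    return defects / opportunities * 1_000_000


def sigma_level(dpmo, shift=1.5):
    """Convert DPMO to a sigma level using the standard normal quantile.

    shift=1.5 follows the conventional Six Sigma tables; shift=0 gives
    the plain one-sided normal tail.
    """
    if not 0 < dpmo < 1_000_000:
        raise ValueError("dpmo must be strictly between 0 and 1,000,000")
    yield_fraction = 1 - dpmo / 1_000_000
    return NormalDist().inv_cdf(yield_fraction) + shift


# Hypothetical counts for the three expectations in the example above.
sample_results = [
    {"expectation": "order timestamp in range", "element_count": 100_000, "unexpected_count": 12},
    {"expectation": "order amount non-negative", "element_count": 100_000, "unexpected_count": 0},
    {"expectation": "customer address valid", "element_count": 100_000, "unexpected_count": 58},
]

dpmo = total_dpmo(sample_results)
print(f"DPMO: {dpmo:.1f}")
print(f"Sigma level (no shift): {sigma_level(dpmo, shift=0):.2f}")
print(f"Sigma level (1.5 shift): {sigma_level(dpmo):.2f}")
```

Because the two conventions differ by a constant 1.5, reporting the DPMO alongside the sigma level keeps the metric unambiguous when it is tracked over time.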

Related links: https://docs.oracle.com/cd/B31080_01/doc/owb.102/b28223/concept_data_quality.htm

molliemarie commented 2 months ago

Hello @vlasvlasvlas. With the launch of Great Expectations Core (GX 1.0), we are closing old issues posted regarding previous versions. Moving forward, we will focus our resources on supporting and improving GX Core (version 1.0 and beyond). If you find that an issue you previously reported still exists in GX Core, we encourage you to resubmit it against the new version. With more resources dedicated to community support, we aim to tackle new issues swiftly. For specific details on what is GX-supported vs community-supported, you can reference our integration and support policy.

To get started on your transition to GX Core, check out the GX Core quickstart (click “Full example code” tab to see a code example).

You can also join our upcoming community meeting on August 28th at 9am PT (noon ET / 4pm UTC) for a comprehensive rundown of everything GX Core, plus Q&A as time permits. Go to https://greatexpectations.io/meetup and click “follow calendar” to follow the GX community calendar.

Thank you for being part of the GX community and thank you for submitting this issue. We're excited about this new chapter and look forward to your feedback on GX Core. 🤗