great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0
9.92k stars 1.54k forks

Light Weight Install #2671

Closed JeroenSchmidt closed 1 week ago

JeroenSchmidt commented 3 years ago

Is your feature request related to a problem? Please describe.
In some deployment strategies there is no need to have the Jupyter dependencies installed. Installing Jupyter (and its associated dependencies) ends up prolonging the deployment cycle.

Describe the solution you'd like
Have a "lite" whl available on pip/conda.

eugmandel commented 3 years ago

@JeroenSchmidt Thank you for posting this suggestion! This sounds like a more general case of https://github.com/great-expectations/great_expectations/issues/2688.

Marking this "in investigation", since this is an open discussion.

eugmandel commented 3 years ago

@jcampbell Please add your thoughts on the substance of this suggestion and its priority. Thank you!

JeroenSchmidt commented 3 years ago

> @JeroenSchmidt Thank you for posting this suggestion! This sounds like a more general case of https://github.com/great-expectations/great_expectations/issues/2688.
>
> Marking this "in investigation", since this is an open discussion.

Yes, that's correct. In fact, the other issue you referenced describes almost the same use case I am dealing with.

twcurrie commented 3 years ago

@JeroenSchmidt Circling back to my question in the other issue - do you have a preferred strategy?

JeroenSchmidt commented 3 years ago

@twcurrie I think the pip install that omits ipywidgets should be the optional one, given that great_expectations is meant to be used interactively. I foresee headaches for new users if the default pip install omitted Jupyter.

We would prefer pip install great_expectations[no-ipywidgets] or pip install great_expectations[thin-client].

Our use case is less concerned with the dependency issue encountered in #2688 and more concerned with creating small, quick deployment cycles for our applications that use great_expectations but do not need the whole Jupyter suite.

twcurrie commented 3 years ago

@JeroenSchmidt Makes sense, and I agree with the comment about the headaches, which is exactly why I've been trying to solicit an opinion from the great_expectations team. Thank you for sharing your team's perspective.

What are some of the other dependencies that your team has issues with or sees as heavy-weight?

bradgwest commented 2 years ago

I have limited experience developing expectations, but as someone who runs production infrastructure which leverages GE, I'm quite surprised this issue has (had) zero upvotes and is not more widely referenced. It seems like an oversight not to have an interactive/development installation and a production/server installation which excludes dependencies required to create expectations, i.e. is limited to only the set of dependencies required to run expectations. Those are two very different workflows, the former occurring on a developer's machine, the latter running on the server. Separating those dependency suites would have immense benefits for users running GE in production environments, including reducing the vulnerability surface by drastically reducing the number of transitive dependencies.

JeroenSchmidt commented 2 years ago

@twcurrie I don't have a concrete list that I can share, but the comment by @bradgwest captures very well what I mentioned earlier.

MatthiasRoels commented 1 year ago

As @bradgwest already mentioned, having the option to exclude the whole IPython/Jupyter suite of dependencies from great-expectations would be very helpful if you want to use great-expectations in production (e.g. as part of an Airflow job), for the simple reason that it reduces the number of (Python) dependencies. This has the following advantages:

  1. We get smaller container images when running jobs on Kubernetes (faster downloads of container images, etc.).
  2. We have a smaller attack surface for vulnerabilities, which makes it more likely that GE is allowed in heavily regulated industries (banking, pharma, ...) from a compliance point of view.

My suggestion (which also seems to be the simplest) would be to make all Jupyter dependencies optional, so that they are installed with

pip install "great_expectations[jupyter]"

This could be a simple PR that only requires changes to setup.py and requirements.txt.
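
As a rough illustration of the suggestion above, a setup.py could move the notebook packages out of the core requirements and into an extra. This is only a sketch: the package name, the function name build_setup_kwargs, and both requirement lists are hypothetical and do not reflect the project's actual setup.py or requirements.txt.

```python
# Illustrative only: these lists are NOT the project's real requirements.
CORE_REQUIREMENTS = ["pandas", "jsonschema"]
JUPYTER_REQUIREMENTS = ["ipython", "ipywidgets", "notebook"]


def build_setup_kwargs():
    """Return keyword arguments that a setup.py could pass to setuptools.setup()."""
    return {
        "name": "example_package",  # hypothetical package name
        # Always installed by: pip install example_package
        "install_requires": list(CORE_REQUIREMENTS),
        # Installed only on request: pip install "example_package[jupyter]"
        "extras_require": {"jupyter": list(JUPYTER_REQUIREMENTS)},
    }


# In an actual setup.py this would be followed by:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```

With a layout like this, the plain install stays small and the notebook tooling becomes opt-in, which is the behaviour change the comment proposes.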

TheEverlastingBish commented 1 year ago

👍🏽 to this issue.

I would also prefer the "normal" installation to install everything but have the option for a lite installer like pip install great_expectations[slim] or pip install great_expectations[lite] ... that does not install any of the Jupyter dependencies.

My use case is installing GX for GitHub Actions CI/CD. It only runs a few checks, rebuilds the Data Docs, etc., so it does not need a full-fat GX installation.

lucascott commented 1 year ago

Given that extras (<packagename>[<extra>]) only add dependencies to the main set of dependencies (the one in requirements.txt; see the Setuptools docs), it wouldn't be possible to drop dependencies by creating an extra.

Hence, the two approaches I see to move forward on this are:

  1. Clean approach: Release a major version update that simplifies the core set of dependencies and offers optional extra targets. This would introduce breaking changes, which would likely affect users and processes that expect (almost) everything to be bundled in, as it is today.
  2. Conservative approach: Release a new, separate "skinny" library of GX focused on core capabilities for production servers only. This approach has been adopted by MLflow, another popular open-source framework (skinny install, skinny docs).
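
The point that extras can only add to the core set can be seen in how requirement metadata is expressed: each extra's packages carry an `extra == "<name>"` marker, while core requirements carry none. The sketch below splits a list of requirement strings on that marker. The helper names and the sample list are made up for illustration, and the parsing is deliberately simplified (it discards any other environment markers).

```python
import re
from collections import defaultdict

# Matches the extra name inside a marker such as: extra == "s3"
_EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def split_requirements(requires):
    """Split requirement strings into (core, {extra_name: [requirements]})."""
    core = []
    extras = defaultdict(list)
    for req in requires or []:
        spec, _, marker = req.partition(";")
        match = _EXTRA_MARKER.search(marker)
        if match:
            extras[match.group(1)].append(spec.strip())
        else:
            # Simplification: other markers (e.g. python_version) are dropped.
            core.append(spec.strip())
    return core, dict(extras)


def resolved_for(requires, extra=None):
    """Direct requirements for a plain install, or for an install with one extra."""
    core, extras = split_requirements(requires)
    # An extra is appended to the core list; it never removes anything from it.
    return core + extras.get(extra, [])
```

For an installed package, a list in this form can be obtained with `importlib.metadata.requires("<distribution name>")`. Whatever extra is requested, every core requirement is still present in the result, which is why a "slim" extra cannot exist without first shrinking the core list.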

bkkkk commented 1 year ago

Just adding my two cents. We are working to leverage GE together with DataDog for monitoring of and alerting on data quality issues, and the Jupyter dependencies are not needed for that.

molliemarie commented 1 month ago

Hi @JeroenSchmidt. With the upcoming launch of Great Expectations Core (GX 1.0), we are closing old issues posted regarding previous versions. Moving forward, we will focus our resources on supporting and improving GX Core (version 1.0 and beyond). If you find that an issue you previously reported still exists in GX Core, we encourage you to resubmit it against the new version. With more resources dedicated to community support, we aim to tackle new issues swiftly. For specific details on what is GX-supported vs community-supported, you can reference our integration and support policy.

To get started on your transition to GX Core, check out the GX Core quickstart (click the “Full example code” tab to see a code example).

You can also join our upcoming community meeting on August 28th at 9am PT (noon ET / 4pm UTC) for more updates on GX Core. Go to https://greatexpectations.io/meetup and click “follow calendar” to follow the GX community calendar.

Thank you for being part of the GX community and thank you for submitting this issue. We're excited about this new chapter and look forward to your feedback on GX Core. 🤗

Kilo59 commented 1 week ago

@JeroenSchmidt @bradgwest @bkkkk @lucascott @MatthiasRoels @twcurrie We removed ipython as a required dependency as part of the 1.1.1 release. https://github.com/great-expectations/great_expectations/releases/tag/1.1.1

Also, if you haven't upgraded to 1.0 yet, it's worth noting that the standard getting-started workflows no longer rely on Jupyter notebooks. You can certainly continue to run GX inside Jupyter or any other IPython-based notebook. https://docs.greatexpectations.io/docs/core/introduction/try_gx/
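
For anyone who wants to confirm that a production or CI environment really is free of the notebook stack after upgrading, a small check along these lines may help. It is a generic sketch and not part of GX: the module list and the function name installed_modules are assumptions, and a module may still be present because some other package depends on it.

```python
import importlib.util

# Top-level module names to look for; this list is an assumption for illustration.
NOTEBOOK_MODULES = ["IPython", "ipywidgets", "notebook"]


def installed_modules(names):
    """Return the subset of top-level module names importable in this environment."""
    # find_spec() locates a top-level module without importing it,
    # and returns None when the module cannot be found.
    return [name for name in names if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    found = installed_modules(NOTEBOOK_MODULES)
    print("notebook-related modules present:", found or "none")
```

Running this in a container build or CI job gives a quick signal of whether the image carries notebook packages it does not need.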