vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License
9.17k stars, 782 forks

Opt in or opt out on vegafusion dependencies #3309

Open mattijn opened 7 months ago

mattijn commented 7 months ago

Opening this issue to discuss and collect insights.

Currently we have an altair package that relies on a few hard dependencies:

dependencies = [
    "typing_extensions>=4.0.1; python_version<\"3.11\"",
    "jinja2",
    "jsonschema>=3.0",
    "numpy",
    "pandas>=0.25",
    "toolz",
    "packaging"
]

In parallel, we are trying to reduce these hard dependencies (pandas, for example; and do we really need numpy?).

But simultaneously, with the recent developments around vegafusion in combination with vl-convert, it would be great to have it somehow installed as a hard dependency instead of the current soft dependency approach.

I see four options, but maybe there are more (from a conda perspective)

  1. Do nothing, keep it as it is.
  2. Extend altair with vegafusion, anywidget and vl-convert-python as hard dependencies.
  3. Create an altair-base package which has the current minimal dependencies.
  4. Create an altair-vegafusion package that includes vegafusion, anywidget and vl-convert-python as hard dependencies.

Option 3 enables an opt-out approach. Option 4 enables an opt-in approach.

Topics for consideration are at least:

jonmmease commented 7 months ago

Cross reference https://github.com/altair-viz/altair/issues/2818. I think I would lean toward adding an extra_requires group for the optional dependencies, at least to start with as this would be a non-breaking change.

Eventually, I think the altair-base approach is worth considering. But this would need to correspond to a major version bump IMO.

joelostblom commented 7 months ago

I think it would be nice to use vegafusion and vl-convert more by default in altair, e.g. enabling the vegafusion transformer as the default transformer and offline mode by default for jupytercharts. I don't have a strong preference whether it is via an opt-out or opt-in mechanism, but I lean slightly towards a default with all batteries included, i.e. the setup we believe lets most people get the most out of the library.

If we decide to go that path, I wonder what the default behavior should be when these libraries are not installed. Should altair show a warning and ask the user to install vegafusion or change the renderer to "html"? Or should it silently fall back to the html renderer when it can't find vegafusion? The former sounds safer and would likely reduce some confusing edge cases. I agree that a change like that would belong in a major version.
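To make the two behaviors discussed here concrete, below is a minimal sketch of "warn and fall back" versus "silently fall back" when an optional package is missing. This is not Altair's actual implementation: the function name `pick_backend` and its parameters are hypothetical, and the sketch only checks whether a top-level module can be imported.

```python
import importlib.util
import warnings


def pick_backend(module_name: str = "vegafusion",
                 fallback: str = "html",
                 warn: bool = True) -> str:
    """Return module_name if it is importable, otherwise the fallback name.

    Hypothetical helper for illustration only; not part of Altair's API.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is not None:
        return module_name
    if warn:
        # The "safer" option: tell the user what happened and how to fix it.
        warnings.warn(
            f"{module_name!r} is not installed; falling back to {fallback!r}. "
            f"Install {module_name!r} to enable the default behavior.",
            UserWarning,
            stacklevel=2,
        )
    # With warn=False this is the silent fallback.
    return fallback
```

With `warn=True` the user sees why the output differs from the documented default; with `warn=False` the same call returns the fallback without any message, which is where the confusing edge cases could come from.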

binste commented 6 months ago

I agree that it's nice if users can get as much as possible out of Altair by default without (much) configuration. At the same time, altair is mentioned in many requirements files by now and changing the dependencies to include a lot more, even in a major version bump, might ruffle some feathers.

I see it as two linked but somewhat separate discussions:

Simplifying installation

extra_requires groups solve this nicely in my opinion, and I'd lean towards this instead of adding more dependencies to altair and creating an altair-base. It's the pip way to solve something like this, and it is also the approach taken by other libraries such as pandas. Their example also shows nicely that this scales better to more optional dependencies in the future. Maybe we want to include avenger at some point, or altair_ally, altair_tiles, ... The dependency groups allow us to do:

Can be freely combined as altair[save, performance].
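As a rough illustration of what combining groups means, the sketch below merges the packages of several requested groups into one deduplicated list. The group names `save` and `performance` come from the comment above; the packages assigned to each group, the `EXTRAS` table, and the `resolve_extras` helper are assumptions made for this example, not a decided layout and not how pip resolves extras internally.

```python
# Hypothetical extras table: group names are from the discussion,
# the package lists per group are assumed for illustration.
EXTRAS = {
    "save": ["vl-convert-python"],
    "performance": ["vegafusion", "vl-convert-python"],
}


def resolve_extras(requested, groups=EXTRAS):
    """Return the deduplicated packages for the requested extras, in order."""
    unknown = [name for name in requested if name not in groups]
    if unknown:
        raise KeyError(f"unknown extras: {unknown}")
    packages = []
    for name in requested:
        for pkg in groups[name]:
            # A package shared by two groups is only listed once.
            if pkg not in packages:
                packages.append(pkg)
    return packages


# Conceptually what "altair[save, performance]" would add on top of altair.
combined = resolve_extras(["save", "performance"])
```

Because shared packages are listed only once, adding further groups later does not change what the existing groups install.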

Enabling the magic of VegaFusion by default if it is installed

We could tackle this somewhat independently of the first topic. What do you think?

Related discussion: .to_dict/.to_json have to return a Vega spec if VegaFusion is used

joelostblom commented 6 months ago

Separating into two issues sounds good :+1: For the packaging part, I'm on board with using extra_requires and keeping altair as the base with altair[all] being all optional dependencies (and that we create a corresponding altair-all conda package).

Would the next step be to decide which package to include in all? I'm thinking maybe vl-convert and everything vegafusion-related to start; maybe also vega_datasets? And then consider adding the additional packages (tiles, ally, avenger)? What do you think?

jonmmease commented 6 months ago

For all, I'd recommend including at least:

vegafusion-jupyter is going to be deprecated and won't be needed anymore with the integration of VegaFusion into JupyterChart.

avenger isn't ready for prime-time yet.

I would defer to @joelostblom and @binste on whether to include altair_tiles and altair_ally, but I don't have any objection. Do either have any dependencies that might be problematic?

joelostblom commented 6 months ago

Good note on anywidget, I agree that should be included and your list looks great to me. I don't have any problematic dependencies in altair_ally at the moment, but I want to do some more work on the library before it is included in an official altair context; I'm planning to make time for it later in the spring.

binste commented 6 months ago

altair_tiles only has 2 additional dependencies (mercantile and xyzservices), both of which are well-maintained pure Python packages. So I think we could include it in all as well! I've been using it for a while now in production use cases, and for single-view maps it works. If we include it, I can also add a small example for it or some reference to it in the geoshape mark documentation.

I agree with your list for all @jonmmease.

jonmmease commented 6 months ago

Great, let's include altair_tiles then as well! And yeah, it would be great to add some documentation for it directly in the main Altair docs.

joelostblom commented 6 months ago

Related to packaging, I PRed an altair recipe into the default pyodide image in https://github.com/pyodide/pyodide/pull/4580. The pyodide devs were very helpful and seem open to new packages being added, so let them know if there is anything else altair-related that you think should be in Pyodide by default.