uwdata / mosaic

An extensible framework for linking databases and interactive views.
https://idl.uw.edu/mosaic

Python API #395

Open domoritz opened 4 months ago

domoritz commented 4 months ago

Design and develop a Python API for generating Mosaic specifications. Similar in spirit to Vega-Altair, the Python API should enable programmatic construction of a Mosaic JSON specification and enable easy operation with data from Pandas, Polars, DuckDB, etc.

domoritz commented 4 months ago

Since https://github.com/uwdata/mosaic/pull/358, Mosaic has a JSON schema (e.g. https://github.com/uwdata/mosaic/blob/main/docs/public/schema/v0.10.0.json) for the spec format, so we should be able to use https://github.com/vega/altair/tree/main/tools/schemapi to generate the Python API. The idea is to build something similar to https://github.com/vega/altair/blob/main/tools/generate_schema_wrapper.py, but Mosaic-specific. The Mosaic schema is pretty large, so I expect we will need to change some things in schemapi, but what exactly remains to be seen.
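To make the schema-driven idea concrete, here is a minimal sketch of generating Python classes from a JSON Schema definition. It uses only the standard library and is far simpler than schemapi; the `SCHEMA` fragment, the `PlotSize` name, and the `generate_dataclass` helper are all hypothetical and not taken from the Mosaic schema or from Altair's tooling.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical schema fragment for illustration; NOT copied from the Mosaic schema.
SCHEMA = {
    "definitions": {
        "PlotSize": {
            "type": "object",
            "properties": {
                "width": {"type": "number"},
                "height": {"type": "number"},
                "name": {"type": "string"},
            },
            "required": ["width"],
        }
    }
}

# Map JSON Schema primitive types to Python annotations.
PY_TYPES = {"number": "float", "integer": "int", "string": "str", "boolean": "bool"}


def generate_dataclass(name: str, definition: dict) -> str:
    """Return Python source for a dataclass mirroring one object definition."""
    required = set(definition.get("required", []))
    props = definition.get("properties", {})
    # Required fields go first because they have no default value.
    ordered = sorted(props, key=lambda p: p not in required)
    lines = ["@dataclass", f"class {name}:"]
    for prop in ordered:
        py_type = PY_TYPES.get(props[prop].get("type"), "object")
        if prop in required:
            lines.append(f"    {prop}: {py_type}")
        else:
            lines.append(f"    {prop}: Optional[{py_type}] = None")
    if not props:
        lines.append("    pass")
    return "\n".join(lines) + "\n"


source = generate_dataclass("PlotSize", SCHEMA["definitions"]["PlotSize"])

# A real generator would write the source to a module file; exec keeps the sketch short.
namespace = {"__name__": "generated", "dataclass": dataclass, "Optional": Optional}
exec(source, namespace)
PlotSize = namespace["PlotSize"]

print(source)
print([f.name for f in fields(PlotSize)])
```

A real generator would also have to handle `$ref`, `anyOf`, and nested definitions, which is presumably where a large schema puts pressure on the tooling.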

aeroaks commented 1 month ago

Hi, I'm really amazed by the performance of Mosaic. I can't wait to use it in my daily work to visualise those heavy datasets.

I looked through the documentation and GitHub, and it looks like the best way to use Mosaic from Python is currently the Jupyter widget. Is that correct?

domoritz commented 1 month ago

Yes. I'm planning to work on a Python API for vgplot this semester.

aeroaks commented 1 month ago

Great, count me in if you need testing support. :)

domoritz commented 1 day ago

https://pypi.org/project/gosling/ by @manzt looks neat as well and can be an inspiration. I like how he used Altair as a submodule to get schemapi (which probably should have another release). https://github.com/gosling-lang/gos/tree/main/tools

manzt commented 1 day ago

Thanks for the ping.

If I were starting again, I'd seriously consider using msgspec from @jcrist as the foundation for the API layer. The separation of encoding/decoding from the base classes is really desirable and lets the API layer be cleaner IMO. Neither pydantic nor schemapi separates the two.

However, in this case I assume you'd like to generate the Python API from some JSON schema. msgspec goes in the opposite direction, generating JSON schema from Python types (https://jcristharif.com/msgspec/jsonschema.html), but I wonder whether, with the relatively simpler Mosaic JSON schema, it would be easier to generate the Python API from the JSON schema.

msgspec also supports other encoders, so you could reuse the base classes to generate YAML as well.

manzt commented 1 day ago

Full disclosure: I've been wanting to move toward msgspec for letting widget developers define widgets as well. (It will be some time before I can really experiment with those changes, as I've been busy wrapping up the PhD.)

domoritz commented 1 day ago

Thanks for the pointer! We should explore that.

> but I wonder whether, with the relatively simpler Mosaic JSON schema, it would be easier to generate the Python API from the JSON schema

What makes the Mosaic schema simpler? I would like to avoid having to maintain both a Python and a TypeScript version of the spec schema. If we can have it in one place and convert to either, that would be best.

manzt commented 1 day ago

> What makes the Mosaic schema simpler?

Sorry, I haven't taken a close look at the spec package... but the thing that makes spec generation "hard" IMO in Altair/Gosling is that the TS source makes use of sophisticated TypeScript features like generics and extending base types.

From a high-level look at Mosaic, the types seem to have less inheritance and don't use these features. I believe that would make them easier to map to JSON (and to simple language bindings). Could be wrong!

That said, since the TS types aren't really used in the Mosaic core code base (and appear to exist mostly for generating the JSON spec), maybe it would make sense to author the types in Python for end users on the Python side, and then generate the JSON specs that way.

Totally agree that maintaining both Python and TypeScript types is a huge pain.

domoritz commented 1 day ago

I don't care too much what the source of truth is, but I'd like both TS and Python types. The TS types are being used.

manzt commented 1 day ago

> The TS types are being used.

Ah sorry, I wasn't aware. I'd need to explore more to find what's out there. Ideally you could generate typed Python dataclass-like things (e.g., msgspec.Struct) from something like TS or the JSON schema.

Then the API could be a set of functions that operate on those Python types.
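A minimal sketch of what "functions that operate on those Python types" could look like, using standard-library dataclasses as a stand-in for generated `msgspec.Struct` classes. The `Mark` and `Plot` types, the `line_y`, `plot`, and `to_spec` functions, and the field names (loosely modeled on a Mosaic-style JSON spec) are all assumptions for illustration, not an existing API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


# Hypothetical types standing in for generated ones (e.g. msgspec.Struct subclasses).
@dataclass
class Mark:
    mark: str
    data: dict
    x: Optional[str] = None
    y: Optional[str] = None


@dataclass
class Plot:
    plot: list = field(default_factory=list)
    width: Optional[int] = None
    height: Optional[int] = None


def line_y(table: str, x: str, y: str) -> Mark:
    """Build a line mark that reads from a named table."""
    return Mark(mark="lineY", data={"from": table}, x=x, y=y)


def plot(*marks: Mark, width: Optional[int] = None, height: Optional[int] = None) -> Plot:
    """Combine marks into a single plot."""
    return Plot(plot=list(marks), width=width, height=height)


def to_spec(obj) -> dict:
    """Convert a typed object to a JSON-ready dict, dropping unset (None) fields."""

    def prune(value):
        if isinstance(value, dict):
            return {k: prune(v) for k, v in value.items() if v is not None}
        if isinstance(value, list):
            return [prune(v) for v in value]
        return value

    return prune(asdict(obj))


spec = to_spec(plot(line_y("aapl", x="Date", y="Close"), width=680, height=200))
print(json.dumps(spec, indent=2))
```

Keeping the user-facing functions separate from the types means the types could be regenerated from the schema without rewriting the API surface.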