Ditch Starlark for Python?

indygreg commented 3 years ago

PyOxidizer's configuration files were originally TOML. Then we switched to Starlark so we could leverage a real programming language to express complex configuration. Starlark was chosen over alternatives like Lua because it is purposefully designed to be a configuration language and it has great sandboxing properties.

Maintaining the Starlark code in PyOxidizer has historically been a pain. The crate was effectively dormant for a while. Then Facebook took it over. While they have done great work with the crate, it doesn't support building without Rust Nightly features (although they've offered to fix that once my port past version 0.3 of the crate is in a good position).

As I look at the sheer amount of effort that it will take to port off version 0.3 of the Starlark crate, I question whether the effort would be better spent porting to actual Python instead. Either embed a CPython interpreter in pyoxidizer or a Rust implementation of Python like https://github.com/RustPython/RustPython. The latter would be preferred, as CPython's embedding/sandboxing story isn't great and we'd effectively open up PyOxidizer configuration files to a fully-featured scripting environment. I kind of like keeping behavior more tightly constrained, even if that does mean we end up reinventing a few wheels for things like file I/O.

A benefit to using actual Python is that the configuration files will be more friendly to Python developers. We can do things like ship a .pyi file defining the typing interface so autocomplete works in IDEs and configuration files can be type-validated. This also opens us up to a Python API for PyOxidizer, enabling people to create their own tools that leverage PyOxidizer's internals for building binaries.
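To make the .pyi idea concrete, a stub might start like the hypothetical fragment below. The names `default_python_distribution`, `PythonDistribution`, `PythonExecutable`, `register_target` and `resolve_targets` mirror globals and types from PyOxidizer's Starlark dialect. The parameter lists here are a deliberately partial guess, not the real signatures.

```python
# pyoxidizer.pyi (hypothetical, partial): declarations only, no behavior.
# Parameter lists are illustrative and incomplete.
from typing import Callable, Optional, Sequence


class PythonExecutable: ...


class PythonDistribution:
    def to_python_executable(self, name: str) -> PythonExecutable: ...


def default_python_distribution(
    flavor: str = ...,
    build_target: Optional[str] = ...,
    python_version: Optional[str] = ...,
) -> PythonDistribution: ...


def register_target(
    name: str,
    fn: Callable[..., object],
    depends: Sequence[str] = ...,
    default: bool = ...,
) -> None: ...


def resolve_targets() -> None: ...
```

With a stub like this visible to the IDE, a misspelled keyword such as `python_verison=` or a missing `name` argument could be flagged before PyOxidizer ever evaluates the file.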

If anyone has any thoughts on embedding Python interpreters in Rust, feel free to leave comments here. I'm most curious about sandboxing. CPython can't sandbox Python code that well. Do Python implementations in Rust (like RustPython) enable you to do things like limit which builtins are exposed and prevent explicit file I/O from the interpreter?
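For comparison, this is roughly how far plain CPython gets on the builtins question: run the config source with a hand-picked `__builtins__` mapping. This is a minimal sketch (`evaluate_config` and the whitelist are hypothetical names), and it mainly illustrates the problem described above. `open` disappears, but file I/O is only hidden, not prevented, because attribute introspection can still reach unrestricted objects.

```python
import builtins

# Hypothetical whitelist: the only builtins a config file may call.
ALLOWED_BUILTINS = ("bool", "dict", "int", "len", "list", "print", "range", "sorted", "str")


def evaluate_config(source: str) -> dict:
    """Execute config source with a restricted builtins mapping.

    Caveat: this hides names like `open`, but it is NOT a sandbox. Code can
    still reach arbitrary classes through attribute introspection
    (e.g. `().__class__.__mro__[1].__subclasses__()`).
    """
    safe_builtins = {name: getattr(builtins, name) for name in ALLOWED_BUILTINS}
    namespace = {"__builtins__": safe_builtins}
    exec(compile(source, "<config>", "exec"), namespace)
    # Hand back only the names the config itself defined.
    return {key: value for key, value in namespace.items() if key != "__builtins__"}


config = evaluate_config("name = 'app'\npackages = sorted(['b', 'a'])\n")
print(config)  # {'name': 'app', 'packages': ['a', 'b']}

try:
    evaluate_config("handle = open('setup.py')")
except NameError as exc:
    # `open` is not in the restricted mapping, so the lookup fails.
    print("blocked:", exc)
```

The gap between "name not defined" and "capability unavailable" is exactly what a purpose-built interpreter (Starlark, or a Rust Python implementation that controls its own object model) can close and stock CPython cannot.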

SimonBiggs commented 3 years ago

We can do things like ship a .pyi file defining the typing interface so autocomplete works in IDEs and configuration files can be type-validated.

This here is a huge plus; it would seriously smooth out the learning curve if the IDE provided autocompletion.

indygreg commented 3 years ago

Since PyOxidizer can be installed via wheels, we could probably drop a .pyi in there easily enough today. The trick is convincing IDEs to pick it up. I believe most IDEs drive their type integration by sniffing for import statements. Since Starlark doesn't have import, I'm unsure how to convince the IDE to treat the file as having those symbols in scope.

If anyone knows of any magic syntax or tricks that could make this work, I'm all ears.

The only thing I can think of is to have PyOxidizer automatically remove an import pyoxidizer (or similar) line from the Starlark configuration file. We also might need to rename the config files to have a known Python extension. If we did these two things and included a pyoxidizer.pyi (and maybe a pyoxidizer.py) in the wheel, I think this might just work.
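A minimal sketch of that preprocessing step, assuming the marker line is either `import pyoxidizer` or `from pyoxidizer import *`. The star form is an assumption: bare calls like `default_python_distribution()` would need it for an IDE to see those names in scope. The regex and `strip_ide_import` are hypothetical. Blanking the line rather than deleting it keeps line numbers in evaluation errors aligned with the file the user edited.

```python
import re

# Hypothetical marker lines that exist only to help IDEs resolve symbols.
IDE_IMPORT = re.compile(r"^\s*(import\s+pyoxidizer|from\s+pyoxidizer\s+import\s+\*)\s*(#.*)?$")


def strip_ide_import(source: str) -> str:
    """Blank out IDE-only import lines so Starlark never sees an import statement.

    Each matching line becomes an empty line instead of being deleted, so
    line numbers in evaluation errors still match the file on disk.
    """
    out = []
    for line in source.splitlines(keepends=True):
        if IDE_IMPORT.match(line.rstrip("\r\n")):
            out.append("\n")
        else:
            out.append(line)
    return "".join(out)


config = "from pyoxidizer import *\n\ndist = default_python_distribution()\n"
print(strip_ide_import(config))
```

The pattern is anchored on the whole line, so something like `import pyoxidizer_extra` is left untouched and would still fail in Starlark, as it should.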

This is all an orthogonal conversation. But given its importance to lessening the learning curve, I think it is a very important orthogonal conversation!

indygreg commented 3 years ago

Another thought on my mind is the value in decoupling the configuration language from Python proper. The target audience of PyOxidizer is Python developers. So Python as the configuration language is highly defensible and arguably the best choice!

However, I've been slowly populating Starlark with non-Python packaging primitives, such as the ability to make macOS application bundles, construct universal/fat Mach-O binaries, and build Windows MSI installers using WiX. I've been careful to define these primitives independent of PyOxidizer with the ultimate intent of shipping a standalone tool for performing generic application packaging functionality. The target audience of this tool is not just Python developers. For this tool/functionality, Starlark is arguably a better choice because it isn't Python.

While I'm here, I noticed that RustPython hasn't released in over 1 year. That doesn't exactly instill confidence. (But maybe they just don't currently have a strong incentive to release to crates.io since not many projects rely on them as a crate dependency.) Since I got burned by the starlark crate's dormancy (before Facebook took it over), I'm somewhat sensitive about the risk of betting the configuration solution on a less-popular/stable crate.

SimonBiggs commented 3 years ago

If anyone knows of any magic syntax or tricks that could make this work, I'm all ears.

I suspect this might not work in Skylark (or if it does, it'd be a bit weird...), but here is some hackery I went through to make both pylint and Pylance happily detect and traverse modules that I was lazily and optionally importing (and that potentially weren't even installed on the user's system):

https://github.com/pymedphys/pymedphys/blob/164a7a5c6051ab4c8fd6efdb79c3bfb0684b65df/lib/pymedphys/_imports/__init__.py#L16-L20

sluongng commented 3 years ago

I am not yet a PyOxidizer user, but I have been keeping an eye on this project and its impact on the Mercurial ecosystem, and have been wanting to integrate this into https://github.com/bazelbuild/rules_python. So please take my feedback with a grain of salt:

I think the use cases of Starlark for build system configuration have always been oriented around Keep It Simple, Stupid (KISS). Exposing too much to end users, especially in a big organization, will guarantee that the config falls under Hyrum's Law, where unintended use cases become critically depended upon, which causes more friction over time.

These unintended use cases could originate either from the tool's paved-road experience not being mature enough at the time, forcing users to apply hacky workarounds, or from the tool being so complex that it's hard to onboard new users and show them how to 'hold it correctly'.

Regardless, by shipping only a limited subset of Python's features inside Starlark, you limit what the tool could become when exposed to an arbitrarily big org over an arbitrarily long period of time, and therefore keep things simple and maintainable over time.

If the main benefit here is IDE integration and stronger typing with type-annotation support, then perhaps work can be done in those areas to improve Starlark in its current form.

Other configuration languages like HCL or Cue-lang could also be considered, as they have good IDE support... though I don't think they are a good fit for the Python crowd.

SimonBiggs commented 3 years ago

and have been wanting to integrate this into https://github.com/bazelbuild/rules_python

It would be pretty amazing if PyOxidizer was integrated into a rule usable by Bazel.

And therefore, keeping things simple and maintainable over time.

:+1:

href commented 3 years ago

I personally would prefer Python over Starlark, as I like to work with VARS anyway and prefer to write my own wrapper.

That is, if I were to use PyOxidizer for everything - which is something I'd love to do in the future - then I might write my own Python package that depends on PyOxidizer and provides my own way of building my projects.

In a way I see the use of Starlark by PyOxidizer as one way to drive an underlying API that should be available to everyone. That underlying API should be something I can use myself and drive differently if I please.

SimonBiggs commented 3 years ago

Exposing too much to end users, especially in a big organization, will guarantee that the config falls under Hyrum's Law, where unintended use cases become critically depended upon, which causes more friction over time.

I have been giving this some more thought. I guess one of the trade-offs often made in Python land is that "we are all responsible users":

https://docs.python-guide.org/writing/style/#we-are-all-responsible-users

With Python as the config language, yes, there will be "privately flagged items" that are not supported by pyoxidizer, and yes, Hyrum's Law will likely come into play. But the important thing is that if a user chooses to use an unsupported API, that is their responsibility.

By giving users the full power of Python, pyoxidizer trusts them to use that power responsibly. I would argue that "irresponsible use" of the API does not need to be guarded against at the cost of preventing the open source community from building things that the creator(s) of pyoxidizer could never have imagined.

If an organisation wants to restrict the configuration of a tool, it might be better if they create their own restricted wrapper, instead of the other way round. Have the tool come off the shelf as flexible as it can be, then if an organisation finds that flexibility an issue that organisation can protect against that either with their own enforced linter or a wrapper config format.
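As a rough illustration of the "enforced linter" option, an organisation could scan Python config files for private API use with the standard `ast` module. This is a minimal sketch assuming leading-underscore names mark PyOxidizer-private items (the usual Python convention); `find_private_usages` is a hypothetical helper, not part of any existing tool.

```python
import ast


def _is_private(name: str) -> bool:
    # A single leading underscore marks a private name by convention;
    # dunder names such as __init__ are left alone.
    return name.startswith("_") and not (name.startswith("__") and name.endswith("__"))


def find_private_usages(source: str) -> list[str]:
    """Return one message per private attribute, module, or imported name in `source`."""
    problems: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and _is_private(node.attr):
            problems.append(f"line {node.lineno}: private attribute {node.attr!r}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if any(_is_private(part) for part in alias.name.split(".")):
                    problems.append(f"line {node.lineno}: private module {alias.name!r}")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if any(_is_private(part) for part in module.split(".")):
                problems.append(f"line {node.lineno}: private module {module!r}")
            for alias in node.names:
                if _is_private(alias.name):
                    problems.append(f"line {node.lineno}: private name {alias.name!r}")
    return problems
```

Run in CI over an organisation's config files, a non-empty result could fail the build. That keeps the restriction in the organisation's hands rather than baking it into the tool.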

An important note: having Python as the config language would not stop it from being used within Bazel.

dae commented 2 years ago

I suspect this might be a contentious question, but for the sake of discussion: if a fully-fledged language is on the table, would it be crazy to consider moving the configuration to a Rust file instead?

I can imagine some immediate objections:

Python users may struggle with the syntax

The majority of the default pyoxidizer.bzl is simple boolean, string and list assignments, and with a suitably-updated example config file (including a resource callback for example), it would hopefully not be too difficult even for those without previous Rust experience.

Hyrum's Law

The current Starlark approach mainly amounts to building up a config object and passing it to PyOxidizer for the magic to happen. If such an approach is retained, I'm not sure the choice of language matters a great deal in this regard - whether it be TOML, Python or something else, users are limited to what the DSL can represent.

A Rust toolchain would be required

PyOxidizer is already fetching one as part of the build. Proper IDE support would probably require the user to set up a dev environment however, unless PyOxidizer exposed the downloaded toolchain somewhere, and provided instructions for how to connect it to common IDEs.

Rust build times

As Rust is invoked as part of the build process anyway, I suspect the difference would be fairly negligible.

As for potential upsides:

The Starlark/Rust bridging code looks like a maintenance burden. Every time functionality is added to the Rust backend, it needs to be exposed in Starlark as well. Exposing things with PyO3 would presumably be a bit easier, but bridging between languages is still extra work.
That extends to documentation as well - currently there are internal comments in the Rust code, and then user-facing comments in .rst files for the Starlark, with a fair amount of overlap. If the config were in Rust, the comments could potentially be shared, and accessible both in online documentation and in IDEs. (Kudos to Gregory for the existing docs though - they are very comprehensive!)
It could potentially make program flow a bit easier to follow - currently we have PyOxidizer invoking cargo, and build.rs scripts invoking PyOxidizer, and combined with the bridging layer, it took me some time to get a feel for how things fit together.
It could potentially offer a smoother transition to more advanced workflows, such as extending the default build.rs/main.rs files, adding extra resources, etc.

Is the above enough to justify the potential inconveniences for Python users? I don't know. If there's a desire to provide PyO3 wrappers for the Rust code for the sake of enabling Python-based tools, then the bridging work ends up needing to be done anyway, somewhat reducing the advantages of doing the config in Rust. Still, I thought it might be at least worth considering.

SimonBiggs commented 2 years ago

Here's a +1 user story for the Python config approach:

https://github.com/pymedphys/pymedphys/pull/1557#discussion_r739590126

dae commented 2 years ago

That is already possible - things like version numbers can be passed in on the pyoxidizer command line with --var, then referenced with VARS.
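As a hedged sketch of that approach from the Python side, a release script that already knows the version could assemble the command line like this. It assumes `--var` takes a name followed by a value and may be repeated. `pyoxidizer_build_args` is a hypothetical helper, and the config would then read the value from `VARS` (for example `VARS.get("version")`).

```python
from typing import Mapping


def pyoxidizer_build_args(variables: Mapping[str, str]) -> list[str]:
    """Build a `pyoxidizer build` argv with one `--var NAME VALUE` triple per entry."""
    args = ["pyoxidizer", "build"]
    for name, value in variables.items():
        args.extend(["--var", name, value])
    return args


print(pyoxidizer_build_args({"version": "1.2.3"}))
```

The resulting list can be handed to `subprocess.run(..., check=True)` when `pyoxidizer` is on `PATH`.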

indygreg / PyOxidizer

Ditch Starlark for Python? #444