support first-class modules in load syntax

fkorotkov commented 4 years ago

I want to bring to a discussion an option to have a wildcard load in order to load all values from a module:

load("module.star", "*")

Our use case is that we are trying to allow user of Cirrus CI to write configuration files in Starlark rather then YAML. In order for the configuration look prettier than just a bunch of dicts we'd like to have a module with syntactic sugar like this:

load("core.star", "task", "container", "script", "always", "artifacts")

def main(ctx):
    return [
        task(
            name="Lint",
            instance=container("golangci/golangci-lint:latest", cpu=1.0, memory=512),
            instructions=[
                script("lint", "echo $STARLARK", "golangci-lint run -v --out-format json > golangci.json"),
                always(
                    artifacts("report", "golangci.json", type="text/json", format="golangci")
                )
            ]
        )
    ]

Having to explicitly specify all symbols loaded from a module might be a bit of a struggle. It will be nice if we could do something like that with the standard Starlark rather then creating our own builtin for that:

load("core.star", "*")

Also Python itself can do wildcard imports so it seems consistent with the idea of Startlark.

alandonovan commented 4 years ago

Wildcard loads, in any language, are a bad idea because they make it hard to know where a name is defined. But there is a cleaner way to solve the same problem: first-class modules. In Python, you can say from os import mkdir, analogous to a Starlark load, or you can say import os and later refer to os.mkdir. Indeed, the second form is the more usual one in Python. In Bazel, many .bzl files construct an explicit struct to act as a module, like so (sticking with the os analogy):

def mkdir_(...): ...
def rmdir_(...): ...

os = struct(
   mkdir=mkdir_,
   rmdir=rmdir_,
)

If Starlark had first-class modules, then a load statement would be able to import the entire module as a value. The syntax would presumably be something like:

load("os.bzl") # binds 'os' to the "os.bzl" module, based on the last path segment sans dotted suffixes
os.mkdir(...)

or

load(alt="os.bzl") # binds alternative name 'alt' to the "os.bzl" module
alt.rmdir(...)

I think this would be a worthwhile language change.

If we're changing the syntax of load, we should also at least consider improving it, by using more Python-like syntax, removing unnecessary parens, and removing quotation marks around identifiers:

from "os.bzl" import mkdir, rmdir as alt
import "os.bzl"
import "os.bzl" as alt

fkorotkov commented 4 years ago

I agree that wildcard imports are generally bad because of the reasoning issues but I don't see wildcard imports as bad for small applications comparing to big projects with hundreds of files.

What do you think about allowing at most one wildcard import in order to simplify reasoning and implementation? Looks like a good compromise to still support Python-like wildcard imports but also don't allow to go out of hand with exponentially growing amount of modules to look into while resolving a symbol.

laurentlb commented 4 years ago

I agree with Alan. With wildcards, it's easy to get conflicts (two files defining the same name). So defining a new function becomes a breaking change for other Starlark modules. It also easily confuses the reader (it's hard to find where something is defined). It adds complexity to the tools and IDEs (e.g. how to jump to definition).

I'd rather not add complexity to the language and tools, just so that it's easier to write code that will eventually be harder to maintain.

alandonovan commented 4 years ago

I don't see wildcard imports as bad for small applications comparing to big projects with hundreds of files.

Every big project with hundreds of files starts as a small application.

What do you think about allowing at most one wildcard import in order to simplify reasoning and implementation?

I still don't like it. You would have to pay all the complexity of the feature but then place an artificial limit on it.

More importantly, wildcard imports are completely incompatible with the Python compilation model. They are feasible to implement in a compiled language like Java or Go because the names defined by the imported module are enumerable at compilation time. By contrast, in a Python-like language, compilation occurs one file at a time, before the module becomes available during execution, so you simply don't know what names would be bound by a wildcard import statement. This means it would be impossible to tell during compilation whether a reference is defined by a wildcard import, predeclared (like 'None'), or undefined.

alandonovan commented 4 years ago

See spec proposal: https://github.com/bazelbuild/starlark/issues/111

google / starlark-go

support first-class modules in load syntax #302