Rethinking Spago's Config format

Since the inception of the upcoming Registry almost two years ago, the idea of "reimagining Spago's config" format has been on the table, because the changes in package format give us the occasion to think from scratch how a config format that takes into account all the lessons learned so far could look like.

This has been discussed in various places before, but so far there hasn't been a place to plan this rehaul, so here we are.

The new format

One of the issues of the current format is that it doesn't have a proper "Dhall type": this is because we include the package set as a Dhall record, and we don't know in advance the type of that, since the keys of the record are the package names and they might change. This has made reading the configuration a somewhat clumsy business, since we can't use the facilities provided by the dhall library for this, error messages are sometimes terrible, and there are issues with accepting different config formats (more on "versioning the config" later).

Another issue that we have right now is that some settings sit only in the config on disk, some can only be supplied through flags, some both. We'd like to make it so that all the config options are overridable with a flag, or viceversa. Without a way to nicely evolve the config schema this work has been unpleasant and deferred.

Yet another issue of the current way we do things is that we have two configuration files (spago.dhall and packages.dhall) - this is a historical artefact of the reason why Spago exists in the first place: to allow users to create custom package sets. Now that we are revisiting the format of package sets themselves we can consolidate the two files in a single configuration, at least for simple projects.

Not an issue, but a missing feature is the ability to just resolve package versions according to bounds, instead of having to use a specific pinned version as it's the case with package sets. We look at how to address this here.

The following is a first draft of how I'd like things to look like, presented in a literate-Dhall format. In this snippet we build up a series of types from the Registry until the Config type. Examples of how actual configs would look like will follow in further sections, this section will only demonstrate the schema.

Without further ado:

-- since we need to eventually talk with the Registry, some things are best modeled using its types
let Registry = https://raw.githubusercontent.com/purescript/registry/master/v1/Registry.dhall
let Map = Registry.Prelude.Map.Type

-- we define here the type of packages that Spago will try to fetch from sources that are not the Registry
let SpagoPackage =
      < Repo : { repo : Registry.Repo, ref : Text }
      | Local : Registry.Prelude.Location.Type
      >

-- we then parametrize the Address type from the Registry, so we're able to extend the package set
-- with the external packages defined above
let Address = Registry.Address SpagoPackage

-- We'll call "Index" of packages the place where we'll look for the packages to use during a build.
-- This could either be:
-- - a package set: that is, a list of package versions that are known to compile together.
--   A set is compatible with a certain compiler version, and is a mapping from package names to `Address`es.
--   Package sets published on the Registry will only contain an `Address` pointing at the Registry, but users 
--   can override packages with local and/or external ones.
-- - the Registry: this means any package version published on the Registry.
--   This source can be extended with a mapping between package names and a list of `Address`es, so that
--   users can import local packages or things from outside the registry in general.
let Index =
      < Registry :  Map Text Address
      | PackageSet : 
          { compiler : Text
          , packages : Map Text Address
          }
      >

-- a compilation target, i.e. "how to build a bunch of files from the project"
let Target = {
      -- A mapping between package names and SemVer ranges for them.
      , dependencies : Map Text Text
      -- Source globs for the local project to include in the compilation alongside its dependencies.
      , sources : List Text
      -- Output folder where the compiler will put its results.
      , output : Text
      -- A target might not be pointing at the JS backend - if that's the case we can specify here the command
      -- for the alternate backend. Example values: `purerl`, `psgo`, etc.
      , backend : Optional Text
      }

-- Finally, the type of a Spago Config:
let Config =
      {
      -- The name of the project
      , name : Text
      -- A description of the project
      , description: Optional Text
      -- The SPDX code for the license under which the code is released
      , license : Text
      -- The current version of this package
      , version : Text
      -- The git repo the package is published at
      , repository : Optional Registry.Repo
      -- The source for the package versions listed in the `targets`
      , packages : Index
      -- Compilation targets for the project
      , targets : Map Text Target
      }

-- we can provide some defaults (through the `::` operator, that we'll see later) so that users don't have to deal
-- with all the settings at once if they have a simple project
let configDefaults =
      { license = "AGPL"
      , description = None Text
      , repository = None Registry.Repo
      , packages = Index.Registry ([] : Prelude.Map.Type Text Address)
      , targets = (toMap {=}) : Map Text Target
      }
let targetDefaults =
      { dependencies = (toMap {=}) : Map Text Text
      , sources = [] : List Text
      , output = "output"
      , backend = None Text
      }

-- let's make some helpers to easily declare overriden packages, we'll use these later
let mkPackageFromGitHub = \(githubOwner : Text) -> \(githubRepo : Text) -> \(ref : Text) ->
      Address.ExternalPkg (SpagoPackage.Repo
        { ref
        , repo = Registry.Repo.GitHub
            { subdir = None Text
            , githubOwner
            , githubRepo
            }
        })
let mkLocalPackage = \(location : Registry.Prelude.Location.Type) ->
      Address.ExternalPkg (SpagoPackage.Local location)

-- we then put together a little "Spago standard library", a set of helpers that we'll use to write config files:
let Spago =
      { Address = Registry.Address SpagoPackage
      , Repo = Registry.Repo
      , Prelude = Registry.Prelude
      , Dependencies = Map Text Text
      , Target = { Type = Target, default = targetDefaults }
      , Config = { Type = Config, default = configDefaults }
      , Index
      , SpagoPackages
      , mkPackageFromGitHub
      , mkLocalPackage
      }

A minimal configuration

A minimal configuration with the above schema would look like this:

-- (note: this is the `Spago` record we defined above)
let Spago = https://raw.githubusercontent.com/purescript/spago/master/stdlib/v1.dhall

-- the `::` operator allows to create a "record with defaults", so that we don't have to specify all the keys,
-- while still conforming to the schema
in  Spago.Config::{
    , name = "mypackage"
    , targets =
        toMap
          { src = Spago.Target::{
            , sources = [ "src/**/*.purs" ]
            , dependencies = toMap { prelude: "^5.0.0" }
            }
          }
    }

Using package sets

You might note that since we're not specifying the source of our packages in the configuration above, the default is applied, which is "the Registry" - that is, Spago is supposed to use a (not yet implemented, and not included in the scope of this yet) solver to look in the Registry for packages that match your bounds.

However, many folks would like to use package sets to not have to worry about bounds and other things. This is how such a configuration would look like:

let Spago = https://raw.githubusercontent.com/purescript/spago/master/stdlib/v1.dhall
let upstream = https://raw.githubusercontent.com/purescript/registry/master/v1/sets/20200418.dhall

in  Spago.Config::{
    , name = "mypackage"
    , packages = 
        Spago.Index.PackageSet
          { compiler = upstream.compiler
          , packages = toMap upstream.packages 
          }
    , targets =
        toMap
          { src = Spago.Target::{
            , sources = [ "src/**/*.purs" ]
            , dependencies = toMap { prelude: "*" }
            }
          }
    }

You may note that in this version of the config we can safely set the version of the dependencies to *, which means that we'll accept whatever version is coming from the package set.

Extending/overriding packages

Thanks to the helpers we defined above, we can easily override packages in the set with local or remote ones:

let Spago = https://raw.githubusercontent.com/purescript/spago/master/stdlib/v1.dhall
let upstream = https://raw.githubusercontent.com/purescript/registry/master/v1/sets/20200418.dhall

let overrides =
      { effect = Spago.mkLocalPackage (../my-effect-fork/ as Location)
      , prelude = Spago.mkPackageFromGitHub "some-author" "some-prelude-fork" "0.0.1"
      }

in  Spago.Config::{
    , name = "mypackage"
    , packages = 
        Spago.Index.PackageSet
          { compiler = upstream.compiler
          , packages = toMap (upstream.packages // overrides)
          }
    , targets =
        toMap
          { src = Spago.Target::{
            , sources = [ "src/**/*.purs" ]
            , dependencies = toMap { prelude: "*", effect: "*" }
            }
          }
    }

Opting in

With such a change in the config format it would be hard for us to try to support both formats in a seamless way, mostly so that we can continue providing good error messages (i.e. what happens when Spago can't match your config to either format? The error message would necessarily be very imprecise).

So I propose that we introduce a command spago migrate, that would allow us to do a one-time migration from the old config to the new config format, and implement a whole separate code path for the new configuration. This is cleaner, easier to maintain, and will allow us to keep our existing error messages for old configs, and add new (good) ones for the new format.

Having such a clean separation also ensures that folks that are willing to try out the new config while it's still an experimental feature can do that while Spago works as before for everyone else.

Versioning the config

The question of "will we have to do all this fuss every time we want to change the schema?" comes pretty natural when discussing this. I don't thing we would. As mentioned in the introduction, our main problem right now is that the config can't be typed with a static Dhall type, and that bars us from using the facilities to easily read things in provided by the dhall library.

Once we can assign a type to the config, then we can just add new versions of the schema, and try to read them in until one succeeds:

Dhall.input configTypeV4 config <|> Dhall.input configTypeV3 config <|> Dhall.input configTypeV2 config ...

Removing `spago install`

Finally, I would like to propose that we just remove the spago install $package command.

This is because supporting it properly is a lot of work already with the current format, and it might be unfeasible with the new config.

Dealing with Spago configs already requires the user to be familiar with Dhall and encourages using its features. Once folks do that (things like splitting their dependencies to different files, etc) then it becomes harder for us to support an install command that "just does the right thing", while for humans it would be easy (even if just slightly more unconvenient) to take care of adding dependencies in the right place.

Where to go from here

This ticket contains a concrete proposal that I'll start to implement as part of the first Registry alpha, but it's also sort of an RFC and I'd like community input on this.

As a consequence of this being a first draft, things are very much in flux. The "Spago standard library" will likely be the aspect most subject to change before this stabilizes.

I have a few questions for anyone willing to read until this point:

in the current draft we specify a "global" source of packages (the Index) - would we like that to be target-specific instead? So that one could have e.g. different targets building with different package sets
a related point is: right now ranges for dependencies are always specified in a target - would we like to skip that when building with package sets? That might look slightly nicer, but might be inconvenient when it comes to publishing a package, because then one would have to fill these in, and it's more work to start from scratch

This is also the best place/moment to propose new keys to be added to the config. So far we have a few proposals in the issue tracker:

output from #455
repl-package from #817

...but feel free to add more proposals below so that we can discuss them in the context of this :slightly_smiling_face:

Alright! I read this quickly on my phone, went for a long walk, then came back and read it again, thoroughly. Here are my thoughts as someone who is using spago today (to a further extent than just listing dependencies from the package set):

The "Spago Prelude" is loaded from a URL (let Spago = https://raw.githubusercontent.com/purescript/spago/master/stdlib/v1.dhall). What does this mean for developing offline? Does dhall/spago cache the file the first time it's downloaded? If so, is it cached per-project, per-machine, or something else?
In the time since spago made the change to error if you're importing modules from packages that aren't explicitly listed as dependencies, I've multiple times found myself in the situation where there are several (3+) dependencies i have to add (whether it be using the new spago version on an existing project, or making a new project and not caring about adding low-laying dependencies like arrays and aff right away). If spago install is removed, adding these dependencies becomes quite a bit more time consuming if it's more than just one or two, especially if each one has to be referenced like dependency_name: "*" (for reference, spago currently prints out a command you can use to automatically install all the missing dependencies). This might be an alright trade-off anyway, but i thought it was at least worth mentioning.

Is there a reason that the published package sets use a record to list the packages, instead of a map? If the package sets (what you'd expect to find at e.g. https://raw.githubusercontent.com/purescript/registry/master/v1/sets/20200418.dhall) were provided as:

{ compiler : Text
, packages : Map Text Address
}

instead of:

{ compiler : Text
, packages : {- record containing all the packages -}
}

You'd be able to write:

-- set without overrides
packages = upstream
-- set with overrides
packages_alt = 
    Spago.Index.PackageSet
      { compiler = upstream.compiler
      -- assuming `//` also works for merging maps
      , packages = upstream.packages // overrides
      }

instead of:

-- set without overrides
packages =
    Spago.Index.PackageSet
      { compiler = upstream.compiler
      , packages = toMap upstream.packages
      }
-- set with overrides
packages_alt = 
    Spago.Index.PackageSet
      { compiler = upstream.compiler
      , packages = toMap (upstream.packages // overrides)
      }

I think it's a bad idea to specify a configuration-global source of packages (Index). One reason being what you mention, that it prevents you from specifying different package sets for different build targets, which limits you if you wanted to make a package that was compatible with both the js backend and purerl, for example. Another problem with it is that it separates the definition of available packages from the place where you decide which of those packages to use, which isn't great when the two types of package sources (registry and package set) are fundamentally different. (The way I've understood it, a package set is a collection of packages, where each package has only one version, and all officially published package sets are known/guaranteed to all build together. While the registry is a collection of packages that may have multiple versions of each package.) The type system can't prevent you from writing prelude: "*" while having the/a registry as your package source, and I can't imagine a scenario where that'd be what you want. Inversely, if your package source is a package set, it makes no sense (or there's at least no real reason i can see) to write anything other than prelude: "*", since there is only one version available anyway. Therefore, I think it's better if the package source and dependencies are both specified in the dependencies field of each target:
```
-- first, the definition of `Target` would look something like this:
let Target = {
  , dependencies :
      < PackageSet :
          { packages :
              { compiler : Text
              , packages : Map Text Address
              }
          , dependencies : List Text
          }
      | Registry :
          { registry : Map Text Address
          , dependencies : Map Text Text
          }
      >
  -- Source globs for the local project to include in the compilation alongside its dependencies.
  , sources : List Text
  -- Output folder where the compiler will put its results.
  , output : Text
  -- A target might not be pointing at the JS backend - if that's the case we can specify here the command
  -- for the alternate backend. Example values: `purerl`, `psgo`, etc.
  , backend : Optional Text
  }
```

-- these definitions below probably aren't completely correct dhall, but hopefully you can see what i'm going for

-- the definition of some target (using a package set) Spago.Target.PackageSet::{ , sources = [ "src/*/.purs" ] , dependencies = -- implying the change described in point 3. is made { packages = upstream -- can't think of a better name for this field on the spot, but either this field or the other one called dependencies (one level up) should probably be called something else , dependencies = [ "prelude", "effect", {- etc -} ] } }

-- the definition of another target (using a registry) Spago.Target.Registry::{ , sources = [ "src/*/.purs" ] , dependencies = { registry = Spago.Index.Registry -- can't think of a better name for this field on the spot, but either this field or the other one called dependencies (one level up) should probably be called something else , dependencies = toMap { prelude = "^5.0.0", {- etc -} } } }



5. I imagine `spago publish` would work per-target? If so, it would make sense to be able to say `spago publish <target_name>`, but what should be the default behavior for `spago publish` when the config file has multiple targets? First thing that comes to mind is to error and make the user choose one. Another possibility would be to have some notion of a default target, perhaps just the first/top defined one, though that only works if dhall maps and records are ordered, and the order is preserved by `toMap` (I'm quite new to more advanced use of dhall, so I don't know if that's the case).

6. `spago upgrade-set` might be a little tricky to get working for any given spago configuration, but i think it would be enough if it looked for `let upstream = <official set url>` in `spago.dhall` and replaced the URL with the one for the latest compatible set.

7. I'm confused about `let Address = Registry.Address SpagoPackage`. From what i can see, `Registry.Address` is already a fully applied type. Is this a mistake, or a dhall feature i don't understand? 😄

I'm confused about let Address = Registry.Address SpagoPackage. From what i can see, Registry.Address is already a fully applied type. Is this a mistake, or a dhall feature i don't understand?

See: https://github.com/purescript/registry/pull/306#pullrequestreview-864090912

Thanks for the thoughtful comments @artemisSystem! This is all great :slightly_smiling_face:

... What does this mean for developing offline? Does dhall/spago cache the file the first time it's downloaded? If so, is it cached per-project, per-machine, or something else?

We'd apply the same logic that we use today for the package sets URLs: the first time we look at this file we check if these remote URLs are frozen (i.e. have a sha256 on them so that they can be cached) and if not then we freeze them. Then Spago wouldn't need to hit the network from the next time it evaluates this file, as frozen imports are cached.

So offline would work, but not the first time. I have been thinking that Spago could ship with a cache of files so that we wouldn't need to hit the network for many of these files, but it's an optimization and it's orthogonal to the new config (it's an issue that exists today and can be fixed today, independently on where we go with this config)

... If spago install is removed, adding these dependencies becomes quite a bit more time consuming if it's more than just one or two, especially if each one has to be referenced like dependency_name: "*" (for reference, spago currently prints out a command you can use to automatically install all the missing dependencies)

I don't have data on this, but I feel that adding manually some dependencies to the config if we make this change would probably be slightly annoying the first times, but then quickly become a fairly mechanical operation. Adding dependencies to a Haskell project config is currently not automated, and adding some when needed is really not a source of friction in my experience. Having to do this actually made me more aware of what's going on with my config files - and this is one of the main ideas behind this change: make folks aware of how to edit their build config. I am generally wary of turning the editing of the config into a CLI-driven experience, because it's hard to do properly (UX-wise and implementation-wise) and it's a moving target, and because in my experience it's easy to lose touch with how to go at editing a build config. An easy objection to this could be "but Fabrizio, you continue to say that people should not be wasting time fighting with their build tools, that they should be invisible so that folks can concentrate on building software". And this is totally right, but unfortunately we don't yet have a magic AI thing that automatically configures your build, so it's great to not have to worry how your project build is configured, until some day you need something slightly fancier, and we support that, but you feel lost about how to go at it because it's the first time you actually look at the config. We are using a powerful configuration format - a proper programming language in fact - so that users can be empowered to achieve things without us having to implement everything backend side. Empowering users to achieve the same things with a simpler configuration format and an outstanding CLI-driven UX is probably possible, but I haven't seen this implemented in practice yet, in any of the tools I used so far.

Is there a reason that the published package sets use a record to list the packages, instead of a map?

Yes, it's because the // (record merge) operator works on Dhall records, but not on Maps, which are in fact just lists. We could use the list append operator # to add dependencies if this was a list, but then semantics would not be clear on the Dhall side, since we'd have multiple instances on the same package, and we'd need to select the right one (depending on the order) once we process the list on the Spago side. I think that would mess up with the semantics.

I proposed a new Dhall builtin to be able to access Map elements in the same way that record elements are accessed, e.g. you could do this to override effect's version

upstream with upstream.packages.^.effect = Spago.Registry "3.0.0"

...even though the packages element would be a Map. The proposal has gathered consensus and it's sorta ready to be standardized and implemented (and possibly have a bounty on it too), but no one got at it yet

I think it's a bad idea to specify a configuration-global source of packages

I share your concerns and I'm torn between the two alternatives, but the reason I'm resistant about having the dependencies be a List Text is because of "library publishing": if you want to publish a library when using package sets as Index you should still be able to specify bounds for consumers of your code, which would not be possible in this case

I imagine spago publish would work per-target? If so, it would make sense to be able to say spago publish <target_name>, but what should be the default behavior for spago publish when the config file has multiple targets?

Yeah, we'd specify a name for the default target and just go for that when not provided by the user, and error out if we can't find that name in the targets list. In case of multiple targets users could publish any of them...which makes me think that we need to support better the monorepo story. E.g. if you were to publish multiple packages from the same config, then one would need things like package_name, repo and subdir for every target. Targets might need a proper redesign in light of monorepo support.

6.

Yeah, that's already how upgrade-set works now, and that wouldn't change

7.

Sorry for that, as Thomas noted that PR was still in flight at the time of writing, but now it went in so it should be clearer :smile:

I don't have data on this, but I feel that adding manually some dependencies to the config if we make this change would probably be slightly annoying the first times, but then quickly become a fairly mechanical operation.

I don't have good data on it either, but the only time I can think of where I've heard folks express annoyance with Spago is when you have to manually add dependencies -- for example, when you've split out multiple spago.dhall files for your test, example, etc. directories. It's especially acute now that transitive dependencies that are imported directly into code must be present in your dependencies explicitly (which is a good thing!), and that unused dependencies emit a warning. This error and warning can lead you to be installing and removing dependencies frequently while working.

I feel like I run spago install and spago build all the time in Spago-based projects, and it would be a shame to lose the install command. I think it's critical for a smooth experience using the package manager. I'm in the camp where I would prefer to lose flexibility over losing spago install, though of course it always comes down to how much flexibility must be lost 😬

I don't have good data on it either, but the only time I can think of where I've heard folks express annoyance with Spago is when you have to manually add dependencies -- for example, when you've split out multiple spago.dhall files for your test, example, etc. directories

These folks would not get any relief though, and adding targets would make everyone fall into this situation. E.g. I suspect that many people will want to share some dependencies between app and test targets, which is already a non-trivial-looking configuration.

It's especially acute now that transitive dependencies that are imported directly into code must be present in your dependencies explicitly

I suspect that everyone got surprised/annoyed/traumatized by this because that was a change that broke everyone's builds on upgrade (if I could plan that again I would not do it in such a breaking way) and required users to install a whole lot of dependencies in all their projects all of a sudden. Personally speaking I rarely add new dependencies to a project in not-exceptional times.

I feel like I run spago install and spago build all the time in Spago-based projects, and it would be a shame to lose the install command. I think it's critical for a smooth experience using the package manager. I'm in the camp where I would prefer to lose flexibility over losing spago install, though of course it always comes down to how much flexibility must be lost

Could it just be that you're used to having spago install available and only use simple configs? I don't think the UX would get much rougher than how it is now - instead of typing a command in the terminal you'd go edit a file in your editor, which is already open anyways.

It could be that I spent too much time in this issue tracker, but really after all of this time of observing usage patterns and features that people ask for I am of the opinion that having spago install available is a bad tradeoff. It's broken for anything that is not a simple config (see #813), and having it there seems to keep up the illusion that one can handle all the build-config related matters from the terminal, which - as I described in my previous comment - is not something that I think we should get into. The simple fact that it's there represents a pull towards other config-editing commands which would be so convenient to have (would they though? One still needs to figure out the syntax for terminal commands), see e.g. #808, #731 and #404.

So I'd say the path of "let's just leave spago install half-broken" doesn't seem to be working, and the situation is not going to improve once we add targets. The two remanining scenarios are (1) we fix it properly (with something like #815, which does what you expect most of the time, does something safe but maybe not expected in some corner cases, and adds a ton of complexity to the codebase), or (2) we give up on all the Dhall-editing endeavors and just let users edit their configs. As a maintainer of this thing I am really in the second camp :slightly_smiling_face:

There is probably space for a more-cargo-like tool (as in: simple config format, do-everything-with-CLI) for PureScript, but personally I wouldn't like to take Spago there.

As the author of #815, I'm definitely in the "we fix it properly" camp :wink:. And being in that camp, could targets be updated to include another target as one of its dependencies? I think that would deal with the corner case I found when I have to update a let binding with a new package.

FWIW, regarding spago install, my experience mirrors @thomashoneyman's. I personally would be disappointed to see it gone. In more complex situations I've setup to encode targets, I've either constructed it in a way that makes it easy for me to continue to use spago install, or I've cursed whenever I can't. Honestly, the warnings about transitive dependencies with the command suggestion make supporting it totally worth it, IMO, and I wish we could support the inverse for the unused package warnings. I just wonder if there's a way to make it's behavior predictable and documented, with a clear failure mode and explanation so that authors at least know when and how they can rely on it (or purposefully abstain)? I may have missed more discussion about that. I would love to have that discussion, but this issue may not be the best place.

I think it's a bad idea to specify a configuration-global source of packages

I share your concerns and I'm torn between the two alternatives, but the reason I'm resistant about having the dependencies be a List Text is because of "library publishing": if you want to publish a library when using package sets as Index you should still be able to specify bounds for consumers of your code, which would not be possible in this case

Coming back to this again. I don't really see a reason a library author would need/want to use a package set index. You're already specifying version bounds for your dependencies, so you don't really need to lock them down to a specific version on top of that (the version in the package set), which as i understand it, are only used for compilation anyway. As a library author myself, i would be perfectly fine with just using the registry as my package index.

I think the benefits of specifying a package index per-target and not per-config, meaning that we can make package set dependencies a List Text instead of a Map Text Text are pretty big. It would make most configs (those that use package sets) look less cluttered, and make it easier/less annoying to add dependencies manually. The benefits vs tradeoffs we get from keeping indexes config-global don't seem as attractive.

I don't have the energy for it today, but hopefully soon i can write up the pros and cons i can think of for each approach so it's easier to visualize.

I now have a first implementation of this in a branch that I'll push asap.

By how the config ended up looking, I have the feeling that making spago install work on 100% of Dhall configurations is something that is not possible at all, because the semantics of where the user means to install things are not always entirely clear.

So I see two disjunct possibilities:

we keep a Dhall configuration format and let go of spago install
we switch to something like TOML, and keep spago install and probably add some more convenience commands

At this point I don't have a personal preference on this myself, so I delegate this choice to the community. We should probably run a poll on Discourse about this.

I'd really prefer to keep Dhall and still make spago install work. So, finding some way to accomplish that goal would be a priority for me.

Here's another idea discussed briefly in the Discord call. Below is a more fleshed out idea for how it could work.

Summary

Rather than needing to edit the spago.dhall's dependencies field either directly by hand or indicrectly via spago install somePackageName, what if Spago could automatically determine what dependencies were needed based on the module(s) being imported into a file? In other words, what if spago install (or editing configuration files) wasn't required in the first place?

For example, if I write the following source code:

module Main where

import Prelude

import Effect (Effect)
import Effect.Console (log)

main :: Effect Unit
main = log "hello world."

Spago would automatically determine that Main needs prelude, effect, and console to build.

Implementation

First some context:

Currently, the compiler (being that it's not currently package-aware) requires all modules that it compiles in a given run to be unique. So, duplicate module names will produce an error.
The package set is currently verified via CI. On top of verifying it, we could add additional metadata to the package set when making a release.

Second, let's introduce the concept of an module-package index.

A module-package file is a list of mappings of 1 module name to 1 package name. If this was saved as a text file, it might look something like this:

Data.Show -> prelude
Prelude -> prelude
Effect -> effect
Effect.Console -> console

If it was encoded in dhall, it might exist as a Map Text Text expression.

Regardless of how the information is encoded, when CI verifies the package set, it also produces a module-package index for that package set.

Third, Spago uses the module-package index to determine which dependencies are needed.

Spago could determine what modules a given .purs file imports by one of the following:

grepping a file for import statements (brittle)
using a new API provided by purs that returns a list of imported modules for a given .purs file (syntax-aware example)

Knowing what a given file's modules are, spago would use the module-package index to determine which dependencies need to be installed (either by downloading them from the Internet or copying them over from the spago cache), skipping those that are already installed.

Fourth, the present design as it has been described so far creates a problem that will be resolved in the next point: how does spago build work if local code defined in the test directory imports local code in the src directory? In other words, modules defined in the src directory will not appear in the module-package index. Thus, when spago comes across src-dir modules imported in test-dir code, it will not find a corresponding package and fail.

-- src/Main.purs
module Main where

foo :: Int
foo = 1

-- test/Main.purs
module Test.Main where

import Prelude

import Effect (Effect)
import Effect.Console (log)
import Main (foo)

main = log $ show foo
-- Spago: in `test/Main.purs` for module `Test.Main`,
-- could not find a corresponding package for module `Main`

Fifth and to fix the first part of point four, one must build a derivation of the base package set by adding project code to the package set as its own package(s). This derivation is called the local package set.

The base package set refers to one of the released package sets from the purescript/package-set repo. For example, https://github.com/purescript/package-sets/releases/download/psc-0.14.7-20220315/packages.dhall is the base package set.

The local package set refers to the packages.dhall file Spago currently (i.e. as Spago presently works, not as it could work in this proposal) uses, which currently looks like this:

let upstream =
      https://github.com/purescript/package-sets/releases/download/psc-0.14.7-20220315/packages.dhall
        sha256:708ca6d205ab5ecc1a3522e4bbe44fa576a0ebebbbc094a3aa9417be7b246974

in  upstream

By 'build a derivation of the base package set', I mean what we currently do with the packages.dhall file today when a package isn't in the base package set. For example, since language-cst-parser isn't in the package set, one must add it to the package set before spago install language-cst-parser would work by editing packages.dhall to be the following:

let upstream =
      https://github.com/purescript/package-sets/releases/download/psc-0.14.7-20220315/packages.dhall
        sha256:708ca6d205ab5ecc1a3522e4bbe44fa576a0ebebbbc094a3aa9417be7b246974

in  upstream
    with language-cst-parser =
      { dependencies =
        [ "arrays"
        , "console"
        , "..."
        , "unsafe-coerce"
        ]
      , repo = "https://github.com/natefaubion/purescript-language-cst-parser.git"
      , version = "v0.10.0"
      }

In this proposal, one would also add their project code to the package set as its own package(s). For example, the code in the src directory would be added as a local package projectName-src. Similarly, the test directory would be added as a local package projectName-test. To ensure we don't need to specify dependencies for project local code, which is what this proposal is trying to avoid, we instead support a type similar to Location called LocationGlobs.

let upstream =
      https://github.com/purescript/package-sets/releases/download/psc-0.14.7-20220315/packages.dhall
        sha256:708ca6d205ab5ecc1a3522e4bbe44fa576a0ebebbbc094a3aa9417be7b246974

in  upstream
    with projectName-src = "./src/**/*.purs" as LocationGlobs
    with projectName-test = "./test/**/*.purs" as LocationGlobs

This approach continues to scale even with monorepos, so long as all module names in the package set are unique.

let upstream =
      https://github.com/purescript/package-sets/releases/download/psc-0.14.7-20220315/packages.dhall
        sha256:708ca6d205ab5ecc1a3522e4bbe44fa576a0ebebbbc094a3aa9417be7b246974

in  upstream
    with shared-src = "./shared/src/**/*.purs" as LocationGlobs
    with shared-test = "./shared/test/**/*.purs" as LocationGlobs
    with backend-src = "./backend/src/**/*.purs" as LocationGlobs
    with backend-test = "./backend/test/**/*.purs" as LocationGlobs
    with front-end-src = "./front-end/src/**/*.purs" as LocationGlobs
    with front-end-test = "./front-end/test/**/*.purs" as LocationGlobs
    with scripts-codegen-api = "./scripts/codegen-api/test/**/*.purs" as LocationGlobs
    with scripts-codegen-support = "./scripts/codegen-support/test/**/*.purs" as LocationGlobs

Moreover, updating the packages.dhall file uses less advanced Dhall expressions. Notice how there are no let bindings in the above content aside from the one importing the base package set itself (and that let binding could be removed altogether).

Sixth and fixing the second part of part four, we introduce the concept of a base module-package index and a local module-package index.

The base module-package index is the module-package index corresponding to the base package set. The local module-package index is the module-package index corresponding to the local package set. In other words, the local module-package index is the base module-package index, except that modules from packages added in the local package set take precedence over modules defined in the base package set. Why the local index has precedence over the base index is described in the next point.

Seventh and introducing a new problem, what should happen when project local code defines a module name that clashes with a module name corresponding to a package defined in the base package set (an issue raised by @colinwahl in a discussion after the call finished)?

For example, Data.Set is defined in the ordered-collections package, which is included in the base package set. In the local project, src/Data/Set.purs defines a module called Data.Set. When we run spago build, Data.Set can refer to either ordered-collections' corresponding module or to the local project's module. So, which one is used? Since the local module-package index takes precendence over the base module-package index, then src/Data/Set.purs is used rather than ordered-collections' corresponding module.

In other words, this is analogous to why the following script still runs.

cd project
spago init
spago install
sed -i 's/module Main where/module Data.Set where/' src/Main.purs
# Notice how `ordered-collections` is never installed...
spago run

Despite src/Main.purs' module name, Data.Set, clashing with the module name, Data.Set, defined in the package, ordered-collections, the code still runs because ordered-collections is never installed as a dependency, even if though it's in the package set.

Concerns

Issues with offline usage

Below is a possible workflow with current spago (i.e. not how Spago works in this proposal):

User creates a project via spago init
User calls spago install many package names while having access to Internet or having such dependencies stored locally.
User knows that they will not have access to the Internet in the future, so user spago install other packages in case they are needed during offline compilation and usage.
User keeps coding without having access to Internet

This proposal may affect offline usages:

User creates a project via spago init
User starts writing code while having access to Internet or having such dependencies stored locally. So spago installs those packages as needed.
User knows that they will not have access to the Internet in the future, so user spago install other packages in case they are needed during offline compilation and usage.
- Crucially, spago install packageName should still download and cache the dependency in the spago cache. However, packageName won't be used when building the project until one of its modules is imported.
User keeps coding without having access to Internet

Tradeoffs

IDE auto-import handling

Currently, the IDE works by importing modules based on what dependencies are installed by spago. In other words, the typical workflow is:

spago install somePackage
spago build / IDE rebuild
Start typing some identifier (e.g. foldl), a list of possible identifiers is shown, and the selected identifier's corresponding module is auto-imported.

This proposal flips this on its head.

Start typing some identifier (e.g. foldl), a list of possible identifiers is shown, and the selected identifier's corresponding module is auto-imported.
spago build /` IDE rebuild will install the corresponding dependency

Thus, the module-package index would also need to store all exports, so that auto-import continues to work. This would define a separate mapping, something like an identifier-to-module-name index.

Since anything from the package set could be imported, the identifier list will be very large. As such, identifiers from local project packages should be prioritized over identifiers from packages from the base package set.

Make the compiler package aware

The ideas presented so far may be better accomplished by just making the compiler package aware, such that the following syntax "just works":

module Main where

import "packageName@v0.14.2" ModuleName (Foo)
import "project-local#../src/**/*.purs" ModuleName2 (bar)

@JordanMartinez - whether that proposal works as envisioned or is incorporated in the design or not, i love the framing of it.

Specifically, i think a framing of the process in a user-centric and (aspirationally) zero-config way, which is how i read your proposal, is very valuable.

Thanks everyone for the thoughtful discussion here! It's been of great inspiration to understand the problem space a little better, and I'm grateful for that 😊

A lot of things have happened since then on many fronts - the main ones are that the Registry is now in alpha, and Spago is being rewritten in PureScript. The reasons for the rewrite are many, but the main ones are: (1) dogfooding our own build tool, (2) reuse a lot of the code that we wrote for the Registry, and (3) try to avoid some of the issues that Haskell has, both language-wise and runtime-wise.

The prototype is already merged in this repo, and we'll likely call it an alpha release as soon as we have a few more pieces working. On the "config file" side of things, the configuration format is now YAML. Since we can now build spago with spago, you can find Spago's own config here. I'm sorry to disappoint folks that were hoping to keep Dhall (and this includes myself), but YAML strikes a better tradeoff between being "easy to edit" (which was hard in Dhall), "being featureful" (not as barebones as JSON, but also not too complex), and "community support" (everything is worse than JSON here, but it's better than Dhall and other languages like TOML).

This said, I think the purpose of this ticket might end here - the new config went through quite a few iterations and sort of settled in the current state where I'm reasonably happy with it. So I'll close this issue now, and once we have a release for an alpha of the new Spago I'll welcome the new feedback/input/comments about the config format in their own dedicated issues.

purescript / spago