Migrate from the .cabal format to a widely supported format

Profpatsch commented 3 years ago

In the wake of the exact-printer initiative, I proposed another approach: why not say goodbye to the .cabal file format and switch to something that is widely supported?

There are a few alternatives; the most important attribute is that they are widely supported in industry.

JSON
YAML
TOML

Note that all of these are (mostly) isomorphic to JSON (scalars, lists, dicts), which is important for easy translation between them (e.g. for config generation purposes).

What would this give the Haskell ecosystem?

Editor support: every modern editor like VS Code has a way of assigning a JSON Schema to a file, which gives completion and inline documentation for free everywhere.
Also, syntax highlighting and auto-formatting come for free.
Cabal doesn’t need to implement its own parser. If JSON is chosen, the grammar is even context-free.

What would it give to users?

Instant familiarity with the format: you don’t touch cabal files all too often as a user, so you don’t want to learn yet another syntax
Templating cabal config with standard tooling (e.g. jq, yj), which is important e.g. in a monorepo context
Inline documentation without setup

What are others doing?

Most modern package managers that don’t go the full Turing-complete configuration route (as e.g. Scala’s sbt and Erlang do) converge on a widely supported syntax for their configuration.

Examples:

npm, yarn (package.json, package-lock.json)
Stack (stack.yaml, stack.yaml.lock)
hpack (project.yaml)
cargo (Cargo.toml)
Elm (elm-package.json)
Maven (pom.xml)
poetry (pyproject.toml, poetry.lock)

Counterexamples:

go (go.mod), though flat shasums and go packages have no configuration file
pip (requirements.txt), though see poetry above
sbt, hex: both use their Turing-complete parent languages
leiningen (project.clj), clojure is a lisp, and sexps are already a data format

I don’t expect cabal would drop support for the cabal file format very soon. Rather, it would start out by generating a .cabal file from the .json/.toml/.yaml for consumption by older versions of cabal. Then, after a multi-year grace period, the new format would become the standard and projects could drop their autogenerated .cabal files.

Profpatsch commented 3 years ago

Note that some people have mentioned dhall as a possible alternative, but using it would destroy most benefits, namely:

Familiarity
Editor support
widely available tooling for ops integration
simple parser
isomorphic to json

Profpatsch commented 3 years ago

However, I expect that there would be a dhall library for generating cabal.json files, which can aid in integrating dhall-based (dev)ops setups with Haskell packages.

philderbeast commented 3 years ago

Is this cabal.json the plan.json from the cabal docs or is it a .cabal in JSON format?

plan.json (JSON) A JSON serialization of the computed install plan intended for integrating cabal with external tooling. The cabal-plan package provides a library for parsing plan.json files into a Haskell data structure as well as an example tool showing possible applications.

ocharles commented 3 years ago

Note that some people have mentioned dhall as a possible alternative

Also note that this already exists as dhall-to-cabal. While its goal was to generate .cabal files, that's not the only solution. A more integrated solution would be a cabal-install that can actually consume these files. I'm not saying this is the solution, just mentioning this as prior art. I'll step out of the conversation for now and let others share their thoughts, but if anyone wants to talk about Dhall in particular here, I do have thoughts.

Profpatsch commented 3 years ago

Is this cabal.json the plan.json from the cabal docs or is it a .cabal in JSON format?

It is the current .cabal file in a not-home-grown syntax

emilypi commented 3 years ago

I proposed another approach: why not say good bye to the .cabal file format and switch to something that is widely supported.

When I think about it, I don't think these are mutually exclusive tickets: an exact printer gets us a reasonable source representation. This is good for a few reasons: we can derive translational tools from the same representation, since we only need to change the parser and the printer. This frees up effort to migrate between formats!

There is a few alternatives, the most important attribute would be that they are widely supported in industry.

Of the three suggestions, TOML is the most attractive. YAML has too much variable syntax, and JSON is aesthetically (and mechanically) displeasing for me to write as a human. TOML's grammar is minimal and admits a small parser and lexer that are easy to generate and verify (note: toml-parser is a little outdated), which eliminates the need for us to write one from scratch. In fact, maintaining this would be a dream, and it would be a boon for tooling, since we can derive an ABNF for our specific flavor quite easily.

Then after a multi-year grace period, the new format would become the standard and projects could drop their autogenerated .cabal files.

👍

TikhonJelvis commented 3 years ago

As an experiment, I took a library I wrote a few years ago and manually converted its Cabal file to TOML. I like it! The conversion can be totally systematic.

TOML was noticeably nicer to edit—Emacs has a simple built-in TOML mode and I didn't have to worry about indentation/formatting. (I've done Haskell for over a decade now and I'm still not consistent in how I format Cabal files!) Structured commands for navigating and editing the TOML file would be nice; I don't know if something like this already exists, but if it doesn't, adding it to Emacs would be easy. I wouldn't even think of trying something like that for Cabal's custom syntax.

I've used YAML a lot more than TOML in the past. Compared to YAML, I found needing to quote all my strings a bit annoying; on the other hand, TOML was much nicer to pick up and doesn't have weird corner cases to worry about. At work I recently ran into some weird YAML files that used anchors in a way that didn't work in Python—not something that would happen with TOML.

In my dream world we would use an S-expression based syntax (like sexplib) but I know that is not to be :(.

I immediately found that multiline strings were useful. Multiline strings and comments seem like the bare minimum for a human-oriented format; YAML and TOML support them, JSON doesn't.

It's a bit long, but here's the whole file:

cabal-version = "2.2"

[package]
name = "modular-arithmetic"
version = "2.0.0.1"
synopsis = "A type for integers modulo some constant."
description = """
A convenient type for working with integers modulo some constant. It saves you from manually wrapping numeric operations all over the place and prevents a range of simple mistakes. @Integer `Mod` 7@ is the type of integers (mod 7) backed by @Integer@.

We also have some cute syntax for these types like @ℤ/7@ for integers modulo 7.
"""
homepage = "https://github.com/TikhonJelvis/modular-arithmetic"
bug-reports = "https://github.com/TikhonJelvis/modular-arithmetic/issues"
license = "BSD-3-Clause"
license-file = "LICENSE"
author = "Tikhon Jelvis <tikhon@jelv.is>"
maintainer = "Tikhon Jelvis <tikhon@jelv.is>"
category = "Math"
build-type = "Simple"
extra-source-files = ["README.md", "CHANGELOG.md"]

[source-repository.head]
type = "git"
location = "git://github.com/TikhonJelvis/modular-arithmetic.git"

[library]
hs-source-dirs = ["src"]
ghc-options = ["-Wall"]
default-language = "Haskell2010"
exposed-modules = [
  "Data.Modular"
]
build-depends = [
  "base >4.9 && <5",
  "typelits-witnesses <0.5"
]

[test-suite.examples]
hs-source-dirs = ["test-suite", "src"]
main-is = "DocTest.hs"
default-language = "Haskell2010"
type = "exitcode-stdio-1.0"
build-depends = [
  "base >4.9 && <5",
  "doctest >= 0.9",
  "typelits-witnesses <0.5"
]

TikhonJelvis commented 3 years ago

Another benefit: the format would be naturally extensible. Cabal could provide a section for plugin/tool/etc config, and tools would have no issues parsing values from there. I'm imagining something like this:

[plugin.liquid-haskell]
smt-solver = "z3mem"

My experience has been that providing "extension points" in formats is always useful. We can't figure out everything people want to do with their libraries ahead of time but we can make the format adaptable. If people need something Cabal doesn't support, they can add it while still keeping a single canonical file for library-specific settings.

gbaz commented 3 years ago

For yaml there's also of course hpack. So anyone who wants to write cabal files in either yaml or dhall is welcome to do so. Note that we don't have exactprinters for either of those formats either, as far as I know. As I recall, due to the semantics of yaml, conditional clauses are rather unpleasant there, among a few other issues (and pretty-printing reorders things in unpleasant ways as well). (And also as emily notes, the yaml grammar is rather complicated as is).

Toml does seem promising, but I worry that its support for conditionals or other more complex syntax wouldn't be particularly great either. Translations of some more complex files might be worthwhile, to experiment with this.

Btw, note that cabal is already extensible, via "x-" fields.

In any case, I think the right next step is to get the cabal grammar pinned down and to have an exactprinter for at least the format we already have, which is widespread.

philderbeast commented 3 years ago

This is good for a few reasons: we can derive translational tools from the same representation, since we only need to change the parser and the printer. This frees up effort to migrate between formats!

I like this and am the maintainer of the translational tool hpack-dhall that can translate:

dhall -> cabal
dhall -> json
dhall -> yaml # the package.yaml format of hpack
dhall -> dhall # with imports resolved

I am a bit wary about each format being capable of doing a faithful representation. For instance, hpack's conditionals can break dhall's typing. This is the trouble @gbaz just mentioned.

Ericson2314 commented 3 years ago

I want this to work, but unfortunately I see some issues with all the proposed formats so far. I think TOML / YAML / JSON will never work, but Dhall, while it might not be a good fit today, can be made to work.

TOML / YAML / JSON

If Cabal files were merely data they would be fine, but unfortunately they are code, due to the conditionals, and parameters used in those conditions. This is true of Cargo packages too, and the solution there has been to stuff syntax into strings. Firstly, this largely defeats the point as we still need application-specific parsers (and pretty printers!) to handle those strings.

But more worryingly, I have reason to believe this has warped the design process of Cargo. See, for example, the back and forth between @djc and me in https://github.com/rust-lang/rfcs/pull/3143, where @djc agrees Cargo has backed itself into a corner, but objects to my further using strings, or to trying to encode the information in a more structured but awkward and verbose way. I may have disagreed with @djc on which unpleasant choice to take, but I absolutely do agree that TOML forcing Cargo into this awkward situation is tragic, and no one should have to pick between those unpleasant options in the first place.

Cabal converting from the existing design avoids some of the distortion from TOML's perverse incentives, but I have no doubt the language of Cabal files will continue to evolve, and I don't want "TOML goggles" to mess things up going forward.

Dhall

Dhall is an actual programming language, and therefore squarely fixes the above issues. And to be clear I would really like to endorse Dhall as it is the right sort of way to make these things conform to a standard. There are two quibbles with Dhall as it currently exists however, that I think should be addressed first:

- imports / IO. As far as I know, Dhall always allows downloading arbitrary stuff, etc., as long as you give it a content address of some sort to make it pure (like fixed-output derivations). This is a fine design in general, but I worry about e.g. cabal2nix needing internet access to do its job, which (ironically, given the Nix inspiration!) would be a regression and a major pain. If there is a way to force Dhall programs to be more self-contained, that would assuage my concern.
- abstract interpretation. The Dhall model somewhat assumes that Dhall will evaluate a closed term, spitting out a value for the consuming application to deal with. But this doesn't totally reflect how Cabal works. Today, we have automatic flags, which means we need to vary parameters based on results. With a few flags this can be brute-forced, but with more, abstract interpretation is much more efficient. Dhall's strong normalization is good for making such static analysis tractable, but we might also want additional restrictions to make it efficient and also easier for humans to understand.
Maybe that is overkill for manual flags, but @mpickering's and my presentation on what comes after CPP (https://icfp19.sigplan.org/details/hiw-2019-papers/9/Configuration-but-without-CPP, https://www.youtube.com/watch?v=YupkE1vsZ4o) has gotten me thinking about abstract interpretation more broadly. Eventually we want to tackle the goal of "type safe packaging" i.e. ensuring all valid version solving solutions will in fact compile. It's hard, but not tackling it is anathema to our values, and abstract interpretation of various sorts is key to making it work.
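A toy Python sketch of the brute-force option. It uses a hypothetical representation (flags are booleans, each conditional block is a Python predicate plus a dependency list) and is not how Cabal's solver works. It shows why enumeration is fine for a couple of flags but grows as 2**n.

```python
from itertools import product

# Hypothetical representation of conditional build-depends:
# (condition over the flag assignment, dependencies added when it holds).
FLAGS = ["threaded", "use-text"]
CONDITIONAL_DEPS = [
    (lambda f: True, ["base"]),
    (lambda f: f["use-text"], ["text"]),
    (lambda f: not f["use-text"], ["bytestring"]),
    (lambda f: f["threaded"] and f["use-text"], ["async"]),
]


def deps_for(assignment: dict) -> frozenset:
    """Collect the dependencies whose conditions hold under one flag assignment."""
    return frozenset(d for cond, deps in CONDITIONAL_DEPS if cond(assignment) for d in deps)


def all_configurations(flags):
    """Enumerate all 2**n flag assignments: feasible for a few flags, exponential in general."""
    for values in product([False, True], repeat=len(flags)):
        assignment = dict(zip(flags, values))
        yield assignment, deps_for(assignment)


for assignment, deps in all_configurations(FLAGS):
    print(assignment, sorted(deps))
```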

So yeah, in conclusion I want Dhall to work, but it's important that we be able to restrict ourselves to a sort of "mini Dhall" so we can do this analysis, and we will have to integrate Dhall with Cabal fairly deeply. I'm not sure whether the current Dhall implementation supports such a restricted "mini Dhall", but that can easily be fixed.

Bodigrim commented 3 years ago

Then after a multi-year grace period, the new format would become the standard and projects could drop their autogenerated .cabal files.

Remember that Hackage is an append-only repository. It would be utterly disappointing if a future version of Cabal were unable to build an old package just because it no longer parses its very own package format. So I don't think it would be wise to abandon the parser for Cabal files even after a very long grace period. And if we are to retain the parser and all its complexity, then what exactly are we to gain? What about other tooling (e.g., Stack)?

With regards to editor support, why aim for a generic JSON autocompletion? These days we should not settle for anything less than a domain-specific language server, and custom format is not a hindrance for it.

I'm sorry if my tone sounds harsh, but I'm afraid we are chasing an ideal to the detriment of compatibility, as is very customary in the Haskell community.

djc commented 3 years ago

Maybe Starlark is a decent option if logic is important to this project?

Mikolaj commented 3 years ago

Given that the quality standards and popularity standings for configuration languages change every decade, I'd rather focus on a good internal representation, support the old cabal format (and only this format) with a forever guarantee, and let contributors add exact parser/pretty-printers for whatever format works best for them. We also need a story for keeping in sync many files that contain the same information, or for translation on the fly (e.g., when showing a .cabal file from a Hackage webpage of a package).

kamoii commented 3 years ago

This might be a totally stupid idea, but how about using limited Haskell for configuration? For example, the configuration is a module that exports one binding named config of type Config: limited to Haskell98, no GHC extensions, no external packages, no IO. No one is suggesting it, so I assume there is an obvious reason this is not a good idea...

fgaz commented 3 years ago

@kamoii the main argument against that is that a Haskell program is not guaranteed to terminate

ocharles commented 3 years ago

The argument is also to not invent something new as much as possible. We want to leverage existing tooling, syntax highlighting, etc. A limited Haskell only lets us benefit from a fraction of this

phadej commented 3 years ago

Two forgotten things in this discussion:

First: JSON / YAML / ... and even Dhall would still need some stringly sublanguages, as @Ericson2314 hints. Consider build-depends or mixins fields.

build-depends: foo (>=0.4.0.0 && <0.4.1) || (>=0.5 && <0.6)
mixins:        foo (Foo.Bar as AnotherFoo.Bar, Foo.Baz as AnotherFoo.Baz)

"build-depends": {
    "foo": {
      "or": [ { "and": [ { ">=": "0.4.0.0" }
                       , { "<": "0.4.1" }
                       ]
              }
            , { "and": [ { ">=": "0.5" }
                       , { "<": "0.6" }
                       ]
              }
            ]
    }
}

(It would be better to model version numbers as [0, 4, 0, 0], i.e. an array of integers, though what would [0.0, 4.0, 0.0, 0.0] mean?!)
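For comparison, here is a small Python sketch of what a consumer of such a structured range has to do anyway. It encodes the `build-depends` range `(>=0.4.0.0 && <0.4.1) || (>=0.5 && <0.6)` as a hypothetical JSON and/or tree. Versions are parsed into integer tuples, so they compare component-wise. Wildcards such as `==1.2.*` and `^>=` are not covered.

```python
import json

# The range (>=0.4.0.0 && <0.4.1) || (>=0.5 && <0.6), as a hypothetical JSON and/or tree.
RANGE_JSON = """
{"or": [{"and": [{">=": "0.4.0.0"}, {"<": "0.4.1"}]},
        {"and": [{">=": "0.5"},     {"<": "0.6"}]}]}
"""

OPS = {
    ">=": lambda v, b: v >= b,
    ">": lambda v, b: v > b,
    "<=": lambda v, b: v <= b,
    "<": lambda v, b: v < b,
    "==": lambda v, b: v == b,
}


def parse_version(text: str) -> tuple[int, ...]:
    """Versions compare as integer sequences, so (0, 4, 1) > (0, 4, 0, 0)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: tuple[int, ...], node: dict) -> bool:
    """Evaluate one node of the tree; every node is a single-key object."""
    (key, arg), = node.items()
    if key == "and":
        return all(satisfies(version, child) for child in arg)
    if key == "or":
        return any(satisfies(version, child) for child in arg)
    return OPS[key](version, parse_version(arg))


RANGE_TREE = json.loads(RANGE_JSON)
for candidate in ["0.4.0.5", "0.4.1", "0.5.2", "0.6"]:
    print(candidate, satisfies(parse_version(candidate), RANGE_TREE))
```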

I don't even try to model mixins. Dhall would look terrible as well (from dhall-to-cabal README)

in    GitHub-project { owner = "ocharles", repo = "example" }
    ⫽ { version =
          prelude.v "1.0.0"
      , library =
          prelude.unconditional.library
          (   prelude.defaults.MainLibrary
            ⫽ { build-depends =
                  [ { package =
                        "base"
                    , bounds =
                        prelude.majorBoundVersion (prelude.v "4")
                    }
                  ]

There is also license, which uses SPDX license expressions, a standard just for that. NPM embeds them as strings, i.e. generic JSON tooling gives no help in editing them (though honestly that field is rarely edited).

EDIT: Also file globs (though I think that was a mistake to add them to .cabal format)

If we use stringly sublanguages (like in @TikhonJelvis' examples), we will need to explain their syntax anyway. Nothing changes in comparison with the current format.

Writing a tool to automatically edit bounds is still difficult with stringly build-depends (as difficult as today, I would say).
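To make the contrast concrete, here is a Python sketch of the easy case. It assumes a hypothetical structured `build-depends` with exactly one simple interval per package, in which case bumping an upper bound is a dictionary update. A disjunction like `(>=0.4.0.0 && <0.4.1) || (>=0.5 && <0.6)` does not fit this shape, which is exactly where the difficulty comes back.

```python
import copy

# Hypothetical structured build-depends: one simple interval per package.
BUILD_DEPENDS = {
    "base": {">": "4.9", "<": "5"},
    "typelits-witnesses": {"<": "0.5"},
}


def bump_upper_bound(deps: dict, package: str, new_upper: str) -> dict:
    """Return a copy of `deps` with the exclusive upper bound of `package` replaced."""
    updated = copy.deepcopy(deps)
    updated[package]["<"] = new_upper
    return updated


def render(deps: dict) -> list[str]:
    """Render back to the current string syntax, e.g. "base >4.9 && <5"."""
    return [name + " " + " && ".join(op + v for op, v in bounds.items())
            for name, bounds in deps.items()]


print(render(bump_upper_bound(BUILD_DEPENDS, "typelits-witnesses", "0.6")))
# ['base >4.9 && <5', 'typelits-witnesses <0.6']
```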

Second: Performance matters. The solver parses plenty of package descriptions while figuring out dependencies. Dhall's unbounded computation cost is asking for problems. Package descriptions in indices should be (close to) normal forms. Common stanzas make the current format not normal, but their substitution is cheap (linear cost).

Currently hackage-tests test suite (cabal run hackage-tests parsec) reports on my machine:

Reading index from: /cabal/packages/hackage.haskell.org/01-index.tar
151055 files processed
41573 files contained warnings
0 files failed to parse
147.663162 seconds elapsed
0.977546 milliseconds per file

That 1ms per file is a good goal. cabal is used as an interactive tool.

A solution is that cabal sdist would normalise the package description files before packing a source tarball. That would work, but we would need to specify the normal form independently. The normal form would need to be only readable by humans, not necessarily convenient to write.
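A rough Python sketch of such a normalisation pass over a hypothetical in-memory description. Named common stanzas are inlined into the components that import them. Then a canonical serialisation (sorted keys, fixed separators) yields one byte-exact normal form. It assumes every field is a list and that imported values are concatenated before the component's own. Nested imports are not handled. This is not Cabal's actual merge logic.

```python
import json

# Hypothetical description: named common stanzas plus components that import them.
DESCRIPTION = {
    "common": {
        "warnings": {"ghc-options": ["-Wall"]},
        "deps": {"build-depends": ["base >=4.9 && <5"]},
    },
    "library": {
        "import": ["warnings", "deps"],
        "exposed-modules": ["Data.Modular"],
        "build-depends": ["typelits-witnesses <0.5"],
    },
}


def normalise(desc: dict) -> dict:
    """Inline imported common stanzas; work is proportional to the size of the output.
    Assumption: all fields are lists, imported values come before the component's own."""
    common = desc.get("common", {})
    result = {}
    for section, fields in desc.items():
        if section == "common":
            continue
        merged: dict = {}
        for name in fields.get("import", []):
            for key, values in common[name].items():
                merged.setdefault(key, []).extend(values)
        for key, values in fields.items():
            if key != "import":
                merged.setdefault(key, []).extend(values)
        result[section] = merged
    return result


# Canonical text form: the same description always serialises to the same bytes.
print(json.dumps(normalise(DESCRIPTION), sort_keys=True, separators=(",", ":")))
```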

That approach would make sense for revisions too: it might be substantially easier to specify which edits are valid on the normal forms than on the "full" grammar. The current check is semi-syntactical, which is somewhat limiting.

Another solution is that cabal update would produce a cache with normalized descriptions. The drawback is that it would take at least 3 minutes! (Or be too clever and brittle trying to reuse older caches).

If we really want to change the format to something "used elsewhere", then EDN is actually not that bad (I was taught scheme in school).

:build-depends
  { "foo"
    (|| (&& (>= #(ver 0 4 0 0)) (< #(ver 0 4 1)))
        (&& (>= #(ver 0 5)) (< #(ver 0 6)))
    )
  }

:mixins
  { "foo"
    (as [Foo Bar] [AnotherFoo Bar])
    ; the drawback is that everything is different, if EDN structure is used deeply:
    ; even the module names, as "Foo.Bar" is an expression in a sublanguage for module names,
    ; something general EDN tools are not aware of.
    ...

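TL;DR, I challenge JSON, ..., Dhall suggestors to model e.g.

* https://hackage.haskell.org/package/transformers-compat-0.7/transformers-compat.cabal

* https://hackage.haskell.org/package/raaz-0.3.0/raaz.cabal

in their favourite "syntax" format. Otherwise this discussion is just wasting everyone's time by not being concrete.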

(IMO simple examples don't tell much, simple stuff is easy).

jmorag commented 3 years ago

Does "unlimited Haskell" as opposed to limited Haskell qualify as not something new? IMO the argument that Haskell is Turing-complete isn't that compelling, as the Nix expression language is also. With a cabal file being just some Haskell expression of type Config, or a single-file program:

import Cabal

main = buildPackage PackageOptions {...} -- dependencies and build configuration here

we get to use all of the existing Haskell tooling and get around the sublanguage issues by representing everything as normal Haskell values, which if I understand correctly cabal today does anyway.

Going further with this train of thought, it seems like any configuration format, JSON, YAML, Dhall, edn, TOML, etc. is basically some level of indirection that gets parsed into a Haskell value at build time, so why not just focus on making a more convenient EDSL for Cabal the library?

gbaz commented 3 years ago

There's a reason we encourage cabal files rather than custom setups -- far easier for external consumption (even with a not fully specified grammar). To get values out of a haskell executable it needs to either emit them (in which case the format it emits in is the actual spec) or you need to build and link into it directly. Either way you're compiling and building a haskell program every time you want to ask "what modules does this package provide." That is not feasible for, e.g., a package store such as hackage.

jmorag commented 3 years ago

The external consumption argument is very compelling. I guess we could have cabal generate a lockfile from a build specification in Haskell and have other tools read that. We already have cabal.project.freeze/stack.yaml.lock so there's precedent, but those files haven't historically been required.

fgaz commented 3 years ago

...but then you get in the same situation as now, it's just that cabal is doing the conversion instead of dhall2cabal/hpack/... You still have to commit/upload/distribute the redundant (as opposed to freeze files) generated file

jneira commented 3 years ago

In this issue there is an interesting discussion about how to handle configuration formats other than the built-in cabal one: https://github.com/haskell/cabal/issues/5343

andreasabel commented 3 years ago

I see no problem with an outer syntax such as YAML having to be complemented by ad hoc expression syntaxes for certain fields (constraints etc.) that transcend YAML. Having an outer YAML syntax would still allow third-party tools easy access to certain contents of the .cabal file, and nice syntax (that is, the current syntax) for constraints can be parsed from string fields using/adapting the existing cabal parsers.

YAML-bombs can be avoided by restricting to a sublanguage of YAML.

The syntax examples in https://github.com/haskell/cabal/issues/7548#issuecomment-899557379 look like straw men to me.

michaelpj commented 3 years ago

Does "unlimited Haskell" as opposed to limited Haskell qualify as not something new?

I encourage anyone who thinks this is a good idea to think about how much fun Setup.hs is already (hint: extremely unfun). That is: if you need to compile and run a Haskell program to work out what your config is now you need configuration to work out how to compile and run the config program. What compiler options does it use? What libraries does it have access to? What GHC version is it using? etc. And what if the level-2 configuration is also a non-trivial Haskell program? Time for level-3 configuration. Extremely unfun.

phadej commented 3 years ago

Having an outer YAML syntax would still allow third-party tools easy access to certain contents of the .cabal file, and nice syntax (that is, the current syntax) for constraints can be parsed from string fields using/adapting the existing cabal parsers.

What's wrong with using Cabal as a library? I had great success with that. You need it anyway for expression parsing.

andreasabel commented 3 years ago

What's wrong with using Cabal as a library? I had great success with that. You need it anyway for expression parsing.

For the Haskell programmer, there is the obstacle of Cabal being a large package that regularly undergoes changes. Some third parties might not even use Haskell to write code that extracts information from a .cabal file. YAML parsers are ubiquitous...

Anecdotally, I have just written a small tool (https://github.com/andreasabel/cabal-clean) to partially clean artefacts from dist-newstyle/build, and I originally considered drawing some information (version, tested-with) from the respective .cabal file. But I shied away as there was no light-weight parser for cabal files.

phadej commented 3 years ago

That's the old problem of the Cabal library being both the package description reading library and its interpretation, i.e. building. The former part barely changes (except for the usual "let's make a better library" work).

I'd welcome the split, as my tools use only that "parse .cabal file" part. Distribution.Simple namespace can be left for build-type: Custom packages and cabal-install use.

EDIT: even if the outer format were JSON or YAML, a library for working with it on a higher level than "it's some JSON" would still have to exist (cf. the cabal-plan library for plan.json files).

TikhonJelvis commented 3 years ago

What's wrong with using Cabal as a library? I had great success with that. You need it anyway for expression parsing.

To use Cabal as a library, I have to use Haskell and probably write a Cabal file for that script so that it can depend on Cabal as an external library. Figuring all of that out is a pretty steep up-front cost!

Making it easy to parse project metadata in any language would lower the barrier to entry for adding Haskell to a multi-language environment. I've worked on projects that combined Python, Rust and Haskell which all fit together pretty well. There is no real reason that writing a Python script to get project metadata across these languages should be difficult.

The new syntax (TOML or whatever) won't have to handle everything. Cabal files are structured as key-value pairs along with sections. TOML could replace that, leave other things (mixins, version bounds, etc) as strings and still be useful.

fgaz commented 3 years ago

The new syntax (TOML or whatever) won't have to handle everything. Cabal files are structured as key-value pairs along with sections. TOML could replace that, leave other things (mixins, version bounds, etc) as strings and still be useful.

I see no problem with an outer syntax such as YAML having to be complemented by ad hoc expression syntaxes for certain fields (constraints etc.) that transcend YAML. Having an outer YAML syntax would still allow third-party tools easy access to certain contents of the .cabal file, and nice syntax (that is, the current syntax) for constraints can be parsed from string fields using/adapting the existing cabal parsers.

The problem is that there's very little that can actually be expressed directly in TOML/YAML, mainly top-level strings like package name, description, author, copyright...

As soon as you want other fields you need additional or cumbersome syntax, and extra validation on top of that.

For example:

Anecdotally, I have just written a small tool (https://github.com/andreasabel/cabal-clean) to partially clean artefacts from dist-newstyle/build, and I originally considered drawing some information (version, tested-with) from the respective .cabal file.

These two fields are structured, and would need to be represented as something like

version = [ 0, 1, 0, 0 ]
tested-with = [
  { compiler = "GHC", version = [ 8, 10, 5 ] },
  { compiler = "GHC", version = [ 9, 0, 1 ] }
]

where "GHC" needs to be validated and all the versions need to be checked for emptiness and component length.

This means that even in other languages you'd need a library to parse .cabal files for everything but the most basic stuff.

I think the most beneficial thing would be to split the parser from the rest of Cabal so that we'd have a light-weight library to depend on, like @phadej and @gbaz suggested: #7559

edit: I checked, and tested-with is even more complex than this: the version is a version range (even though most people just use ==), so as complex as build-depends!

To use Cabal as a library, I have to use Haskell and probably write a Cabal file for that script so that it can depend on Cabal as an external library

Technically, Cabal being a core/boot/builtin (I forget what they're called) library, you only need ghc unless you depend on something else

jmorag commented 3 years ago

Perhaps nickel could be useful. It has some upsides over Dhall in that it's gradually typed so it can express anything in cabal that would break Dhall's type system. The contract system seems like it could validate things like package version constraints without having to resort to stuffing everything into strings and leaving validation to a custom parser. Downside is that it's still relatively new and unstable, AFAIK.

More generally speaking, the goals of having an expressive config format that's pleasant for humans to write and an unambiguous one that's easy for machines to read seem to be at odds. Perhaps cabal should support reading only a very verbose low-level format, like the current cabal format, a JSON version or even protobuf, but expose a configurable hook that runs at the same stage where stack runs hpack, to allow people to write whatever high-level format they want.
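One piece of such a hook, sketched in Python: deciding when the low-level file must be regenerated from the user's high-level source. The file names and the `needs_regeneration` helper are hypothetical, and a real hook would then invoke whatever converter the user configured. This is only the staleness check, based on file modification times.

```python
import os
import tempfile


def needs_regeneration(source: str, generated: str) -> bool:
    """Regenerate the low-level description if it is missing or older than its source."""
    if not os.path.exists(generated):
        return True
    return os.path.getmtime(source) > os.path.getmtime(generated)


# Usage sketch with throwaway files and explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "package.toml")
    generated = os.path.join(tmp, "example.cabal")
    open(source, "w").close()
    print(needs_regeneration(source, generated))  # True: nothing generated yet
    open(generated, "w").close()
    os.utime(source, (1000, 1000))
    os.utime(generated, (2000, 2000))
    print(needs_regeneration(source, generated))  # False: output is newer than input
```

Comparing timestamps keeps the check cheap. A content hash would be more robust against clock skew and checkouts that reset modification times.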

Profpatsch commented 3 years ago

If we use stringly sublanguages (like in @TikhonJelvis' examples), we will need to explain their syntax anyway. Nothing changes in comparison with the current format.

I think this statement is akin to “we can’t go all the way so changing anything is not an improvement”, which is … err.

Profpatsch commented 3 years ago

TL;DR, I challenge JSON, ..., Dhall suggestors to model e.g.

* https://hackage.haskell.org/package/transformers-compat-0.7/transformers-compat.cabal

* https://hackage.haskell.org/package/raaz-0.3.0/raaz.cabal

I think those files could be the gold standard for anybody proposing a more concrete replacement.

Profpatsch commented 3 years ago

Does "unlimited Haskell" as opposed to limited Haskell qualify as not something new? IMO the argument that Haskell is turing complete isn't that compelling, as the nix expression language is also.

I think this issue is not about whether using any Turing-complete language for configuration is worthwhile. For that discussion, please open a separate issue.

Profpatsch commented 3 years ago

Making it easy to parse project metadata in any language would lower the barrier to entry for adding Haskell to a multi-language environment. I've worked on projects that combined Python, Rust and Haskell which all fit together pretty well. There is no real reason that writing a Python script to get project metadata across these languages should be difficult.

I want to support this statement. For any kind of build tool, having any kind of well-known syntax that is not specific to the language is already huge in how much more integration it makes possible.

Profpatsch commented 3 years ago

On the topic of dhall/nickel, I want to rephrase my original statement:

Note that all of these are (mostly) isomorphic to JSON (scalars, lists, dicts), which is important for easy translation between them (e.g. for config generation purposes).

Using any language that you need to evaluate (rather than just parse) is a mistake in my book, because it loses the advantage of JSON Schema. You can always treat the resulting file format as a compilation target that you generate from dhall/nickel/cue/whatever.

jmorag commented 3 years ago

By json schema, do you mean in-editor completion and documentation on hover when editing the configuration?

TikhonJelvis commented 3 years ago

I didn't know about this myself, but it turns out that there's a language server for YAML using LSP (language server protocol). If we wrote a JSON schema for Cabal files, we'd get basic validation and completion for "free" in editors like VSCode and Emacs.

TikhonJelvis commented 3 years ago

Honestly, I kind of want to give it a try now. Not sure I'll have time over the next couple of days, but I can knock together a (basic, incomplete) JSON schema for Cabal and see what it's like when editing in YAML.

TikhonJelvis commented 3 years ago

After playing a bit with JSON Schema, I've started sketching out a schema for Cabal package descriptions. I've already got a feature I've always wanted: autocomplete for SPDX licenses.

[Screenshot: YAML auto-completion of SPDX license identifiers (yaml-autocomplete-spdx-license)]

To get this set up, I needed:

a JSON schema for Cabal package descriptions, including a list of supported SPDX identifiers
yaml-language-server
lsp-mode for Emacs
a bit of config to point Emacs to my cabal-schema.json file for that buffer

It's not perfect—won't help with compound SPDX expressions and I don't know how to document each individual license—but it is pretty great for the common case. When I start a new Haskell project I either have to look up an old .cabal file or Google for the license code to use, so it's great if my editor can help me directly instead.

It was honestly easier to set up than I expected, in large part because I already use lsp-mode for other languages. It'll take some care with schema design, but this absolutely has the potential for a great user experience.
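For readers new to JSON Schema, license completion of this kind boils down to the `enum` keyword. The sketch below builds such a fragment as a Python dict and imitates the `enum` check a validator would perform. The SPDX list is a small hand-picked subset, and this is not the actual `cabal-schema.json` from the comment above.

```python
import json

# A hand-picked subset of SPDX license identifiers (the real list is much longer).
SPDX_IDS = ["Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "GPL-3.0-only", "ISC", "MIT"]

# JSON Schema fragment: "enum" is what lets a schema-aware editor offer completions.
LICENSE_SCHEMA = {
    "type": "object",
    "properties": {
        "license": {
            "description": "SPDX license identifier",
            "enum": SPDX_IDS,
        }
    },
}


def license_errors(package: dict) -> list[str]:
    """Imitate the 'enum' check a JSON Schema validator would perform."""
    allowed = LICENSE_SCHEMA["properties"]["license"]["enum"]
    value = package.get("license")
    if value is not None and value not in allowed:
        return [f"license: {value!r} is not one of the allowed SPDX identifiers"]
    return []


if __name__ == "__main__":
    print(json.dumps(LICENSE_SCHEMA, indent=2))
    print(license_errors({"license": "BSD3"}))  # legacy Cabal name, not an SPDX identifier
```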

jneira commented 3 years ago

We could always take the opposite path and create a plugin in haskell-language-server to:

format the .cabal file (already a wip): https://github.com/haskell/haskell-language-server/pull/2047
add dependencies automatically: https://github.com/haskell/haskell-language-server/issues/155
completions and diagnostics (parse errors and check errors/warnings)
even add code actions to trigger builds and clean (clean would delete the specific cache for cabal+hls)

Those features would work for all existing packages, but they have to be implemented of course 😄

phadej commented 3 years ago

Auto-completion works today (at least in Vim): [Screenshot from 2021-08-29 14-30-17]

See https://github.com/haskell/cabal/tree/master/editors/vim. It's not a very sophisticated syntax highlighter, but it works.

phadej commented 3 years ago

For the record: let's not forget about YAML quirks:

name: no       # false
version: 5.0   # is it 5.0 or 5? these are numbers, yet 5.0.0 is a string

The latter is a very unfortunate quirk given how versions work now: 5.0 and 5 are not the same version.

Such versions do exist on Hackage (though the single-digit x vs. x.0 case is rare):

https://hackage.haskell.org/package/DSA: 1, 1.0 and 1.0.0
https://hackage.haskell.org/package/pipes-cereal: 0.1.0 and 0.1.0.0
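The version problem can be reproduced without a YAML parser: once a version is read as a number, 5 and 5.0 collapse into the same value, whereas Cabal compares versions as lists of integer components, under which they differ. A minimal sketch, assuming a hypothetical `parse_version` helper (not Cabal's API) that models that component-wise ordering with Python tuples:

```python
import json


def parse_version(text: str) -> tuple[int, ...]:
    """Hypothetical helper: split a Cabal-style version into integer components."""
    return tuple(int(part) for part in text.split("."))


if __name__ == "__main__":
    # Read as a number (what an unquoted YAML `version: 5.0` becomes), the distinction is lost.
    print(float("5.0") == float("5"))  # True

    # Read as component lists, 5, 5.0 and 5.0.0 are three different versions,
    # ordered lexicographically like tuples: 5 < 5.0 < 5.0.0.
    print(sorted(["5.0.0", "5", "5.0"], key=parse_version))  # ['5', '5.0', '5.0.0']

    # Keeping versions as strings (quoted in YAML, always in JSON) preserves them exactly.
    print(json.loads('{"version": "5.0"}')["version"])  # 5.0
```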

YAML is just a broken format (yet too big to fail, so people use it instead of anything else). A subset of YAML is not YAML anymore, so the argument that we get anything for free is void, unless the tools standardize on the same subset (which is not the case AFAIK).

Recall how hpack implements its own splicing instead of using YAML's anchors! I still wonder why. Bad support for anchors across libraries? Hopefully not.

TikhonJelvis commented 3 years ago

I generally lean towards TOML over YAML myself. I used YAML in my example because it already has tooling for auto-complete/linting/etc through LSP—I didn't have to write any code, just install and configure a stock plugin.

Looks like TOML has similar tooling support through Taplo but it isn't as mature as the YAML tooling and doesn't have Emacs integration. Emacs integration for Taplo wouldn't be difficult—a few days of work?—and would give us the same kind of JSON-Schema-based validation and auto-complete.

I expect this pattern to repeat for other tools: TOML is a nicer format than YAML but has less existing support, so it would take a bit of extra work at times. This still nets to far less work than a totally custom format nobody else uses though!

The broader point is that this same pattern repeats for everything, not just editor autocomplete. A config format is inherently a public interface, and interfaces benefit from standardization far more than other parts of a system.

TikhonJelvis commented 3 years ago

Turns out if I write the schema a bit differently, I can get documentation for each option in auto-complete:

[Screenshot: an auto-complete menu for "build-type" with four options ("Configure", "Custom", "Make" and "Simple"), with the cursor over "Make". Beside the cursor, documentation for "Make" reads "Calls Distribution.Make.defaultMain".]

No Elisp hacking needed :).

I haven't used JSON Schema before and, honestly, it's pretty nice. It's flexible enough to capture a lot of the structure we care about even without using its extensibility features, but structured enough that general-purpose tools and UIs can do a surprisingly good job with it.
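One way to get per-value documentation, and the one assumed here (the comment does not say which schema construct was used), is to replace a bare `enum` with `oneOf` over `const` values, each carrying its own `description`. Only the "Make" text comes from the screenshot; the other three descriptions are brief summaries added for illustration.

```python
import json

# Per-value documentation via oneOf + const + description.
BUILD_TYPES = {
    "Simple": "Uses Distribution.Simple.defaultMain",
    "Configure": "Like Simple, but also runs a ./configure script",
    "Make": "Calls Distribution.Make.defaultMain",
    "Custom": "Uses the package's own Setup.hs",
}

BUILD_TYPE_SCHEMA = {
    "description": "How the package is built",
    "oneOf": [{"const": name, "description": doc} for name, doc in BUILD_TYPES.items()],
}


def describe(value: str):
    """Return the description an editor could show for a build-type value, or None."""
    for option in BUILD_TYPE_SCHEMA["oneOf"]:
        if option["const"] == value:
            return option["description"]
    return None


if __name__ == "__main__":
    print(json.dumps(BUILD_TYPE_SCHEMA, indent=2))
    print(describe("Make"))  # Calls Distribution.Make.defaultMain
```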

fgaz commented 3 years ago

That's nice, but as soon as you step outside the few top-level enum/string-typed fields you still get these problems https://github.com/haskell/cabal/issues/7548#issuecomment-899112181 https://github.com/haskell/cabal/issues/7548#issuecomment-899557379

TikhonJelvis commented 3 years ago

That's nice, but as soon as you step outside the few top-level enum/string-typed fields you still get these problems...

This already covers everything a simple-to-intermediate Haskell project needs. Even if we don't get anything beyond basic syntactic validation and completions for the "easy" fields, we will have improved the experience for most Haskell projects and, especially, beginner Haskell projects.

It's not just top-level fields, either. A JSON Schema can cover the structure of libraries/executables/test suites/etc. It's pretty handy for my editor to tell me what I need to define a test suite! Today, I go find an old .cabal file to copy from :/.

[Screenshot: Emacs with auto-completion and a doc display for a Cabal test-suite's "type" property.]

This is some pretty nice cross-editor functionality that came from like half a day of fiddling with JSON Schema. Building on top of modern, well-integrated tools is a massive force multiplier even when the tools themselves aren't ideal.
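As a sketch of how test-suite structure could be captured, JSON Schema's `if`/`then` keywords (draft-07 and later) can express Cabal's rule that an `exitcode-stdio-1.0` suite needs `main-is` while a `detailed-0.9` suite needs `test-module`. The schema layout is an assumption, and `missing_fields` only imitates these specific rules; it is not a general validator.

```python
# Hypothetical schema fragment for a test-suite object.
TEST_SUITE_SCHEMA = {
    "type": "object",
    "required": ["type"],
    "properties": {
        "type": {"enum": ["exitcode-stdio-1.0", "detailed-0.9"]},
        "main-is": {"type": "string"},
        "test-module": {"type": "string"},
    },
    "allOf": [
        {
            "if": {"required": ["type"], "properties": {"type": {"const": "exitcode-stdio-1.0"}}},
            "then": {"required": ["main-is"]},
        },
        {
            "if": {"required": ["type"], "properties": {"type": {"const": "detailed-0.9"}}},
            "then": {"required": ["test-module"]},
        },
    ],
}


def missing_fields(suite: dict) -> list[str]:
    """Imitate the required/if/then checks above (not a general JSON Schema validator)."""
    missing = [f for f in TEST_SUITE_SCHEMA["required"] if f not in suite]
    for rule in TEST_SUITE_SCHEMA["allOf"]:
        expected = rule["if"]["properties"]["type"]["const"]
        if suite.get("type") == expected:
            missing += [f for f in rule["then"]["required"] if f not in suite]
    return missing


if __name__ == "__main__":
    print(missing_fields({"type": "exitcode-stdio-1.0"}))  # ['main-is']
```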

If Cabal files were merely data they would be fine, but unfortunately they are code, due to the conditionals, and parameters used in those conditions.

The proposal here is to replace Cabal's surface syntax with YAML or TOML. The constructs and semantics of "Cabal as a programming language" would remain completely unchanged, only the way we write and parse them changes. Instead of using an ABNF grammar to specify the syntax of if statements, we would specify them using YAML/TOML building blocks and a JSON Schema.
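To illustrate what expressing `if` statements with YAML/TOML building blocks might look like, here is a minimal sketch in which a conditional becomes a plain object with `if`, `then` and `else` keys, while the condition itself stays a string. The encoding is hypothetical, and the evaluator understands only single `os(...)` and `flag(...)` atoms, far less than Cabal's condition language; the flattening step is roughly analogous to resolving conditionals for one configuration.

```python
import re

# Hypothetical data encoding of:
#   build-depends: base
#   if os(windows)
#     build-depends: Win32
#   else
#     build-depends: unix
LIBRARY = {
    "build-depends": ["base"],
    "conditionals": [
        {
            "if": "os(windows)",
            "then": {"build-depends": ["Win32"]},
            "else": {"build-depends": ["unix"]},
        }
    ],
}

ATOM = re.compile(r"^(os|flag)\(([\w-]+)\)$")


def holds(condition: str, os_name: str, flags: set[str]) -> bool:
    """Evaluate a single os(...) or flag(...) atom; anything else is rejected."""
    match = ATOM.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    kind, arg = match.groups()
    return arg == os_name if kind == "os" else arg in flags


def resolve_deps(component: dict, os_name: str, flags: set[str]) -> list[str]:
    """Flatten build-depends for one configuration, recursing into nested branches."""
    deps = list(component.get("build-depends", []))
    for cond in component.get("conditionals", []):
        branch = cond["then"] if holds(cond["if"], os_name, flags) else cond.get("else", {})
        deps += resolve_deps(branch, os_name, flags)
    return deps


if __name__ == "__main__":
    print(resolve_deps(LIBRARY, "linux", set()))    # ['base', 'unix']
    print(resolve_deps(LIBRARY, "windows", set()))  # ['base', 'Win32']
```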

Unlike trying to move to Dhall or something, the behavior of package files doesn't change at all. The only thing we're doing is moving as much as we can from a historical syntax that nobody outside the Haskell world can work with to a pretty similar-looking syntax that can easily integrate with the modern programming ecosystem.

JSON / YAML / ... and even Dhall would still need some stringly sublanguages

This is absolutely true... and it's absolutely fine. In practice, Cabal already acts like a bunch of stringly sublanguages connected by a big ad-hoc grammar; we'd be going from that to the same set of sublanguages connected by a format that's far easier to work with. It's an incremental improvement, which is great, since we shouldn't be trying to solve all our possible problems at once!
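To make "stringly sublanguages" concrete: even in a JSON or TOML manifest, a dependency such as `"base >=4.12 && <5"` would remain a string with its own small grammar. The sketch below parses only the simplest form (a package name, a space, then `&&`-joined comparisons) and ignores `||`, `^>=`, wildcards and parentheses, all of which Cabal's real version-range syntax supports.

```python
import re

CONSTRAINT = re.compile(r"^(>=|<=|==|>|<)\s*(\d+(?:\.\d+)*)$")


def parse_dependency(text: str) -> tuple[str, list[tuple[str, tuple[int, ...]]]]:
    """Split 'name op ver && op ver ...' into the name and (operator, version) pairs.

    Assumes a space separates the package name from its constraints.
    """
    name, _, rest = text.strip().partition(" ")
    constraints = []
    if rest.strip():
        for clause in rest.split("&&"):
            match = CONSTRAINT.match(clause.strip())
            if match is None:
                raise ValueError(f"unsupported constraint: {clause.strip()!r}")
            op, version = match.groups()
            constraints.append((op, tuple(int(p) for p in version.split("."))))
    return name, constraints


if __name__ == "__main__":
    print(parse_dependency("base >=4.12 && <5"))
    # ('base', [('>=', (4, 12)), ('<', (5,))])
```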

TikhonJelvis commented 3 years ago

If anyone is interested, here's the simple JSON Schema I've been playing with. It's nowhere near complete or tested—I haven't gotten to conditionals/etc—but it was enough for me to get a feel for both JSON Schema and the tooling around it.

My conclusion so far is that JSON Schema is flexible enough that we can do a pretty good job of specifying package descriptions even if we stick to the "core" vocabulary JSON Schema defines. Once we have that, tools for validation and editing seem to work remarkably well without additional configuration.

AshleyYakeley commented 3 years ago

Simplest solution: keep cabal files for Hackage, add hpack support to cabal-install.

jkachmar commented 3 years ago

Strong +1 for YAML or TOML for many of the reasons listed above.

Strong -1 to Dhall, Nickel, etc. for many of the reasons listed as well.

My hope is that if Cabal were to natively adopt YAML/TOML with support for some commonly used subset of the existing .cabal syntax, it would be possible to add support for parsing a .cabal file and generating an equivalent YAML/TOML file.

Presuming that this is possible, it would make it significantly easier for people to write tools that can parse & manipulate our package manifests in the context of systems for which Haskell is not a language with first-class support.

E.g. license-scanning tools, package vulnerability scanners, etc.
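As a rough sketch of the conversion and tooling described above, the function below turns only the unindented `field: value` lines of a .cabal file into a JSON object that, say, a license scanner could read. It is deliberately naive and not a real converter: it skips sections (`library`, `executable`, ...), conditionals, multi-line fields and comments, all of which need Cabal's own parser to handle correctly.

```python
import json

CABAL_TEXT = """\
cabal-version: 2.4
name:          example
version:       0.1.0.0
license:       BSD-3-Clause
-- a comment

library
  exposed-modules: Example
  build-depends:   base >=4.12 && <5
"""


def top_level_fields(cabal_source: str) -> dict[str, str]:
    """Naively collect unindented 'field: value' lines; sections and continuations are skipped."""
    fields = {}
    for line in cabal_source.splitlines():
        if not line.strip() or line.lstrip().startswith("--") or line[0].isspace():
            continue  # blank, comment, or indented (section body / continuation line)
        key, sep, value = line.partition(":")
        if sep and " " not in key.strip():  # section headers like "library" have no colon
            fields[key.strip().lower()] = value.strip()  # Cabal field names are case-insensitive
    return fields


if __name__ == "__main__":
    print(json.dumps(top_level_fields(CABAL_TEXT), indent=2))
```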
