Move OPAM's data formats to sexps

dbuenzli commented 8 years ago

Given the unwise decision by the original OPAM developers to develop a custom data format for opam's files and the inability of current OPAM maintainers to provide a reasonable library (#2677) to read these formats, I suggest we simply move to a toplevel sequence of s-expressions for OPAM's metadata file formats.

S-expressions have widespread editor support, are easy to parse, easy to write, easy to modify and easy to read, especially because of the equivalence between quoted and unquoted tokens.

Here's a sample OPAM file and its translation to sexp:

opam-version: "1.2"
maintainer: "Daniel Bünzli <daniel.buenzl i@erratique.ch>"
authors: ["Daniel Bünzli <daniel.buenzl i@erratique.ch>"]
homepage: "http://erratique.ch/software/uucp"
doc: "http://erratique.ch/software/uucp/doc/Uucp"
dev-repo: "http://erratique.ch/repos/uucp.git"
bug-reports: "https://github.com/dbuenzli/uucp/issues"
tags: [ "unicode" "text" "character" "org:erratique" ]
license: "ISC"
depends: [
 "ocamlfind" {build}
 "ocamlbuild" {build}
 "topkg" {build}
 "uucd" {test} # dev really
 "uunf" {test}
 "uutf" {test}
 ]
available: [ ocaml-version >= "4.01.0" ]
build: [[
  "ocaml" "pkg/pkg.ml" "build" "--pinned" "%{pinned}%" ]]

(opam-version 1.2)
(maintainer "Daniel Bünzli <daniel.buenzl i@erratique.ch>")
(authors ("Daniel Bünzli <daniel.buenzl i@erratique.ch>"))
(homepage http://erratique.ch/software/uucp)
(doc http://erratique.ch/software/uucp/doc/Uucp)
(dev-repo http://erratique.ch/repos/uucp.git)
(bug-reports https://github.com/dbuenzli/uucp/issues)
(tags (unicode text character org:erratique))
(license ISC)
(depends
  ((ocamlfind build)
   (ocamlbuild build)
   (topkg build)
   (uucd test) ; dev really
   (uunf test)
   (uutf test)))
(available (ocaml-version >= 4.01.0))
(build
 ((ocaml pkg/pkg.ml build --pinned %{pinned}%)))
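To give an idea of how little code a reader for such files needs, here is a minimal sketch, written in Python only for brevity (nothing in the proposal prescribes a language). It assumes `;` starts a line comment, that strings are double-quoted with backslash escapes kept verbatim, and that quoted and unquoted atoms are treated as the same string. The name `parse_sexps` is hypothetical.

```python
def parse_sexps(text):
    """Parse a toplevel sequence of s-expressions into nested Python lists.

    Atoms become str; quoted and unquoted atoms are not distinguished.
    Assumptions: ';' starts a line comment, strings use backslash escapes.
    """
    pos, n = 0, len(text)
    stack = [[]]  # stack[0] collects the toplevel expressions
    while pos < n:
        c = text[pos]
        if c.isspace():
            pos += 1
        elif c == ";":  # comment: skip to end of line
            while pos < n and text[pos] != "\n":
                pos += 1
        elif c == "(":
            stack.append([])
            pos += 1
        elif c == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' at offset %d" % pos)
            done = stack.pop()
            stack[-1].append(done)
            pos += 1
        elif c == '"':  # quoted atom
            pos += 1
            buf = []
            while pos < n and text[pos] != '"':
                if text[pos] == "\\" and pos + 1 < n:
                    pos += 1  # keep the escaped character verbatim
                buf.append(text[pos])
                pos += 1
            if pos >= n:
                raise ValueError("unterminated string")
            pos += 1  # skip the closing quote
            stack[-1].append("".join(buf))
        else:  # unquoted atom: runs until whitespace, paren, quote or ';'
            start = pos
            while pos < n and not text[pos].isspace() and text[pos] not in '();"':
                pos += 1
            stack[-1].append(text[start:pos])
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]


sample = '''
(opam-version 1.2)
(tags (unicode text character org:erratique))
(depends
  ((ocamlfind build)
   (uucd test) ; dev really
   (uutf test)))
(license "ISC")
'''
fields = parse_sexps(sample)
```

Each toplevel field comes back as a list whose first element is the field name, so `(license ISC)` and `(license "ISC")` read as the same value.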

AltGr commented 8 years ago

Err, I'm confused, the issue with #2677 seemed to be with getting the high-level structures (filters, formulas, etc.), the parser and printer to and from the internal AST are already contained modules...

Replacing them with sexps would certainly be a lot of work that I don't see the need for, not counting the opam files already present throughout. Also, your reluctance to use a library as soon as it has any dependency at all seems to me to be more a matter of taste than a real issue.

So, I'd take this as a troll, but you generally have sensible points, so I would like to understand what really is the issue and why.

dbuenzli commented 8 years ago

Also, your reluctance to use a library as soon as it has any dependency at all seems to me to be more a matter of taste than a real issue.

That's not the point. When you bootstrap you need to cut the dependencies at the lowest point so that the other things can be built without introducing cycles.

Take for example the topkg and topkg-care split: it was carefully (and painfully) done so that topkg would not depend on anything. This cuts the deps for users of the system, but it also lets the dependencies of topkg-care (rresult, fmt, logs, bos, cmdliner, webbrowser) depend only on topkg itself. As a result both topkg and the topkg-care release and bureaucracy tools can be used on them, even though they are actually dependencies of topkg-care (see here; in fact topkg-care can even be used on topkg and topkg-care themselves).

Since I want to be able to use the OPAM file format as the metadata file format for the eco-system I need to be able to bootstrap codecs simply from an OCaml install. There are a lot of situations where you might want to access the data of OPAM files without assuming an opam install (self-contained bundle, conversion to system packages, opkg, etc.). Now the more dependencies the codecs have the harder the bootstrap becomes, and the less you can use the tools you develop for the dependencies that are required at the bootstrap step (see the topkg example above).

Note also that in these use cases I may not need a full-blown parser for the high-level features of the data format. E.g. in opkg I'm not interested at all in build:, nor in dependency constraints, only maybe in build, dev and test dep specs, and even that is not fundamental, only informational for the end user. In opam-installer the task is even simpler. In the self-contained bundle case only build: (and filters) would matter, not dependency constraints.

So I actually think that switching to an easy, non-special, sexp file format with well documented key value types would be a great way of cutting the cycle.

This would free you of having to provide codecs, allow you to use regexps to split lines without me screaming at you, and more importantly would allow end-users of the system to deal with a simple, widely known (and widely used in OCaml) data format. The OPAM data format is just obnoxious (this was communicated to me more than once by newcomers) and one more special thing to be learned when dealing with the eco-system that we could avoid.

not counting the opam files already present throughout.

That's not really an argument: opam file formats are already not compatible from version to version and rewrites do have to occur. A migration command wouldn't be hard to provide.

AltGr commented 8 years ago

I see, thanks for explaining the motivations. Well, if it is only the parser that you need, that would be straightforward to extract. It returns this type, basically a list of field-structured value bindings.

With a little more work, the base of the OpamFormat module, which provides generic lenses to convert to/from the above from/to the internal record types, could be added too, with the specific lenses for the high-level types (formulas, commands, urls, etc.) pulled out. Then the specific uses defining all the concrete file formats, in the overweight OpamFile module, would be out of your scope.

Basically, you would get the implementation of the General syntax section of the manual, but not the specific file formats, which are implemented in OpamFile.

This said, I have no particular love for the current opam file format -- it does the job, however, and the costs of changing it seem to outweigh the benefits. If someone is interested in writing alternate backends providing other concrete syntaxes corresponding to the opamfile type, that could be very interesting.

@hannesm probably has similar interests in allowing Conex to interact with opam-syntax files, but without depending on opam ?

dbuenzli commented 8 years ago

I see, thanks for explaining the motivations. Well, if it is only the parser that you need, that would be straightforward to extract. It returns this type, basically a list of field-structured value bindings.

Most of the time I need to be able to access them as sets of key value bindings. E.g. here's what I have in Topkg_care.Opam.

https://github.com/dbuenzli/topkg/blob/61a2e495d31a291b35b3538be3f454c9615d6b90/src-care/topkg_care_opam.mli#L20-L24 https://github.com/dbuenzli/topkg/blob/61a2e495d31a291b35b3538be3f454c9615d6b90/src-care/topkg_care_opam.ml#L38-L108

And in opkg:

https://github.com/dbuenzli/opkg/blob/a3798fee9c142c3f438c4e23e6418cc5f4c4bac6/src/opkg_opam.mli https://github.com/dbuenzli/opkg/blob/a3798fee9c142c3f438c4e23e6418cc5f4c4bac6/src/opkg_opam.ml

So simply maps from string to list of strings. Note however that I sometimes do need to parse these lists of strings, e.g. to extract the deps and depopts, so having a bit of the typing layer may still be useful (or a good specification of the fields, e.g. there are quite a few fields I still don't know if they allow for multiple values or not, e.g. homepage:).
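The "maps from string to list of strings" view can be sketched as follows, in Python for brevity. The input is assumed to be already-parsed toplevel fields as nested lists, such as a generic sexp reader might produce. The names `flatten` and `fields_to_map` are hypothetical, and flattening deliberately throws away nesting, so it only suits informational uses.

```python
def flatten(value):
    """Yield every atom of a possibly nested list, left to right."""
    if isinstance(value, str):
        yield value
    else:
        for item in value:
            yield from flatten(item)


def fields_to_map(fields):
    """Turn [[key, v1, v2, ...], ...] into {key: [atom, ...]}.

    A field that appears twice has its values concatenated.
    """
    result = {}
    for field in fields:
        key, values = field[0], field[1:]
        result.setdefault(key, []).extend(flatten(values))
    return result


parsed = [
    ["homepage", "http://erratique.ch/software/uucp"],
    ["tags", ["unicode", "text"]],
    ["depends", [["ocamlfind", "build"], ["uutf", "test"]]],
]
info = fields_to_map(parsed)
```

Note that `info["depends"]` mixes package names with their `build` and `test` markers, which is exactly where a bit of typing or a precise field specification becomes necessary.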

In the future I'll need to have the ability to update hand-written opam files programmatically to sync them with information from the build system.

it does the job, however, and the costs of changing it seem to outweigh the benefits.

I wouldn't be sure about that in the long term, both for the OCaml eco-system and its usability and for OPAM as a more general package manager (these users might prefer a standard file format that their existing tools can handle, rather than having to start writing their own OPAM file parser).

It is certainly a lot of work, but I'd say mostly on the opam devs, as the change could be absorbed by the opam tool supporting both syntaxes for a while, with gradual migration of existing files in repos triggered by submissions to the OCaml OPAM repository.

hannesm commented 8 years ago

at the moment, I use a custom format in conex, but I would like to use a) minimal dependencies and b) the same as opam does. I like sexps a lot, and think they're very well suited for being human readable and both human and machine editable. I honestly never understood why opam has yet another adhoc file format.

samoht commented 8 years ago

I honestly never understood why opam has yet another adhoc file format.

I was young and naive ...

hannesm commented 8 years ago

@samoht you're still young :)

rgrinberg commented 8 years ago

Another vote for sexps (or even JSON for that matter) here.

AltGr commented 8 years ago

@dbuenzli, Ok, I get a better understanding of your need; for some reason I was under the impression that the syntactic layer without the semantic layer wouldn't be any use. Looking at the new spec, most fields are indeed just lists of items, while others are trees with a very limited depth. Then, of course, we have small trees for the included filters and formulas ; the modules handling those are mostly self-contained though.

(or a good specification of the fields, e.g. there are quite a few fields I still don't know if they allow for multiple values or not, e.g. homepage:).

It may still need some more proof-reading, but the new doc should have a comprehensive spec of the file formats. This clearly states it allows multiple values: http://opam.ocaml.org/doc/2.0/Manual.html#opamfield-homepage

It is certainly a lot of work, but I'd say mostly on the opam devs, as the change could be absorbed by the opam tool supporting both syntaxes for a while, with gradual migration of existing files in repos triggered by submissions to the OCaml OPAM repository.

Yes, that was my suggestion, opam could be made to transparently load from either format; some changes will be needed to the AST, as distinct cases in the syntax would become ambiguous, but it may not be that much work.

Note about the example above: getting rid of all the quotes is nice, but there is currently a semantic difference between string "foo" and ident foo, which is mostly equivalent to "%{foo}%"¹. In commands, and in the new extended depends (depends: "foo" {= version}), this matters; do you intend to rather make all variable references explicit (e.g. (%{make}% install) instead of [make "install"]) ?

¹ they differ when foo is undefined, in which case "%{foo}%" is defined and is the empty string.
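As an illustration of what "making all variable references explicit" could mean for a single command, here is a small sketch in Python (chosen for brevity). It assumes a command is a flat list of double-quoted strings and bare identifiers, ignores the undefined-variable subtlety noted above, and does not re-quote strings that contain whitespace. The name `explicit_args` is hypothetical.

```python
import re

# Either a double-quoted string (group 1) or a bare identifier (group 2).
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|([^\s"\[\]]+)')


def explicit_args(command):
    """Rewrite the arguments of one opam command, e.g. '[make "install"]'.

    Strings lose their quotes; identifiers become %{ident}% references.
    """
    out = []
    for match in TOKEN.finditer(command):
        string, ident = match.group(1), match.group(2)
        if ident is not None:
            out.append("%{" + ident + "}%")
        else:
            out.append(string)
    return "(" + " ".join(out) + ")"


converted = explicit_args('[make "install"]')
```

With this convention the string/ident distinction survives the loss of quotes, at the price of noisier commands.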

avsm commented 8 years ago

I don't see repository migration as particularly problematic here, particularly if we can load both at the parser level. I'm strongly in favour of anything that will make opam files easier to manipulate programmatically, and sexp/json fits that...

AltGr commented 8 years ago

I very quickly hacked the printer to see what the results would look like. Here is an example for one of the more complex opam files:

(opam-version: "2.0")
(name: "nocrypto")
(version: "0.5.3")
(synopsis: "Small functional-style crypto library.")
(description: """
Ciphers: AES, 3DES, RC4.
Hashes: MD5, SHA1, SHA2.
Pubkey: RSA, DH, DSA.
Rng: Fortuna.""")
(maintainer: "David Kaloper <david@numm.org>")
(authors: "David Kaloper <david@numm.org>")
(license: "BSD2")
(tags: "org:mirage")
(homepage: "https://github.com/mirleft/ocaml-nocrypto")
(bug-reports: "https://github.com/mirleft/ocaml-nocrypto/issues")
(depends: 
  ("ocaml" >= "4.02.0")
  ("ocamlfind" build)
  ("oasis" build & >= "0.4.2")
  ("ocamlbuild" build)
  ("cstruct" >= "1.6.0")
  "zarith"
  "sexplib"
  ("ppx_sexp_conv" build)
  ("mirage-no-xen" | ("mirage-xen" "mirage-entropy-xen" "zarith-xen"))
  ("ounit" test)
)
(depopts: "lwt")
(conflicts: 
  ("mirage-xen" < "2.2.0")
  ("mirage-entropy-xen" < "0.3.0")
)
(flags: light-uninstall)
(build: 
  (
    "./configure"
    "--prefix"
    prefix
    "--%{lwt:enable}%-lwt"
    "--%{mirage-xen+mirage-entropy-xen:enable}%-xen"
  )
  (make)
)
(build-test: 
  ("./configure" "--%{ounit:enable}%-tests")
  (make "test")
)
(install: make "install")
(remove: "ocamlfind" "remove" "nocrypto")
(dev-repo: "git+https://github.com/mirleft/ocaml-nocrypto.git")
(url 
  (src: "https://github.com/mirleft/ocaml-nocrypto/archive/v0.5.3.tar.gz")
  (checksum: "md5=1b771555139c23da4fdf02244fc7b4a9")
)

Some remarks:

- """-enclosed strings won't work on a standard sexp parser; requiring to escape the package descriptions wouldn't be nice though...
- opam is flexible w.r.t. missing parens, and this quick hack retains it. Using knowledge of the expected type, (install: make "install") is understood as (install: (make "install")) rather than (install: (make) ("install"))
- the "options", i.e. the optional-postfix-braces syntax (foo { bar baz }) is encoded as a list ((foo bar baz)). However, this can be ambiguous where lists are already allowed, e.g. in package formulas, where (foo bar) can refer to foo with option bar or to foo and bar. A solution could be to always force parens around options (we would get (foo bar) or ((foo) (bar))).
- I left the colons after field names, since that is how they are referred to everywhere, and it may also improve readability
- we could of course opt for a more lisp-like encoding of formulas and filters to make parsing them trivial (rather than encode them as lists and need parsing on top of it). Something like

(depends:
  (&
    ("ocaml" (>= "4.02.0"))
    ("ocamlfind" build)
    ("oasis" (& build (>= "0.4.2")))
    ("ocamlbuild" build)
    ("cstruct" (>= "1.6.0"))
    ("zarith")
    ("sexplib")
    ("ppx_sexp_conv" build)
    (| ("mirage-no-xen") (& ("mirage-xen") ("mirage-entropy-xen") ("zarith-xen")))
    ("ounit" test))
)

I don't feel like writing it that way though.
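To show why the prefix form makes consumers trivial, here is a sketch in Python (for brevity) that evaluates such a formula against a set of installed package names. The formula is assumed to be already parsed into nested lists; version constraints and the build/test markers are ignored, and `satisfied` is a hypothetical name.

```python
def satisfied(formula, installed):
    """Evaluate a prefix-encoded dependency formula against package names.

    formula is either ["&", f1, f2, ...], ["|", f1, f2, ...] or a leaf
    [name, constraint...]; constraints are ignored in this sketch.
    """
    head, rest = formula[0], formula[1:]
    if head == "&":
        return all(satisfied(f, installed) for f in rest)
    if head == "|":
        return any(satisfied(f, installed) for f in rest)
    return head in installed  # leaf: only the package name is checked


depends = ["&",
           ["ocaml", [">=", "4.02.0"]],
           ["zarith"],
           ["|", ["mirage-no-xen"],
                 ["&", ["mirage-xen"], ["mirage-entropy-xen"], ["zarith-xen"]]]]

ok = satisfied(depends, {"ocaml", "zarith", "mirage-no-xen"})
missing = satisfied(depends, {"ocaml", "zarith", "mirage-xen"})
```

The walker needs no knowledge of operator precedence, which is the parsing step the infix list encoding would still require.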

dbuenzli commented 8 years ago

do you intend to rather make all variable references explicit (e.g. (%{make}% install) instead of [make "install"]) ?

I think it's actually better: it makes it easier to see what is going on; the distinction between "make" and make can be easy to miss in practice.

"""-enclosed strings won't work on a standard sexp parser; requiring to escape the package descriptions wouldn't be nice though...

I don't know what these strings are but you can write:

(description "\
bladf bla dflkj ad 
asdflj sadflkj sdflj  
asdflkj sadfj")

which is good enough and nothing needs to be escaped. In any case, for example in topkg, you don't write these things yourself in the OPAM file, description gets automatically extracted from somewhere else (the README in topkg) and added to the OPAM file during the release process.

opam is flexible w.r.t missing parens, and this quick hack retains it. Using knowledge of the expected type, (install: make "install") is understood as (install: (make "install")) rather than (install: (make) ("install"))

You meant rather than ("install:" "make" "install"). This should be avoided; it will complicate interpretation by third-party tools.

the "options", i.e. the optional-postfix-braces syntax (foo { bar baz }) is encoded as a list ((foo bar baz)).

Why not simply (foo (bar baz)) ?

we could of course opt to a more lisp-like encoding of formulas and filters to make parsing them trivial

Maybe not, but I still think that you should put some of the optional stuff in their own lists.

AltGr commented 8 years ago

The point of enclosing strings in """ is that they can contain unescaped " (unless of course it's three of them, consecutively...)

the "options", i.e. the optional-postfix-braces syntax (foo { bar baz }) is encoded as a list ((foo bar baz)).

Why not simply (foo (bar baz)) ?

Why not indeed, but the issue remains, in (foo (build & doc)), you don't know if build, doc are packages or constraints over the dependency foo.

dbuenzli commented 8 years ago

The point of enclosing strings in """ is that they can contain unescaped " (unless of course it's three of them, consecutively...)

Ok. As I already said I'm not sure this is a real concern, we're not writing novels in there and I expect the description field to be handled by a machine.

Why not indeed, but the issue remains, in (foo (build & doc)), you don't know if build, doc are packages or constraints over the dependency foo.

I'm not sure I fully understand what the problem is. Which production of the grammar are you trying to translate ?

dbuenzli commented 8 years ago

Btw. @AltGr I think that if people want to do this we should really do it for 2.0. Since I know this is unexpected overwork for you I'll gladly help you on this. Just tell me where I can help.

The reason why it should happen quickly is that topkg package tarballs now ship and install opam files. Since those are not sexps they will have to use the opam libs for a while, but I want to be able to cut that dependency at some point so that once odig gets support for compilation flags lookup, it can be used by system package managers to compile ocaml packages.

lefessan commented 8 years ago

I agree with @dbuenzli that the parser of opam files should be made as independent of OPAM as possible, so it would be possible to include it easily in other projects to parse opam files without depending on the whole opam-lib. It should also be made easy to extend OPAM with new formats of packages.

That said, I don't like the idea of using sexps for the format of opam files, it is much more verbose and less readable than the current format. I think that, if we were to make a big change to the opam file format, we should really think about it a lot, and not rush it before a major release.

The release of OPAM 2.0 contains many new features that people are eager to use, so we should not delay the release process for any reason except critical bugs.

Drup commented 8 years ago

I'm going to follow Wadler's law diligently: What would be the new syntax for comments ? (that makes raw json not usable, btw, you need an extension for comments, so you might as well use something like toml ... which is already pretty similar to opam files).

Just to be sure: the old syntax is not removed, just deprecated ? Regardless of the amount of automation, converting all the opam files in everyone's repository would be a tad painful.

avsm commented 8 years ago

Just to be sure: the old syntax is not removed, just deprecated

I think the current proposal isn't to deprecate the existing one either -- just to provide an alternative, more easily machine-manipulatable syntax in OPAM 2, and then to take a decision on deprecation in a future release.

dbuenzli commented 8 years ago

it is much more verbose

Note that it is certainly not more verbose: every quote except those around the values of description: and synopsis: in @AltGr's example can actually be removed, since these strings do not contain spaces.

Also people always have a lot of things to say about the mythical "beginner" (I would rather say "newcomer"). This is precisely a point where the eco-system can be made more friendly by not using something special (given the current fashion people would even argue for JSON I guess, but JSON doesn't have comments, it's also better for machines and rather painful to edit by humans in my opinion).

What would be the new syntax for comments ?

It should be ";", since this is what sexp libraries are likely to implement.

lefessan commented 8 years ago

the eco-system can be made more friendly by not using something special

Python was special when it appeared, still many newcomers adopted it. Idem for Ruby, Javascript, JSON, etc. I don't think this is a good argument for getting rid of the current opam file syntax. Instead, we should try to improve it, maybe try to make it more uniform by removing inconsistencies, and not adopt a new syntax that will be as "special" as the current one for people not knowing sexps, and their quoting rules.

Originally, the syntax should have been "OCaml-like", i.e. using the OCaml lexer to lex it, and a custom parser for "simple OCaml expressions". I don't really understand why we have moved away from that idea (like the addition of line comments, or identifiers with '-' in the middle), but if we want to simplify the syntax, then I would vote for moving back closer to a "simple OCaml syntax".

dbuenzli commented 8 years ago

Python was special when it appeared, still many newcomers adopted it. Idem for Ruby, Javascript, JSON,

You are comparing apples to oranges here. We are talking about a data format, not a programming language.

I don't think this is a good argument for getting rid of the current opam file syntax.

Well how many languages do you want to learn in order to use a new one ?

new syntax that will be as "special" as the current one for people not knowing sexps, and their quoting rules.

But that one can be explained in two sentences, is knowledge you can reuse in many other contexts and is used by many OCaml projects as a serialization format.

Originally, the syntax should have been "OCaml-like", i.e. using the OCaml lexer to lex it, and a custom parser for "simple OCaml expressions".

This seems to go against the idea of making OPAM a reusable, general purpose, package system.

dbuenzli commented 8 years ago

In any case I'd be happy with either a fully dependency-less (i.e. that depends only on ocaml) opam file format reader or a switch to sexps.

avsm commented 8 years ago

Originally, the syntax should have been "OCaml-like", i.e. using the OCaml lexer to lex it, and a custom parser for "simple OCaml expressions". I don't really understand why we have moved away from that idea (like the addition of line comments, or identifiers with '-' in the middle), but if we want to simplify the syntax, then I would vote for moving back closer to a "simple OCaml syntax".

Given the move in OPAM2 towards OCaml-independence, using the OCaml syntax seems like a poor choice for a machine manipulatable format. I'm in favour of an alternative sexp-syntax for easy machine manipulation, or if this is too much for OPAM2, to build an alternative, dependency-free OPAM file parser as a separate implementation.

One argument for JSON is that OPAM already outputs JSON from several commands, so we could (with an appropriate syntax hack for comments) use that for homogeneity. This format isn't intended to be human-editable...

dbuenzli commented 8 years ago

(with an appropriate syntax hack for comments)

FWIW the Jsonm.Uncut codec will parse JavaScript comments, but I'd really advise against using it: it defeats the purpose of using a standard data format since most libraries won't be able to read the files with comments.

Besides, as you mention, JSON is neither nice to read nor to edit by humans. Given that OPAM files are still read and edited a lot by humans I think that sexps give the best tradeoff for supporting both humans and machines.

hannesm commented 8 years ago

I'd be in favour of having opam-2 support both the old data format and the sexp format. As @dbuenzli mentioned, lots of values in opam may not contain whitespace, and can therefore be symbols that need no string escaping (") mechanisms. I'd be happy to base conex on the normalised s-expression printed (no newlines, no comments) variant of opam files.

Encoding the dependencies in a lisp-style (bottom part of https://github.com/ocaml/opam/issues/2682#issuecomment-250110934) looks very reasonable to me, it can be even more concise, instead of ("cstruct" (>= "1.6.0")) use (cstruct (>= 1.6.0)) (there's no need for supporting white spaces in version numbers).

I also do prefer %make% and %prefix% over the magic in make vs "make".
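A normalised, single-line printed form of the kind mentioned above could look like the following sketch, in Python for brevity. The exact normal form conex would rely on is not specified in this thread, so the rules here are assumptions: atoms are printed bare unless they are empty or contain whitespace, parentheses, `;`, `"` or a backslash, and newlines inside strings are escaped as `\n`. The names `atom` and `normalise` are hypothetical.

```python
BARE_FORBIDDEN = set(' \t\r\n();"\\')


def atom(text):
    """Print an atom bare when possible, otherwise as an escaped string."""
    if text and not (set(text) & BARE_FORBIDDEN):
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def normalise(sexp):
    """Render nested lists on one line, with single spaces and no comments."""
    if isinstance(sexp, str):
        return atom(sexp)
    return "(" + " ".join(normalise(item) for item in sexp) + ")"


field = ["depends", [["cstruct", [">=", "1.6.0"]], ["zarith"]]]
line = normalise(field)
```

Because the output depends only on the parsed structure, two files that differ in layout or comments normalise to the same bytes, which is the property a signing tool needs.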

AltGr commented 8 years ago

- About quoted vs unquoted names/versions/commands: this is unrelated to switching to sexp or not (although, of course, changing the format is a chance to simplify how it's handled). Currently, foo vs "foo" have a different meaning in several scopes, and just removing the quotes would be ambiguous. We may of course choose to replace "foo" with foo and foo with %{foo}%, or even a new syntax (%foo%, $foo...) -- but that would deserve to be discussed as its own topic.
  - this is especially true in filters (ocaml:version >= "4.02.3" & os = "linux")
  - shouldn't cause a problem, but boolean and int constants become ambiguous as well
  - package names, on the other hand, could be safely unquoted since they don't appear in ambiguous scopes (that applies to the current syntax as well, so I could already make the change to tolerate unquoted names now)
- stand-alone parser: I was under the misunderstanding that the opam higher-level types were required; the parsing/printing to/from the AST type can be provided stand-alone with a minimal amount of work (removing a couple calls to OpamStd list and strings functions; and packaging). I already split out the printer from the higher-level OpamFormat module (#2688).
- of course, we would first provide an alternate parser that can be used transparently (it's easy to find out whether a file is in sexp or opam format, just looking at the first non-blank character); the alternate printer could be otherwise toggled using a variable or option. In this state, and if the changes aren't too heavy, it would probably be OK for 2.0. Then it becomes a matter of switching the default output format.
- there is currently some magic to simplify a few constructs that is built above the actual printer. This includes:
  - removal of extra brackets when the expected type makes it unambiguous (replaces build: [[make]] with build: make). This would be a little more verbose (e.g. authors: would always need brackets), but the current magic is ambiguous when there are nested lists, as I explained above ([foo bar] understood as [[foo bar]] rather than [[foo] [bar]] when a list of lists is expected)
  - removal of unset options (replaces foo {} with foo) ; that one can and should be done at the printer level though

  we probably don't want this for sexps; we can retain the flexibility in the parser for the current format but remove the "cleanup" pass from the printer.
- as for the formulas format, I'd prefer retaining the infix notation over something too lispy
- although we don't have concrete plans now, making opam more acceptable as a package manager outside of the OCaml ecosystem sounds like a good argument to me.
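The "first non-blank character" test for choosing a parser can be sketched as below, in Python for brevity. Treating a leading `;` as a sexp comment is an assumption added here, not something stated in the thread, and `detect_format` is a hypothetical name.

```python
def detect_format(text):
    """Guess the concrete syntax from the first non-blank character."""
    stripped = text.lstrip()
    # A sexp file starts with '(' or, by assumption, with a ';' comment.
    if stripped.startswith("(") or stripped.startswith(";"):
        return "sexp"
    # Anything else (field names, '#' comments, empty input) is taken
    # to be the current opam syntax.
    return "opam"
```

For instance, `detect_format('opam-version: "2.0"')` yields "opam", while a file beginning with `(opam-version 2.0)` yields "sexp", so a loader can dispatch without any user-visible switch.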

lefessan commented 8 years ago

Again, I am strongly against delaying the release of 2.0 to change the syntax of OPAM files, especially as there is no consensus currently on a replacement for the current syntax. This is a big change, and big changes do not take place at the last minute before the release.

Given the move in OPAM2 towards OCaml-independence, using the OCaml syntax seems like a poor choice for a machine manipulatable format.

Sexps are not user-friendly, so switching from the current syntax (that has some glitches, but is much more readable) for a goal that it is not even clear we want to achieve (early experiments on using OPAM as a replacement for NPM failed last year, because of OPAM intrinsic limitations) seems quite awkward to me.

avsm commented 8 years ago

Again, I am strongly against delaying the release of 2.0 to change the syntax of OPAM files

As has been noted several times, the proposal is to support both syntaxes, with the existing syntax maintained for human-editing and the new syntax saved for machine editing. This is useful both for Platform activities (such as odig) and major new features such as signing (see @hannesm's comment above about basing the conex signing on the sexp-normal form).

especially as there is no consensus currently on a replacement for the current syntax. This is a big change, and big changes do not take place at the last minute before the release.

Achieving such consensus is the purpose of this bug. It's also not clear if this would actually delay the release. The timeline we presented at the OCaml Workshop is aiming for a January release, and this feature does not affect any user-visible workflows.

Given the move in OPAM2 towards OCaml-independence, using the OCaml syntax seems like a poor choice for a machine manipulatable format. Sexps are not user-friendly, so switching from the current syntax (that has some glitches, but is much more readable) for a goal that it is not even clear we want to achieve

Once again, the sexps are not intended for human editing but for clean machine manipulation.

(early experiments on using OPAM as a replacement for NPM failed last year, because of OPAM intrinsic limitations) seems quite awkward to me.

This seems like an irrelevant strawman. What does an experiment with an OPAM 1.2 have to do with the current discussion at hand?

hannesm commented 8 years ago

@lefessan to me it looks like you're the only one opposing sexp due to subjective "not readable" arguments. surely file formats are a question of style and thus subjective, thus we'll never reach a consensus amongst developers. I don't see any harm supporting next to the current syntax an alternative option which is machine editable.

but I also spent several years in emacs lisp and other lisp dialects... ;)

lefessan commented 8 years ago

This is useful both for Platform activities (such as odig) and major new features such as signing (see @hannesm's comment above about basing the conex signing on the sexp-normal form).

Do these projects really require that the change be available in 2.0 ? From what I understood, Hannes' work will only land in 2.1.

Once again, the sexps are not intended for human editing but for clean machine manipulation.

Now, I really don't understand: if all the opam files are still in the old format, when does a tool access the sexp version of the file ? Or the workflow is a pipe where opam translates a file in the sexp format, the tool accesses the file in sexp format, modifies it, saves it, and then opam translates the new file in the old format ? If yes, that sounds like a very complex process, instead of just linking to a parser library for saving/parsing the old format.

hannesm commented 8 years ago

Dear @lefessan, the "landing" of signing is not tied to a specific opam version - what is needed is only a validation_hook, which according to @AltGr will be there soon, at the latest in 2.0. Individual clients can enable checking of signatures and authors & janitors can sign their packages using 2.0. It might be 2.1 where the verification is then turned on by default, but clearly we want a smooth upgrade path which means that a set of volunteers will try it out before the big green "by default verify" button is hit. And yes, it is crucial to have no dependencies for signing, since they'd need to be there before verification can take place.

dbuenzli commented 8 years ago

Now, I really don't understand: if all the opam files are still in the old format, when does a tool access the sexp version of the file ?

The odig packaging conventions advise to install an OPAM file in the lib directory of your packages, so that odig is also able to work in non OPAM-managed scenarios.

Now if you take the topkg release workflow it manipulates the OPAM file it puts in the release tarball (the one that will be installed in lib), for example it adds a version: field. It could also convert it at that point to make sure it is in sexp format. That way the odig tool (which is oblivious of opam) would only see sexp based OPAM files.
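As a rough illustration of the release-time step described above (add a version field, then emit sexps), here is a minimal OCaml sketch. The `sexp` type, `add_version` and `to_string` are hypothetical names made up for this example, not topkg's or opam's API, and the version string is invented. Atoms are quoted only when they have to be, relying on the equivalence between quoted and unquoted tokens.

```ocaml
(* Hypothetical s-expression type; not an actual opam or topkg type. *)
type sexp = Atom of string | List of sexp list

(* An atom needs quotes if it is empty or contains a delimiter. *)
let needs_quotes s =
  s = "" || List.exists (String.contains s) [' '; '\t'; '\n'; '('; ')'; '"'; ';']

(* Print a sexp, quoting atoms only when necessary. *)
let rec to_string = function
  | Atom s -> if needs_quotes s then Printf.sprintf "%S" s else s
  | List l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"

(* Append a (version v) field unless one is already present. *)
let add_version v fields =
  let is_version = function List (Atom "version" :: _) -> true | _ -> false in
  if List.exists is_version fields then fields
  else fields @ [ List [ Atom "version"; Atom v ] ]

(* Fields as a toplevel sequence of sexps; "1.0.0" is an invented version. *)
let released =
  add_version "1.0.0"
    [ List [ Atom "homepage"; Atom "http://erratique.ch/software/uucp" ];
      List [ Atom "license"; Atom "ISC" ] ]

let () = List.iter (fun f -> print_endline (to_string f)) released
```

A tool like odig would then only need a sexp reader to consume the installed file, whatever format the author edits by hand.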

That said, I really think that if the switch to sexps is done, the old syntax should only be retained for compat reasons and eventually retired (i.e. opam lint should warn). There's no need to have the complexity of two formats in the ecosystem.

Drup commented 8 years ago

Hold on. Is there a fundamental current issue that is not solved with the release of the parser as an independent library, now that the ~~fight~~discussion between @dbuenzli and @AltGr is done ?

@dbuenzli Out of pure curiosity (and without the intention of starting a sparring contest), why is odig oblivious to opam if you are using the opam format? If you really want the opam information ... why not just ask the opam tool? That way seems more compatible with the wider OCaml ecosystem, which doesn't necessarily follow the exact convention you just implemented in topkg. Additionally, it feels like it would work better with things like pin --edit.

dbuenzli commented 8 years ago

why is odig oblivious to opam

Because as mentioned above it also supports non OPAM-managed scenarios (e.g. for system package managers, opam-bundle, etc.).

avsm commented 8 years ago

Because as mentioned above it also supports non OPAM-managed scenarios (e.g. for system package managers)

Confirmed... odig lets me write a "standard" packaging formula for importing ports into OpenBSD as binary packages, while retaining the ability to do source development via OPAM. Should be pretty useful for Debian etc too!

AltGr commented 8 years ago

Well, I sure wish more of this energy was directed towards polishing what's already in 2.0 and defining the streamlined workflows that I pleaded for during my OCaml workshop talk, rather than towards something completely new...

@dbuenzli said he's fine with a no-dependency low-level parser, and we could leave it at that. It's probably worthwhile to see if an alternate format and global migration is desired, evaluate the cost of the migration, and think longer term about when and how that cost would be best absorbed (i.e. in 2.0 or a further release).

Tolerating both formats as input could allow automated tools to use the new format in a way transparent to opam, while not changing the habits otherwise. Of course, having two different co-existing formats increases rather than reduces the overhead for the users, so we may not want this as a permanent solution.

One more point, as I detailed above, there are still quite a few details to properly define a sexp format, and I believe these should be taken seriously rather than rushed. Also, minor changes to the current format might help, and in that case they should be done asap.

dbuenzli commented 8 years ago

evaluate the cost of the migration, and think longer term about when and how that cost would be best absorbed (i.e. in 2.0 or a further release).

Well I think that if people want this change to occur this should be done rather quickly. As I said tarballs are now shipping and installing opam files (for achieving this goal since people don't seem to understand why I'm doing this). Staying longer with the current format will in practice mean that both formats will have to be supported by tools and thus defeat the whole proposal.

One more point, as I detailed above, there are still quite a few details to properly define a sexp format, and I believe these should be taken seriously rather than rushed.

As I said I'll gladly help you with this and update the various bits of documentation.

AltGr commented 8 years ago

(@dbuenzli) As I said I'll gladly help you with this and update the various bits of documentation.

Yes, didn't get back to you on this yet, but I certainly noticed and appreciated the offer, and was still pondering how to make the best use of your time. Let's first make sure we got a consensus on this though.

(@Drup) Hold on. Is there a fundamental current issue that is not solved with the release of the parser as an independent library, now that the ~~fight~~discussion between @dbuenzli and @AltGr is done ?

Not really; it's just that some points remain about the format being non-standard, which could be a barrier both for OCaml newcomers and for adoption of opam outside of this ecosystem. But well, if this move is to be taken at some point, we'd better be discussing when the cost would be best absorbed, and that may be now.

AltGr commented 8 years ago

Taking into account some remarks above, people following this may be interested in the following PRs:

#2695 (merged) -- remove dependencies of the lexer, parser and printer. Remains to repackage them.
#2700 (proposal) -- remove the "cleanup" pass when printing, e.g. print `build: [[make]]` rather than `build: make`. More verbose but with simpler semantics.
#2702 (proposal) -- unquote package names (`depends: foo` rather than `depends: "foo"`)
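To make the "cleanup" pass concrete, here is a small OCaml sketch. It assumes the cleanup amounts to printing a singleton list as its sole element; the `value` type and both printers are hypothetical and are not opam's actual implementation.

```ocaml
(* Hypothetical miniature of an opam-like value tree; NOT opam's real types. *)
type value =
  | Ident of string        (* unquoted token, e.g. make *)
  | String of string       (* quoted token, e.g. "foo" *)
  | List of value list     (* [ ... ] *)

(* Print a value exactly as structured, without any simplification. *)
let rec print = function
  | Ident s -> s
  | String s -> Printf.sprintf "%S" s
  | List vs -> "[" ^ String.concat " " (List.map print vs) ^ "]"

(* Assumed "cleanup": a singleton list is printed as its sole element. *)
let rec print_cleanup = function
  | List [ v ] -> print_cleanup v
  | List vs -> "[" ^ String.concat " " (List.map print_cleanup vs) ^ "]"
  | v -> print v

(* Render a "name: value" field with the chosen printer. *)
let field name printer v = name ^ ": " ^ printer v

let build = List [ List [ Ident "make" ] ]

let () =
  print_endline (field "build" print build);          (* build: [[make]] *)
  print_endline (field "build" print_cleanup build)   (* build: make *)
```

Without the cleanup, the printed text mirrors the tree one to one, so a reader never has to guess whether `make` stood for a token, a command, or a list of commands.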

lpw25 commented 8 years ago

I know +/-1 comments aren't that helpful. But I definitely prefer keeping the current format and making its parser easily reusable rather than switching to sexps. Using sexps does not really make a format much easier to deal with: you still need to know what type is expected for each field, how that field should be interpreted, etc. before you know if a file is valid. All sexps really do is make operator precedence explicit, at the cost of making users write all operator precedence explicitly. Notably, sexps do not enforce the binding structure of a language, so issues such as how to distinguish between identifiers and constants -- which I gather is where the quoting issues come from -- are just as problematic with sexps.
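The point about still needing per-field types can be sketched in OCaml as follows. The `schema` table and `check_field` function are hypothetical, invented for this illustration; only the field names come from the sample file at the top of the thread.

```ocaml
type sexp = Atom of string | List of sexp list

(* Hypothetical per-field shapes: a single atom, or a list of atoms. *)
type shape = One_atom | Atom_list

(* Illustrative schema; a real one would cover every opam field. *)
let schema = [ "opam-version", One_atom; "license", One_atom; "tags", Atom_list ]

let is_atom = function Atom _ -> true | List _ -> false

(* A well-formed sexp is only valid if it also matches the field's shape. *)
let check_field = function
  | List (Atom name :: args) ->
    (match List.assoc_opt name schema, args with
     | None, _ -> Error ("unknown field: " ^ name)
     | Some One_atom, [ Atom _ ] -> Ok ()
     | Some Atom_list, [ List items ] when List.for_all is_atom items -> Ok ()
     | Some _, _ -> Error ("ill-typed field: " ^ name))
  | _ -> Error "expected (field value)"

(* (tags (unicode text)) is accepted ... *)
let ok = check_field (List [ Atom "tags"; List [ Atom "unicode"; Atom "text" ] ])

(* ... while (tags unicode) parses fine as a sexp but is rejected here. *)
let bad = check_field (List [ Atom "tags"; Atom "unicode" ])
```

Both inputs are syntactically valid s-expressions, so the sexp reader alone cannot tell them apart; the schema check is the same work a reader of the current format has to do.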

AltGr commented 7 years ago

After so many pixels·seconds have been used in this thread, I think all arguments have been heard and it is time to reach a conclusion. There is indeed some incentive to move to sexps, and that boils down to:

lowering the barrier of entry by using an overall better-known syntax, rather than a custom one
making the format more easily interoperable and usable outside of the OCaml ecosystem

Arguments about escaping are irrelevant in my opinion (we have full control over the opam format, so I don't see how moving to sexps would solve anything), and the original, main incentive -- having a dependency-less parser -- is fixed by a stand-alone file format lexer-parser (#2695, remains to be packaged separately).

Matters of readability also are open to wild discussions, but I would retain:

the opam format is custom, but in turn has more specific structures, which can also improve readability and allow some errors to be caught earlier
as shown in the PRs listed above, we have room for improving the current format

Also, we would need a transition period, but everyone agrees that mixing two formats in the long term is ruled out. All in all, the arguments for changing may weigh slightly more at this point. But, given that the change would be quite large and, more importantly, that we are already in a release schedule -- and that 1.2.2 is starting to show signs of age -- I don't think the incentive is strong enough to do this now, while there are so many important features that everyone is waiting for.

Of course, with a stand-alone parser provided on the opam side, anyone would be welcome to provide alternate formats and bijective conversions if they feel like it.

AltGr commented 7 years ago

A further note about improving the current format and the PRs quoted above -- #2700 and #2702 actually kind of go in opposite directions:

#2700 makes the format less ambiguous and simpler, at the cost of added verbosity (when printing, anyway)
#2702 reduces verbosity, at the cost of more ambiguity (I find the overall layout nicer, but why quote versions and not package names?)

So I might question persons in favour of both ;)

Another topic that arose in this thread -- related to the quoted versions -- is variable vs. string quoting (i.e. currently foo is the variable and "foo" the string). Proposals in this area are welcome, the problem being that I don't think we could accept a change where a file that was valid before remains valid but with a different meaning after the change. The actual case where versions and variables conflict is only in filtered version constraints, which is a new addition, though, so that might be worked around.
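A tiny OCaml sketch of why the quoting distinction carries meaning: the same spelling evaluates differently as a variable and as a string. The `token` type, `env` and `eval` are hypothetical, and the variable binding is borrowed from the `available:` line of the sample file purely as an example value.

```ocaml
(* Hypothetical token type: unquoted tokens are variables, quoted are strings. *)
type token = Ident of string | String of string

(* Illustrative environment; the binding is an example value only. *)
let env = [ "ocaml-version", "4.01.0" ]

(* A string evaluates to itself; a variable is looked up in the environment. *)
let eval = function
  | String s -> Some s
  | Ident x -> List.assoc_opt x env

let as_variable = eval (Ident "ocaml-version")   (* the bound value *)
let as_string = eval (String "ocaml-version")    (* the literal text *)
let undefined = eval (Ident "foo")               (* no such variable *)
```

Dropping or adding quotes therefore turns one valid file into another valid file with a different meaning, which is exactly the kind of silent change ruled out above.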
