Package manager - Githubissues

ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language

http://www.ponylang.io

BSD 2-Clause "Simplified" License

5.72k stars 415 forks source link

Package manager #247

Closed kamilchm closed 7 years ago

kamilchm commented 9 years ago

I saw a presentation from Big Techday 8 http://www.techcast.com/events/bigtechday8/pranner-1450/. Great one BTW ;) You said you wish to use someones package manager and you're looking for suggestions. You need to try http://nixos.org/nix/. You can't do it better :) See example setup for Golang:

sylvanc commented 9 years ago

Nix looks very interesting. As far as I can tell, it doesn't support FreeBSD or Windows. Do you happen to know if support for those is planned?

We've also been looking at npm, but it seems to be more Javascript specific than we had hoped. Perhaps an npm expert can comment?

Another option we've considered is building package management into the compiler, with use statements. For example:

use "github:ponylang/reactive-streams/2"

The compiler could use the github API to look for the most recent release of ponylang/reactive-streams with a major version of 2 (using semantic versioning).

If anybody has further thoughts on this, please comment.

kamilchm commented 9 years ago

I think that Windows support for Nix is more like dropped https://nixos.org/wiki/Nix_on_Windows than planned :( I'm only a happy user of Nix, so maybe it'll be good to ask the question to its developers?

There's also http://0install.net/0compile.html, but I didn't use it much.

But perhaps a better solution would be to build ponyc specific tool, eg. Nimble doesn't have a lot of code, but have all needed functionalities https://github.com/nim-lang/nimble

cquinn commented 9 years ago

I am interested in helping. What are you looking for in a package manager?

Something like Go's packages and go get to work with source packages?

Or something more like Rust's cargo that can manage binary units?

And, do you have an idea for a unit of code that might be larger than a package?

cquinn commented 9 years ago

I have written up a rough proposal for package management for Pony available in my ponyc fork.

First question: is this a reasonable way to share proposals? This does allow detailed commenting if I submit a PR. Second question: what do you think of the directory where I placed it?

Third question: what do you think of the proposal in general?

sylvanc commented 9 years ago

This is really interesting stuff, thanks for writing it up!

Yes, this seems like a pretty good way to share proposals.
A proposals directory seems very natural for putting proposals in.
So...

I think you're right that a source package system is what's needed. At worst, that's a decision that can be revisited later (specifically when we have a nice incremental compilation story regarding selector colouring).

I agree that Go is the closest package system to Pony. Not the same, obviously, but close.

I recently added a pony_packages mechanism to the compiler, based very close on the way nodejs works. When a use command is encountered:

If the path is absolute, that's where we look. Done.
Try a path relative to the package that's doing the use. If we find one, we're done.
Relative to the using package, look in ../pony_packages, then ../../pony_packages and so on until we reach the file system root (i.e. /pony_packages). If we find one, we're done.
Repeat step 3, but relative to the compiler target rather than the using package. That is, relative to the program (or library) directory being compiled.

Your proposal with a project.yaml file and a pony tool clearly works, and I think works well. So the rest of my comments are musings... possibly crazy stuff, so please everyone let me know what you think.

I'm wondering if we can do without the project.yaml file? The use command actually takes a URI, not a string. When no scheme is given, it defaults to package: and uses the search algo I outlined above. But, for example, you can also do use "lib:crypto" to link against libcrypto.

What if we extended that, and allowed, just for example, something like use "github:cquinn/awesome-package". The compiler would look for cquinn/awesome-package as normal, but it (or a wrapping tool as you describe - and I suspect a wrapping tool is the right thing) could also be told to "update dependencies", in which case it would use the github API to look for the most recent release of cquinn/awesome-package and download it.

Then, we could provide semantic versioning in the URI. For example: use "github:cquinn/awesome-package/2" would look for the most recent release with a major version number of 2, or use "github:cquinn/awesome-package/2.1" would look for the most recent release with a major of 2 and a minor of 1 (but any patch level).

Other URI schemes could be introduced for other repository types, using the available API for that repo type in the background.

The advantage here, I think, is that all of the information goes in the source code, rather than in a metadata file. A package with specific major version dependencies can express them in source code, and vendoring can allow multiple (semantic) versions of a package to coexist in a program.

The disadvantage, I think, is that this approach expects the "unit of retrieval" to be a package. It would be possible to distribute bundles this way, but the source code might be a bit funny, something like:

use "github:cquinn/awesome-package/2.1" // this fetches the whole bundle, plus uses the named package
use "cquinn/awesome-package/2.1/something-else-in-the-bundle" // hmm, awkward

On the other hand, bundles could be expressed by having separate packages with separate units of retrieval that reference each other in their source code (since cycles in use commands are allowed in Pony).

Thoughts?

sblessing commented 9 years ago

This is great stuff. I think its a good plan to try and stick to a package management system without external configuration files. Seeing in the source code what the dependencies are is, in my opinion, the thing todo and simplifies how Pony applications are packaged & deployed. I do see the awarkness you mentioned related to the 'unit of retrieval'. Hmh, need to carefully think about this.

kamilchm commented 9 years ago

From my experience when using Go without any metadata, it can be a pain when you want to do some changes to 3rd party libs, and you don't want to wait for upstream to use it. I you want to use your fork then, you need to make changes in all your imports, and when it reaches the upstream, you need to change back your imports again. So I use https://github.com/Masterminds/glide#features which supports aliasing packages. It would be nice to support forked repos in pony too.

cquinn commented 9 years ago

Thanks for the feedback—I'm very glad I can contribute. It seems that much is agreed upon, and there are only a couple areas still needing discussion. And I agree completely with @kamilchm regarding working with 3rd party libs in Go: we run into that same problem with forked repos on Github, or even repos mirrored in our local Git require special treatment.

First, I'd like to point out that over the years working on developer tools I've learned that a good tool chain needs to enable and promote a hermetic build environment. That is, users of the tools in all situations need to be able to create a build environment that doesn't allow extraneous inputs to affect the results of builds. The same (source) inputs should always result in the same (binary) outputs when using the same tools. And this should hold on any machine, in any role in any place.

This means that for packages, the identifiers used must be decoupled from any environment-specific name. Environment here meaning the specific development machine or the local network that it resides. This is crucial to allow source code written on one machine in one environment to be shared with the world and built anywhere else. Without this constraint, source files would have to be edited before they could be built in a new environment.

This requirement then precludes the use of absolute paths: how can a path that is specific to one machine be expected to work universally? Even a system path on a given OS might be commonly standard, but it won't be universal and could easily vary in hermetic build environments.

But It is also important for the import identifiers to have a global uniqueness to them to ensure that the sources are complete, and their dependencies can be accurately identified and retrieved. Like Java with the reverse domain package names, or Go with the SCM paths. (host/user/repo).

So what we want in the source files for each import is an identifier that uniquely describes the package globally so that it can be retrieved without any supplemental information, but is independent from the machine and local network environment where it was used.

I think we can also break down the imported packages into three categories:

Pony system libraries ("net/http")
Project local libraries ("../types")
External libraries ("github.com/cquinn/merkle")

For the first group, the short package identifier that Pony already supports will work fine. For the second group, relative path identifiers within a project are great—this leverages the fact that packages are bundled together and should be built together. For the third group, I think Golang style host/user/repo is reasonable—that identifier is well accepted but extensible, and yet is decoupled from the local environment.

Regarding packages and and bundles or base package, I can see both sides of this. I think we can work through some examples to see if we can come up with a system where the root package in a project can serve as the representative identifier for the whole project/bundle.

To illustrate my view on how the process could work, here's a sample flow:

The fetcher tool invokes ponyc to ask it for a complete summary of packages referenced in the use commands in the project. This would normally skip system and relative packages, and just get external packages.
The fetcher then fetches the external package bundles using default lookup of global names. The fetched package bundles are then placed in a local Pony library path, similar to how go get fetches packages into the GOPATH.
The fetcher creates the project.yaml file to record the locations and versions of the bundles fetched.
When the project.yaml file exists, the fetcher honors the bundle locations and versions listed there.
The developer optionally makes edits to the project.yaml file to choose alternate locations or versions of packages, and could adjust this and rerun the fetch operation any time.
When the compiler runs, it looks in the three locations: 1) system packages, 2) project relative packages, and 3) external packages saved locally by the fetcher.

The tool for Go that kamilchm mentioned has a similar workflow for solving this problem in Go, and here is a nice writeup for that.

jemc commented 9 years ago

@cquinn - I think you make a lot of good points, but I'd like to interject a few things to supplement.

This requirement then precludes the use of absolute paths: how can a path that is specific to one machine be expected to work universally? Even a system path on a given OS might be commonly standard, but it won't be universal and could easily vary in hermetic build environments.

I think absolute paths do have their place in a project.yaml. For example, in a continuous build environment I might want to manually or use some other tool to fetch the packages I want, put their absolute paths in the project.yaml, then have those paths be respected by the "native" dependency fetcher. By precluding absolute paths, you would make it harder for users to build their own toolchains and workflows (given enough users, there will always be workflows you didn't imagine or plan for).

Regarding packages and and bundles or base package, I can see both sides of this. I think we can work through some examples to see if we can come up with a system where the root package in a project can serve as the representative identifier for the whole project/bundle.

I'm working on a package now (as I am learning pony) that will have one main package intended for users, with a few sub-packages at sub-directories inside of it that are intended to logically separate concerns, and could conceivably be used separately by a user, although this is not that likely. I'm not sure what the best pattern will be here for packaging/bundling the result, but I'm interested in engaging this conversation as I go.

cquinn commented 9 years ago

You are right @jemc regarding absolute paths in a project.yaml, and I didn't mean to imply that restriction there. I'm having a hard time describing all of what I am thinking in a clear and complete way in just English without pictures or code :)

To be clear: keeping the package identifiers universal is crucial in the Pony source, and recommended in project.yaml files that are published or shared. But when a Pony project is being worked on locally, all sort of identifiers should be allowed in the local project.yaml to enable tools as you say, or aliases to use library forks or cached copies. The Glide tool for Go has that feature, to solve these headaches that Go developers have run into over the last couple of years of using shared libraries.

sylvanc commented 9 years ago

So having gone over this with @andymcn and @sblessing , I think this sounds really good. @cquinn , are you interested in working on this? I'm happy to add any compiler support that's needed.

andymcn commented 9 years ago

I thought I'd throw some of my thoughts into this.

Firstly, lots of things have been suggested here, so I may have misunderstood what various people are agreeing to. I shall try to be explicit.

I think that grouping packages into bundles is a very good idea. For clarity I'm using the following terminology:

package - the unit of compilation. Basically the files in a directory (and not in any sub-directories).
bundle - the unit of distribution and versioning. One or more packages, a directory tree.
project - the program you're actually trying to build. A directory tree.

One thing that wasn't clear to me is whether "external" libraries (ie ones not in the current project directory tree) have to be available from github (or where-ever) or could be only on the local system. The latter allows for libraries that can be shared between multiple projects without having to publish them to a proper repo.

In general I'm very much against config files external to the source code. We've contemplated adding them several times during the history of Pony and always decided against it. For me they're too much of a step towards the evil that is complex make files.

So I'm in favour of fully specifying where to get a bundle within the source code. @sylvanc pointed out this could get ugly in the code, but I think that can be (largely) fixed by specifying bundles and packages separately. Example (using existing syntax):

use foolib = "github:cquinn/foolib/2.1"
use "foolib/awesome_package"
use "foolib/something/else"

The name "foolib" then must be unique, including not clashing with relative packages, but only within that source file. Some other name could be used in a different file if necessary.

We could also modify that slightly so that it's explicit that we're referring to a bundle, rather than a relative package. Prefixing bundle names with # or something would achieve that.

One disadvantage of this approach is that every source file must specify where the bundles come from. So every file that uses said bundle must specify "github:cquinn/foolib/2.1". This can be a pain when we want to migrate to version 2.2 and it's easy to forget to update some references.

A possible solution to this is to make bundle imports visible throughout the importing package. For example:

in one file in the package:

use foolib = "github:cquinn/foolib/2.1"
use "foolib/awesome_package"
use "foolib/something/else"

in another file in the same package

use "foolib/awesome_package"

If we did that we'd probably want to change the syntax a bit, but I think the idea is sound.

If such a "bundle specified in source code" approach is used then the yaml (or whatever) file the fetcher generates / uses is just for caching purposes and to allow substituting, which I think is good.

Substituting bundles

We definitely want to allow substituting one bundle for another throughout the whole project. For example I may be using "github:cquinn/foolib/2.1", but want to use my own local version containing bugfixes or whatever.

Having a per-project yaml file to specify substitutions during development seems sensible to me. Then update the references in your source before distributing.

It might be nice to also allow substitutions for some packages / bundles rather than just the whole project. For example, I'm using foolib and barlib, barlib also uses foolib. Foolib has some bugs, so I want to use my own local version with fixes. But barlib relies on the bugs in foolib, so it needs to use the official github version.

I'm not sure how to handle this situation.

Multiple exes

One complication is we generally want each project to have more than one executable. Primarily this is due to unit tests, but also a big project may want to have multiple programs anyway.

The way Pony works, each package produces one executable (or none). But if you're going to be specifying bundle substitutions they go in the project yaml file. The problem is, how does the compiler know where to look for that file? With multiple exes in a project the project directory is not necessarily the one the compiler knows about.

One possibility would be to specify on the command line both the project directory and the package to build relative to that directory. Eg, to build package bar in project foo:

pony myprojects/projectfoo bar

Multiple versions

We definitely want to be able to use multiple versions of a bundle within a single project. For example I may want to use foolib version 2.1 and also barlib, but barlib needs foolib version 1.5.

As long as bundles are always specified with a version (in both Pony source and the yaml file) I don't think this is a problem.

Multiple repos

Suppose I'm working on some big machine at my company / University. The sys admins have set up a global Pony fetch store somewhere on the machine and fetched lots of useful bundles into it. However I don't have write access to that.

When I run Pony fetch I want it to know about the read-only system store and my own writable store too. It would be good if we could handle this situation.

Similarly it would be nice to handle multiple local library stores.

cquinn commented 9 years ago

Yes, @sylvanc, I am definitely interested in helping. And you make some good points @andymcn.

Here is a quick reply, before revising my proposal doc to match where I think we are.

External bundles will need to be identified somehow globally so that they can be fetched, but then can be compiled against locally with that global identifier. Go does this with 'go get' fetching bundles into $GOPATH, and then 'go build' reading them from there. So the global identifier becomes a relative path segment under $GOPATH. A developer could use alternate techniques to fetch or share these bundles, and as long as they ended up in the right location locally, they should be found.
I also like keeping as much of the package specification in Pony source as possible, and avoiding a special external non-Pony files.
I also like having a distinct bundle import and a bundle identifier that is at least package global.
I think a fetcher could scan Pony source just for these bundle imports and figure out what to fetch for the project.
What if bundle imports were distinguished by always having a scheme prefix, and package imports no longer having that? In your example, foolib would be interpreted as a bundle because its arg has the github: scheme, and the other foolib/ imports are packages because they don't. (and maybe also because they have a <bundle>/ prefix)
Regarding mixing versions of bundles in a project: Is that possible or desirable to do in the Pony / LLVM environment? In the Go world, everything is compiled from source, and so multiple versions of the same packages would collide. I think part of Rob Pike's (and Google dev in general) philosophy is to always work with libraries near latest, and compile projects and all libs together from source since version compatibility is easier with source than binary. I agree with this approach, and suggest that Pony not allow multiple versions of a bundle. In your example, if barlib relied on bugs in foolib the project would have to either put up with the buggy foolib or make fixes to (possibly forked) barlib to work with the fixed foolib. I haven't found this to be a problem in the open source Go libraries I use.
Regarding project scoped files, like the project.yaml: Would it be possible for the compiler to walk up the directory hierarchy and stop when it finds a project.yaml and treat that dir as the project root?
And what do you think of the idea of having a project.pony instead? It could contain just bundle imports that would be at project-scope. And maybe also package.pony files in each package for package-scope bundles? We would just need to figure out precedence, but I guess the inner bundle spec would shadow the outer one. But by convention, it would be most common to put all bundle imports into the project.pony and then just package imports in all remaining Pony source referencing those bundles.
Regarding multiple repos: that does sound like a reasonable use case, but I am not sure I see where the trouble is. Wouldn't the fetcher just fetch packages from the readonly repo into your local bundle store, and then you'd compile against that? Maybe we just need to clarify how we want to manage that local bundle store.

andymcn commented 9 years ago

Excellent, we seem to be largely in agreement about most things.

Here are some replies to your responses.

Regarding distinguishing imports: Yes we can tell a use statement is a bundle import because it starts with github: or whatever. We can also have something like bundle: for specifying local bundles. I was talking more about how to tell that "foolib" is a bundle alias rather than a relative package name. A prefix is certainly one way to do this. But really, exactly how we do that doesn't affect the fetcher, so it doesn't matter exactly at this point.
Regarding mixing versions of bundles in a project: Yes this currently is possible with Pony. There is no global namespace for types or packages, so there are no collision, everything is imported and used by whatever needs it. You can even have 2 different versions of a single package used within the same source file if you want to, with a use alias. And I think it is desirable too. Saying "always use the latest version" is all well and good if it's your code that's using it. But if you want to use foolib, and that needs an old version of barlib, then what do you do? Yes you have the source to foolib, but you don't want to have to rewrite other people's libraries because they haven't done an update yet. And if the API of barlib has changed significantly it maybe virtually impossible to fix foolib without completely rewriting it.
Regarding project scoped files, like the project.yaml: Yes the compiler could walk up the directory tree looking for the file. However, that means you always have to have such a file when you run the compiler. So when you start a new project, and don't have any dependencies on anything other than the standard library yet, you have to do something to generate the project file before you can compile. Of course if the compiler always runs the fetcher first this probably isn't a problem.
Regarding having a project.pony file: That's a possibility, although I don't see how that would work with transitive dependencies from imported bundles. How would you substitute a different bundle (eg a local one with bug fixes)? I'd be very wary of having project and package level config files (even if they're actually pony files). Pony currently has no shadowing anywhere. We've worked hard to achieve this because we believe shadowing naturally leads to bugs. It would be a shame to add it now in config. Having just a single project level file is an option, if we can make it work sensibly.
Regarding multiple repos: I think I may not have explained myself very well. I've been assuming that the thing the fetcher gets bundles from (a bundle repo) and the place it puts bundles for the compiler to use (a bundle store) were different things. This may not be the case of course. If it is then feel free to give them better names than I'm using. I was thinking that the thing the sys admins set up was a bundle store, not a repo. The compiler would then build some bundles straight from there and some bundles from the user's own store. This would mean that the fetcher wouldn't have to copy files from the system store to the user's store, thereby saving disk space on a machine where the user is likely to have a quota. The fetcher would have to check both the system and user's own stores before downloading the bundle from github, which is the point I was trying to make. It could of course be argued that both disk space and network bandwidth are cheap enough it's just not worth bothering with such a set up. Big machine sys admins may well disagree.

jemc commented 9 years ago

Just to chime in quickly about multiple versions of bundles being usable together - I think @andymcn is right that this is desirable and the package manager should not preclude this possibility.

In Go, such a concept may not make sense or be practical but with Pony's localized namespacing, it's a very attractive feature. (And, if I wanted to be using Go - I would!)

As @andymcn said, just because you are always working with all libraries near their latest versions doesn't mean the near-latest versions of the libraries you use are using all of the same near-latest versions of shared dependency libraries that you are. Avoiding this restriction allows for more confidence and agility when upgrading a dependency version of your code, as you only have to worry about how it affects your code, not code that you depend on or depends on you.

cquinn commented 9 years ago

OK, I am convinced that allowing multiple versions of libraries is a fine thing, especially now that I understand more how Pony handles their naming. I didn't realize that the aliases were really changing the code produced.

As for the project and package files, I'm open to different ideas here around format, content, scoping, etc. It just seems to me that it would be difficult to manage a project and package build without some way to define metadata at the package and project level. I think this is something we should explore more fully.

@andymcn I do like your distinction of the bundle repo and bundle store. I was misunderstanding your example—I didn't see that the sysadmin would setup a bundle store directly. I did assume that would be a bundle repo, and all users would fetch bundles into their own stores. I really don't see how extra copies of source code on a machine would have any significant impact with storage so cheap. But, it probably would be fairly straight forward to provide ways to include pre-fetched bundles into a local bundle store if we want to optimize that way.

One thing that I think we haven't talked about much is how the tools should work together. It sounds like you are thinking that ponyc would be the main entrypoint tool for the user, and it would call out to other tools like the fetcher.

I tend to think that a model with a general purpose developer tool that drives the more specific tools is more flexible and extensible. Like Go with the go tool that can invoke go get, go build, go test, go run, go fmt , go install, etc. Or Rust with Cargo. This approach allows the compiler to stay focused on compiling, and the tools around it to build out a larger developer kit. It might also make it easier to build all of the other tools in Pony itself while ponyc.

Imagine a pony tool that could perform pony fetch, pony build, pony run, etc. Many of these commands would invoke ponyc, but possibly do other things as well.

andymcn commented 9 years ago

Here's my new proposal, taking into account everything that's been said recently. Please let me know if any of this is unclear or seems like a bad idea.

Proposal

Bundle specification

The specification of which bundles to import is specified in the Pony code via use commands. Bundles have to be given aliases, which are then used when importing the specific packages within the bundle.

Bundles can be sourced remotely (remote external bundles), eg from github:

use foo = "github:cquinn/foolib/2.1"

or locally (local external bundles):

use foo = "bundle:foolib/2.1"

The scope of the bundle aliased specified in these commands is not project wide. In particular, 2 different bundles do not have to use the same alias for the same third bundle.

Packages to use can then be specified with other use commands:

use "foo/utils"

The "foo" in such a use command could only be (in order):

A previously defined bundle alias.
A local package name.
A standard library package.

Bundle versioning

Both remote and local bundles use semantic versioning. The most recent matching version is fetched. For example, if the existing versions are:

2.3.0
2.3.1
2.3.2
2.4.0
2.4.1
2.4.2
2.4.3

If you ask for version 2 you'll get 2.4.3. If you ask for 2.3 you'll get 2.3.2. If you ask for 2.3.1 you'll get 2.3.1. If no version is specified the latest is used (2.4.3).

The fetcher performs this version check, fetching the appropriate version.

Note that a single project (or even a single source file) may require multiple versions of the same bundle.

Fetching bundles

The fetcher is invoked on the project. It walks all source files finding bundle use commands, builds a list of all required bundles and fetches them. It then does the same for the fetched bundles and keeps going until all transitively required bundles are fetched (or there's a failure).

The fetcher ignores the aliases the use commands provide for bundles, they are only used by the compiler.

Remotely sourced bundles are fetched into the user's bundle store. Locally sourced external bundles could be copied into the user's bundle store, but I think it makes more sense to just use them from where-ever they are on the local system.

The list of required bundles is stored in a project wide file. This can be edited manually to specify bundle substitutions. If this file doesn't exist then the fetcher creates it.

The fetcher reads this file before fetching anything and honours any substitutions it specifies. It checks for newer versions of bundles every time it is run, but not for substituted bundles which need to be updated manually.

This file is only used in projects and is not required for external bundles and is not distributed with bundles. It is only used for specifying where to find the bundle referenced in the source code, including substitutions and versions, plus possibly for documentation and caching purposes.

Bundle stores will have their own files specifying what bundles they contain, including versions, checksums, etc. The format of these files and the information stored in them is entirely an implementation detail of the fetcher.

Building

The compiler processes the bundle list file and uses it to locate the bundles it needs. Any required bundles that are not present are treated as errors, the compiler never tries to fetch anything itself.

The compiler does not try to edit the bundle file.

If there are no external bundles used then the compiler doesn't need the bundle file and won't complain if it doesn't exist. If any external bundles are used then the bundle file must exist or the compiler throws an error.

This allows for the compiler to search up its directory tree to find the bundle file (in the project's root directory) regardless of where the compilation starts. But also allows simple projects (that use no external bundles) not to need the file or ever have to run the fetcher.

Invoking

There is a single top level tool, called pony, that the user calls with various arguments to perform all operations. This tool will invoke the fetcher and compiler (and possibly others) that are separate executables.

There is no technical reason we couldn't have it so the compiler does all of this, but it seems reasonable to split it.

Bundle file format

The bundle file is autogenerated, but needs to be human readable and editable.

It could be a Pony file, but I don't think that's sensible. It only contains bundle information, not general code, and the bundle substitution information won't fit into existing Pony syntax.

So instead we should use an existing standard format. Json and yaml are both sensible options.

Here's a sample bundle list file in pseudo format. The foolib entry is purely autogenerated. The barlib entry has been manually changed to substitute a local version of the bundle, the substitute keyword is used so the fetcher knows not to try to update the version of the bundle used. The awesomelib entry is an autogenerated entry for a local bundle.

bundle github:cquinn/foolib/2.1
  store /home/andy/ponystore/github_cquinn/foolib/2.1.3/
bundle github:cquinn/barlib/3
  substitute /home/andy/mylibs/mybar/0.1.4/
bundle bundle:awesomelib
  store /home/andy/mylibs/awesomelib/0.0.0/

We could also store other project related information in the bundle list file. For example, this would stop us having to specify the output directory on the command line every time the compiler was invoked. I'm not sure whether this is a good idea or not.

Things I've skipped over

The fetcher needs to know where to look for local bundles and where to copy remote bundles to. I'm assuming this will be done with some combination of environment variables, command line options and standard relative paths, much as the compiler currently does things. The compiler won't need any such mechanism because the bundle file will tell it where to find bundles.
The scope of the bundle aliases created by use commands could be file, package or bundle wide. However, since this doesn't affect the fetcher and is purely a compiler issue I'm not considering it important for this discussion.
I've completely ignored the standard library. At the moment it isn't versioned, however it really should be. If we don't version it then we won't be able to change it later without annoying existing users. It could be handled just as a special case of a local bundle, with an implicit use command. Or we might want to do something different.
I haven't mentioned the question I previously raised of having multiple local bundle stores, possibly some being read-only. I don't think this is actually very useful for I'm ignoring it for now. However, nothing in this proposal should prevent it being added later.
There is no mechanism here for substituting one bundle for another in only part of a project. Again I'm not sure whether that's actually a useful ability. Also it is possible to achieve such an affect by editing all the relevant use commands in the source code.

jemc commented 9 years ago

I've completely ignored the standard library. At the moment it isn't versioned, however it really should be. If we don't version it then we won't be able to change it later without annoying existing users. It could be handled just as a special case of a local bundle, with an implicit use command. Or we might want to do something different.

Personally, I think the standard library should be (minimally) separately revision-controlled in its own repository and (ideally) not treated specially in any way by the compiler or the fetcher (except for possibly already existing in a bundle store when pony is installed). Many languages start out with a standard library in an attempt to be "batteries-included" but these often become cumbersome later in the life of the language when frustrated users with different kinds of problems than those imagined by the standard library authors often end up creating alternative libraries that solve their type of problem better. I think it definitely makes sense for the language to curate a set of bundles/packages to make the language more useful out-of-the-box, especially early in the life of a language, but I think it's important that they be maintained as just-another-third-party-package, on the same playing field as user-created third-party packages.

cquinn commented 9 years ago

I really like your proposal @andymcn, and I don't see anything that looks wrong at all. I would like to add a few suggestions here and there. Or at least ask some questions.

Regarding versions: what about libraries that aren't semver, or even if the library is in general, what if you want to pull a version by tag or hash? I suggest we just allow that notation somehow and the version identifier is then treated as opaque and used literally, and not processed like semver.
The bundle file should be some kind of simple text format that is easy to parse for all the tools: pony, fetcher, ponyc. But a standard format would be nice too. YAML looks nice, but is hard to parse. JSON is easy to parse, but tedious for humans. Rust Cargo uses TOML which is pretty good at both, but not particularly elegant. Not that we have to decide this right away, but something to chew on.

Comments on the skipped things:

Yes, I think environment variables plus config file and/or command line overrides should be fine. Maybe the pony tool.
We will have to figure out the bundle alias scope. I think package scope was suggested earlier, and that seems the best balance to me.
It seems like some core of the standard library will have to be delivered with the compiler, as I assume some generated code depends on helper code in the library. Maybe that is or could be limited to the builtin package, so then the rest of the stdlib could be sem versioned and fetched like all other bundles. Also as @jemc suggests, keeping the standard library componentized will simplify evolution and alternate implementations.

andymcn commented 9 years ago

I think we should encourage libraries using semantic versioning as much as possible. But you're right, we need a way to handle those that aren't. So yes we should have some notation to distinguish between a literal version and a semver that will be interpreted accordingly.

Relatedly we should consider how to restrict the provided URIs to avoid injection attacks. Currently we outlaw a specific set of characters that can be used in bad ways. This is a rather crude and arbitrary way of doing things that might cause problems with some legitimate repos.

The bundle file should definitely be a standard format. I currently learn towards JSON, but there are other sensible options. Whatever format we pick it would be good to have a single implementation of a parser that could be used by all the tools. We currently have shared code in the runtime library, which the compiler is statically linked to, for things like memory pool allocation and directory access functions. It might be worth do something similar for shared tools code.

Bundle aliases being package scope makes sense to me as well. My only worry is that other use command aliases have file scope and that might confuse people. But this isn't a major issue.

Currently the standard library code that is treated specially by the compiler is limited to the builtin package. In fact that's pretty much the definition of builtin, it's types that the compiler needs to know about or that need magic implementations provided by the compiler. Properly versioning the rest like any other bundle in entirely sensible.

jemc commented 9 years ago

By the way, I've recently found with the package I'm working on that keeping the source files at the top level of the repository can start to be quite unruly as the number of source files increases, so I decided to move the package source to a subdirectory of the repository (as the name of the package - in this case, zmq - so that the binary has the right name).

It would be nice if there were some way to have this not cause headaches for the future package manager.

pyrossh commented 9 years ago

I think the pony_packages directory is better even though i tried it and it didn't work so I resorted to using a workspace like go right now. Has anyone given any thought about go 1.5 vendor package system which is going to be the official package manager for go. Using github urls and releases will be much easier and then we won't need to use a registry and using .yaml/.json files also might be too much into the future. I Think the best way is to go the go way without one as package management isn't very easy to manage.

hibnico commented 8 years ago

Sorry for arriving late in the discussion, I hope I'll not disrupt too much.

Very interesting discussion, lots of great ideas, but one is bugging me very much: the way a dependency is declared in a source file.

I have worked with dependency management in the Java world for quite some time, mainly on Apache Ivy which goals is to manage dependency with every Java library of the world, so it integrated with many different system, from old school lib folder, to OSGi, Maven and even Eclipse plugins. Basically the goal of Ivy is to manage all the crappy dependency declaration there are out there on the net. What I have learn is that the net storage highly unreliable, and there are mistakes in the dependency declarations.

Since a package of source files is the primary unit of what is being shared between projects, we shouldn't have to modify the imported sources if a github project is being renamed.

Another important notion I learned from OSGi is the difference between the API and the implementation in dependency management. A package of source files, via its "use" commands, should only require an API, not a specific implementation. That way, anybody can reuse that package of source files, provided any implementation of the API its depends on. This decoupling make dependency management a more flexible to the end user, which can then easily work around some buggy transitive dependency, without having to touch to the imported sources.

The drawback with only declaring only dependency on the API within the source files, it that specification is too wide to have a reproducible build. A dependency management tool would have to find a bundle which implement the required API, and over time, many versions can exists. So we still need a way to declare dependencies to specific implementations. But this only for the build of the project, which per se is not reused elsewhere, contrary to the source files.

So basically two kind of dependency declaration are required. In the OSGi world they choose the fully qualified names of the Java packages for the API, and an arbitrary name of the bundle for the implementation. For instance the web servers Jetty and Tomcat are declaring implementing the javax.servlet suff, but there absolute coordinates are respectively org.eclipse.jetty and org.apache.tomcat. That way, I can build my own web framework (because there are enough out there!), depends on javax.servlet, and have the users of my framework choose the actual implementation of web server they want to use.

It can be very similar in Pony, the primary unit which can be shared being the package. That would mean no protocol in the "use" command. But a dependency declaration file would be needed at build time.

There is a lot more to discuss about the consequence of this idea, separating API and implementation dependencies declaration, how it is fully implemented. But I'll stop there, to know if you agree on the principle.

kamilchm commented 8 years ago

One more thing I've encountered recently is elm package manager and its strict rules for versioning https://twitter.com/doppioslash/status/676845572100333568 I may be worth to dig into it ...

jonas-l commented 8 years ago

Strict versioning would be really great! To take it even further, package manager could ensure that not only interface is backwards compatible, but a contract as a whole by running tests of previous minor version. This of course implies that tests are published together with the module.

hibnico commented 8 years ago

Adding semantic to the version is a great idea. But it doesn't mix up really nicely with how we manage versions usually in a project, especially when we are adding all these extra qualifiers to the version, like a build number, a timestamp, a svn revision, or when we are managing and releasing several branches in parallel.

But it could be done quite nicely I think if the API coordinates is separated from the implementation ones of a bundle. In the OSGi model they did try to enforce semantic on both level, but I saw many use cases where people were stuck with the enforced 3 numbers of the implementation version. I think that it would be better if the version of a bundle/implementation could be quite arbitrary, whereas the version of the API/package should be enforced with strong semantics. And the cherry would be a tool which do all the semantic checks between versions of the API!

rurban commented 8 years ago

I would prefer the explicit

use "git:github.com/ponylang/reactive-streams" over use "github:ponylang/reactive-streams" to easily allow other non-github repos also. with git as use schema

but a central CPAN-like repo, with a list of name => path mappings and a new use schema would scale better in the long run. use "repo:reactive-streams" or use "ext:reactive-streams"

versioning:

I don't like the version as subpath. I'd rather prefer a space seperator, and optionally support ==, <=, > ops. as subpath it's too github specific.

use "git:github.com/ponylang/reactive-streams 2" or maybe require a v prefix use "git:github.com/ponylang/reactive-streams v2" to make it clear.

TehShrike commented 8 years ago

I've been watching this topic very excitedly. Y'all seem to be considering the problem very well and I'm looking forward to seeing what you work out.

In the meantime, here's a Github-based package manager that somebody made (and was tweeted out by @ponylang today), just to note it here in this thread.

hibnico commented 8 years ago

This topic is indeed very interesting. I would like to make it move forward, I am quite excited by this awesome Pony language and a package manager is a must have. But before doing so, I would like to discuss the principle of putting bundle dependencies in the source files.

I tried to have a step back, starting from a cleaner state, from basic user stories about using a package manager. So I would not be pushing ideas and tools I am just used too.

One particular user story that doesn't fit well with having bundle dependencies declared in the source files is about having a local proxy of the remote repositories of bundles.

A package manager will have to ensure that a build is always stable and reproductible in time. But it cannot ensure that the online resources will still be there over time. It is of the responsibility of the end users to choose reliable enough remote repositories. And some of the users may choose to have a proxy deployed in house, so the bundles actually fetched from the net are closer to home, and most probably backed up.

To have a such proxy working well, there need to be a distinction between what is being required and how it should be fetched. By allowing URIs in a use command, Pony is preventing to be able to distinguish between both. If a Pony project depends on some remote Pony source which are themselves requiring an URI, the end user would have to locally either modify the fetched sources, or tell Ponyc that the required URI should be transformed in another one. This is how it works for Go with Glide. And by doing so, setting up a proxy won't be impossible, with a lot of aliasing, but will most probably be quite painful.

So I advocate that there should be a distinction between what and how, and that Pony should enforce that distinction. Pony can already do the former, with its "use" command together with a simple package name, package name to be found in the standard library, or the ones in the path of the compiler. The how would be then managed externally, by another tool, or just manually, or just like pony-stable is doing.

Then there are consequences of what is what. As some Pony sources can "use" several different versions of the same package (which is great), the what should incorporate that notion of version. This is a great topic where we would also have to discuss about semantic versioning.

This separation of concerns will modify one thing compared to the current way of defining dependencies though: the control of what exact package implementation is being bound by a "use". I think it is a good thing, but it can be seen as a drawback, it will be a little painful on already painful (expected to be rare) cases.

Currently package use is obvious. The used package is either around in the project, or within the standard library, or directly pointed by a path. With the indirection I would like to introduce, it won't be that obvious. For 99% of the use cases, it would still be quite obvious, I expect we won't have much package name collisions or the need the import two different versions of a package.

It's harder when a project need two versions of the same package. The end user will have, either way, to get the two packages into two different locations. But he would have then to be able to specify that in two of its own packages their "use" are actually binding with two different versions. To be more clear with an exemple, he has to have somewhere two folders for a package foo_1.2 and a package foo_2.3, and in the sources of its own package bar there would be a use foo@[1.0, 2.0)", and in the sources of its own package baz there would be a use "foo@[2.0,3.0)". Here just by looking at the source, we cannot know which exact package will be used. The list of packages in the path of Ponyc will only tell.

To dig further in a very problematic case, let's say that additionally to the previous exemple, in the transitive dependencies, there is a package which has use "foo@[1.0,3.0)". Considering the path of ponyc where should appear both versions of foo, its impossible to tell which version of foo will be actually used there. To properly resolve that kind of cases, Ponyc would have to have clear rules about how it is done, and maybe offer some hooks where these rules could be overridden.

Note that resolving such cases without the distinction between what and how would be easier within the project. But it would be a nightmare for any other project "using" this project, since the "use" would be hard coded within the source files. But again, I expect this kind of case to be very rare. I pointed it out because down the road, Ponyc would have to somehow handle them.

To get a little more concrete, I am starting to have ideas about what it would look like for the end user. I think that it would be great to a have tool dedicated to fetch the dependencies, like npm or apt-get. I would run "pony-get yaml" for instance, to be able to use a yaml parser in my project. It would download the appropriate stuff locally in my project, and update a file which describes precisely what exact version of the bundles which has been downloaded, and from where. I would share this file with the other developers so they will be able to download the exact same stuff, even a month later. I would then write my code with some simple use statement, just using the package names. Days later, after discovering a bug in the yaml parser, I will then like to use the latest version. I would run "pony-get remove yaml". It would remove from my dependencies the released version of yaml, along with its transitive dependencies. It would update the dependency descriptor shared with my coworkers accordingly. I would then run "pony-get github nlalevee pony-yaml". It would download the latest stuff from github, and put in the dependency descriptor the url of the github and the sha1 of the checkout.

I may be a little verbose here. I hope some of you read it all and am hoping for some feedbacks.

0joshuaolson1 commented 8 years ago

@TehShrike You forgot the link.

TehShrike commented 8 years ago

Oh true enough! The package manager is pony-stable.

jemc commented 8 years ago

I'm the author of pony-stable and now a member of the core Pony team. That little project was intended as a quick stopgap measure to get some basic dependency tracking in place in some of my other projects.

Soon, I'd like to get down to the work of integrating (at least one) proper package manager, using some of the great ideas in this thread. I think we can do it in a way that provides a good user experience, but is loosely coupled enough that folks can create different solutions to meet different needs, and compete for mindshare in the community (instead of a single solution imposed from above with no option for alternatives).

hibnico commented 8 years ago

By "integrating", do you mean using an existing package manager ?

0joshuaolson1 commented 8 years ago

I don't know if this adds to the discussion, but I had a reason to mention Pony in a Chapel mailing list. They don't have a package manager yet either, but someone had this to say:

Rust and Ceylon have taken a road that combines the Go way and the Python way. Both have a central, curated repository, so there are the problems of high priest bias for the content, and the separation of experimental and published content. However this solved the problem of reproducible builds without vendoring. Rust has Cargo, in the way that Go has Go and D has Dub: a tool for managing dependencies and compilation. This allows for accessing experimental repositories instead of the central published repository.

Personally I find the Python, D, Rust, and Ceylon ways of managing external packages much nicer than that of Go

They have a minimal proposal that's pretty much Cargo written in Chapel with some Nix thrown in.

renatoathaydes commented 8 years ago

It would be nice to be able to use existing tools for package management. I am a big fan of Gradle for that. With Gradle, dependencies can come pretty much from anywhere - local files, servers, GitHub etc... the hard part, resolving transitive dependencies and versions, is done by Gradle itself (you can choose different strategies for conflict resolution). I have written the Ceylon Gradle plugin, because even though Ceylon has its own dependency manager - see Herd - it didn't play well with Maven dependencies (in my opinion Ceylon made a mistake by implementing their own, inferior package manager). So I have experience doing that and could contribute by writing a Gradle plugin for Pony if there's interest. Please let me know what you think.

hibnico commented 8 years ago

Using an existing package management will probably be a nice quick win. But then it would make better sense to choose the tool which is the closer to the way Pony builds. One thing quite particular is that it builds everything form source. Since it is very new it doesn't have to be compatible to any existing packaging. Also some Pony project may need some native dependencies.

Actually behind Gradle there is Apache Ivy. And the purpose of Ivy is to manage every kind of dependencies in the Java world, even a very old style lib folder. It does a great job at it, I even have been able to make Ivy supports OSGi bundles or Eclipse updatesites. Here with Pony we have the opportunity to have a much simpler model.

So probably a better tool to reuse would be the ones from Rust or Go.

I think we can do even simpler and better, and the existing pony-stable is actually a very good starting point.

renatoathaydes commented 8 years ago

I understand your point of view, but I should mention Gradle supports native languages like C and C++ (see Gradle native ) so it seems that supporting native libs is not a problem. Ivy is not nearly as approachable as Gradle in my opinion, and less powerful. Gradle can also support OSGi dependencies (in fact, I wrote osgi-run which allows you to depend on multiple versions of the same dependency, as that's supported by OSGi).

With pony, the Gradle file could be as simple as apply plugin: 'pony'. This would trigger downloads from the default repo (just have to setup one, of course :) ) so any use 'pkg' statements in source files could be parsed and automatically downloaded (though I would rather declare all imports in the build file, and only download those, making them available for use statements in source code).

But I understand if you guys feel unwilling to associate yourselves with the JVM ecosystem.

EDIT By the way, does Pony support multiple versions of the same library in the same application in case the library is private to the modules importing it?

SeanTAllen commented 8 years ago

By the way, does Pony support multiple versions of the same library in the same application in case the library is private to the modules importing it?

It depends. Can those multiple versions co-exist in the same binary without shenanigans? Then yes. If the library does something to prevent that, then no.

SeanTAllen commented 8 years ago

At this point, the way to get what you want in a package manager is to build it.

@jemc has said he is working on a package manager. he's seen this input but in the end, he gets to build what he wants.

At Sendence, we've been using and contributing to @jemc's simple stand-in 'pony-stable' tool that has been working quite well for us.

If someone was to do Pony support for gradle and could get enough momentum behind it (this seems unlikely to me) then eventually that could be blessed as the "official" way.

I think the most important part of this is that we have a standard, single package manager and a nice ecosystem around it like Rust has with cargo.

hibnico commented 8 years ago

I understand your point of view, but I should mention Gradle supports native languages like C and C++ (see Gradle native ) so it seems that supporting native libs is not a problem. Ivy is not nearly as approachable as Gradle in my opinion, and less powerful. Gradle can also support OSGi dependencies (in fact, I wrote osgi-run which allows you to depend on multiple versions of the same dependency, as that's supported by OSGi).

With pony, the Gradle file could be as simple as apply plugin: 'pony'. This would trigger downloads from the default repo (just have to setup one, of course :) ) so any use 'pkg' statements in source files could be parsed and automatically downloaded (though I would rather declare all imports in the build file, and only download those, making them available for use statements in source code).

Note that that there are two things here: the dependency management and the build workflow. Ivy is just about dependency management, and Gradle is using it in its build workflow. So for instance Ivy is supporting OSGI in the sense that it understand the 'Import Package' primitives within a Manifest of a bundle and thus supports OBR repositories, and Gradle is supporting OSGi because it can build a proper OSGi Manifest thanks to bnd. These are two different supported features, not comparable.

About a build workflow for Pony I bet there will be many flavors of it, and Gradle may be part of the alternatives. The hard part here I think is to define what a dependency is for Pony, how should be handled transitive dependencies. Because once one has been defined by whatever tool and has been pushed for public consumption, every existing build tool for Pony will have to support it. In the Java world it has been defined a little late and we have end up with Jigsaw. In Pony there is nothing yet so it is a good opportunity to define something nice and simple.

renatoathaydes commented 8 years ago

If you keep the format of the package file simple and generic enough, it may be possible to use different engines to actually fetch the dependencies.

Perhaps Pony, the language, should have a package file specification to allow users to declare the versions of the libraries that will be used and the full paths to fetching them (or just define repositories where the dependencies are). This would be much better than adding repository/version information in source files (it wouldn't be fun to update versions of a big project and keep them consistent), and the use command would remain as it is today.

Once the dependencies are fetched by whatever engine, ponyc can do its job by simply looking under the local directory for sources - where all dependencies should have been placed (there should also be a global cache with separated, versioned libraries to avoid unnecessary downloads).

I could make it work with Gradle, and pony-stable could also work with the same file (the current json format seems rather incomplete, but it's a start - I would suggest the Cargo toml format is a nicer, more generic way to go).

renatoathaydes commented 8 years ago

One more suggestion: use organisation as part of the dependency coordinate (besides name and version). For github deps, that would just be the user under which the repo is located, for example.

This will avoid silly fights over popular project names, as we've seen in other languages, make it clear the origin of a project, and perhaps offer some protection against typo squatting.

hibnico commented 8 years ago

To make things even simpler, I'm advocating for only two kind of coordinates:

the package name (and probably at some point a version), which would be the api coordinates: it would be in the use of the Pony sources.
the location of the implementation of an api, which could be a github repository at a specific hash, or an url to download some packaged sources, or a local folder.

No need for an extra coordinate, no need for an organization. That way we avoid a lot the mess we can hit with traditional dependency management, like for instance a debate about a version scheme. For a complete argument, see my previous comments in this thread.

renatoathaydes commented 8 years ago

I've created a Gradle plugin for Pony.

It uses the same dependencies file format as pony-stable.

https://github.com/renatoathaydes/pony-gradle-plugin

Feedback welcome.

SeanTAllen commented 8 years ago

That's awesome @renatoathaydes.

jeffgran commented 8 years ago

I'm late to the party, but thought this might be of interest: https://github.com/whyrusleeping/gx -- a language-agnostic package manager system built on IPFS, which in itself is a very cool futuristic technology.

CandleCandle commented 8 years ago

https://github.com/CandleCandle/pony-maven-plugin

I've built this, there are, obviously holes in it, issues and PRs welcome. I hope that the initial documentation I've put there is enough to get started.

hardliner66 commented 7 years ago

I've built a very basic conan generator for building pony projects: https://github.com/hardliner66/conan-pony

If you want to try it out, you can find the test package here: https://github.com/hardliner66/conan-pony-test

Building with this conan generator is currently a two step process conan install --build python build.py

but I think with a bit of effort it a single command build can be achieved.

Benefits of this solution are versioning and a dedicated package registry, so searching and managing of dependecies should be better than with github repositories.

SeanTAllen commented 7 years ago

As an FYI to everyone, we've been moving forward with using pony-stable as a "package manager". It's already pretty well featured for handling dependency management. We plan on growing it over the course of time and seeing how it goes. Stable is used extensively at WallarooLabs (formerly Sendence) and has been picked up many Pony developers as their tool of choice.

Others are welcome to either:

Work on their own tool to compete
Work with us to improve pony-stable

There are many excellent ideas in this thread, we'll see how many end up in Pony tooling. Last I heard, @cquinn was working on some improvements to stable.

I'm closing this issue, but, I want to again emphasize that by closing this, I do not want to discourage anyone from building a different dependency/package management solution for Pony. If your solution needs changes to Pony itself, you can raise those issues via our RFC process.