I'm concerned about tying manifest (pkg/ver) to meta-data about a pkg too closely

In my past life I built a pkg management tool (we called them components). In another thread you asked about separating some of the data into multiple meta-data files and why that might be of use... and I guess I'd like to start by speaking to why one might want to do that.

What I'd like to see is a manifest that looks like this, pkg.mfst:

pkg_a@ pkg_b@v1.3.2 pkg_c@v1.3.x [if semver is supported over time, npm like syntax with ranges/etc] pkg_d@branch/latest [I like /latest added as it clarifies rapidly not a static revision, could go without] pkg_e pkg_f@320 pkg_g@TAGX

And it could just as easily have project/codebase version references, same thing:

proj_a@v9.4.1 proj_b proj_c@TAGX

To the tool they are artifact/version. For Go, no matter what is grabbed one must look at what is newly imported and then see if there is a pkg.mnfst to pin any versions and continue doing this recursively to build a workspace (assuming one is dynamically generated the full manifest, that should be optional in my opinion as well.. but supported). Don't get me wrong, I think additional meta-data is useful (and actually needed) and it should be optionally available in another file. There are a couple of reasons for this:

1) the manifest above specifies a multi-artifact (pkg) line of development -> if I have a big codebase and many teams I might have 4 or 5 or even 15 different "sane" dev lines

we had many more but that's beside the point -> these development lines could potentially be based on "parent" development line configurations (think of a fully branched manifest structure, each branch is a different development line with multiple versions along that line of development) -> for the most part the meta-data for those pkgs is the same... lets call it the pkg "definition", if a repo moves then I don't want to update 15 development line definitions or wait for those changes to merge through the fully branched and versioned manifest structure... I want to update one project definition for that type of change (if I could) 2) KISS - the simpler the manifest is the more likely it will be used and gives a lot of flexibility

Now the 2nd file might be called the "project" or "codebase" file. This can list all the packages available to that codebase (this one could be JSON). For Go this can be generated for the most part (as can the manifest which someone can "grab/use" when they want to start pinning or controlling things)... although some of the meta-data you might want to add might need to be added by hand potentially (based on what you show and other systems use like NPM/etc). Someone could take this generated project file and "augment it" if they wished by filling in more fields are adding in more controls over the project/codebase or packages. The codebase has no versions in it, it is just the codebase name, description, keywords (if you want em), if semver is supported (attr/vals), author/owner, image, whatever you might want (issue system pointer, homepage pointer, etc)... and it lists the known/available packages and their repo, where they live in the workspace, (and various pkg meta-data bits like license, etc... auto-determined or not, then fleshed out if the owner wants to take it over and manage it). If not, it's generated. It could also list "aliases" (old paths/names, upstream vendor canonical names) for pkg's, could put generated symlinks into the workspace (I know, Go doesn't do super well with that but it does "sort of" work and could do better, to avoid import path re-writing). One could also record "remotes" that one wanted the tool to set up for the client so the client could easily pull from "vendor" repo's for repo's they had forked, etc.

Aside: I am not a fan of importing snapshots into a single repo for my product, I prefer to use the power of the VCS systems and have my own clone for work... I'm also not a fan of import path re-writing... perhaps use gb with something like the above being best ... not with it's vendor plugin though.

What about scenarios... in my development line I want to rename pkgx to pkgy... I can add an alias to the project file and update my manifest to use the new name and off I go... when I merge my manifest down the new name can be seen as an alias for the old and (that's in the project manifest) that could be auto-handled potentially (via a manifest merge tool tied into the project details file if it's around and has enough info). This worked well for that system I built.

In that system the detailed codebase "definition" was versioned itself so you could see changes to it over time... and it was in it's own repo (here it could be if it was taken over, if not it's auto-generated and simply versioned in the workspace transparently unless the user wants to take it and extend or have it auto-updated with any changes to the project/codebase and version it).

Aside: Things like 'flatten' set to 'true' seem too specific, one could have 'pathing' and have various sub-modes like "layout" for such a thing and that would have values like 'flat' or 'deep' or something like that... with future sub-modes and additional values added without new structures in the meta-data, ie:

"pathing": { "layout" : "flat" }

Maybe too flexible but you get the idea (i'd avoid bool settings and go with modes at least so they can be further extended without more bool fields).

This is a marginal explanation as I only had a few minutes to jot down some thoughts... but just some food for thought.

Erik

@ebrady thank you for taking the time to provide the feedback.

From a use case (requirements) perspective, I don't think we are all that far apart. For example,

The need to specify a version (which means the version needs to be captured). The common conventions you called out tags and SemVer ranges which I've included as well.
Dealing with import path differences without rewriting import paths in packages.

I agree with you that "the simpler the manifest is the more likely it will be used". I also believe that the developer experience is important for working with these files. This is why I drew inspiration from other systems, such as npm and composer. These have already been incredibly successful and developers coming from other platforms will be familiar with the style making it easier for them to understand and adopt.

Defining any kind of new style brings with it the barrier of teaching others about the system. When existing successful systems are in place there needs to be, I think, a compelling reason to do something differently.

You bring up an interesting point about things moving. How often do you think that happens? With something like this I believe we target the 80% or 90% use case and not every possible one. Does this case happen often enough in practice that it's an issue?

I think with Go things aren't quite done conventionally. ;) Go tend towards simplicity so that's a focus of mine. I believe separating the project/codebase definition data (and deriving/generating as much of it as possible... while not precluding it from being managed/extended "by hand" when desired) allows for more trivial specification of the manifest/development-line data for each line of development within the project/codebase. While it may differ somewhat from what's out there I think it adds a fair amount of power and flexibility.

Aside: I'm not adverse to allowing some settings within the development-line/manifest itself so that codebases/projects with very trivial one-line of development that don't want all that descriptive codebase data can define "key" items in the development-line/manifest if they wish (think of it as a "key" fields only to keep the devline as simple as possible... but giving "enough" flexibility for very simple codebases to work reasonably with just the one development-line/manifest file). Maybe the format is a bit cheesy:

// some silly comment { "name" : "projx", "desc" : "great project to save the world" } pkg_a@v1.3.2 // great comment of some sort pkg_b@v1.9.4 { "key1" : "value", "key2" : "value" } pkg_c pkg_d omit proj_x TAGY

So one could augment the overall development line/manifest settings if needed and such settings might end up in the codebase defn that is derived when that workspace is created. Those key/values would be whatever are valid key/values in the codebase essentially (and maybe a few devline/manifest version specific settings).

This allows large installations to go big and fully manage their "codebase/project" definition, the best URL's, schemes, handle renames/etc if they need to ... while allowing small orgs or simple projects that just deliver a package or two to easily drive/support that with just the manifest/devline file (the project/codebase data being auto-generated from manifest file version markup or from the source code itself).

As far as versions... I think there is some alignment, yes. I think semver should be optional per codebase and something configurable (with a regex identying how semver versions look in that codebase but with the default matching the packaging lists proposal of v#.#.# along with semver 2 extensions and such... along with NPM like matching ability ... v1.2.x if semver is turned on). I do believe there is value to add branch/latest (ride latest on a branch) and value in being able to rapidly determine if a manifests contents are fully static or if some pkg (or project) versions being selected are dynamic (ie: can a simply put a tag on the manifest and describe a static configuration I can reproduce, or not?... that's why it's very useful to be able to know if it's a static config of pkg's/proj's or not... hence the branch/latest clearly delineating it from a 'something' type of thing which might be a branch and might be a tag... and perhaps a convention for "roving" tags vs static ones... if _dyn or _DYN or -dyn or -DYN is on the end assume it's a roving tag for example... or support tag/dynamic or tag/roving as the tag name with mixed case support). If one looks at gopkg.in it's essentially riding v1/latest or v2/latest (for instance)... so it's of use.

Beyond that I think when you have a manifest you need to identify how versions will be selected. If a codebase lists a package twice which version "wins"? If semver is on it may work differently than if it's off (if the versions are described by semver's). If semver is off then the 1st rev wins with a warning or error when the 2nd rev of the same pkg is hit if it's in the same directory in the workspace (no problem if it's not with the vendor experiment and such). Whatever the spec is I think it needs to clarify this (as one can begin to nest codebase/projects in others and, in the future, I would like to be able to have a big codebase and break up my pkg's into internal groupings within a codebase/project and hence if one of those nested groupings of packages is used being able to omit something coming from below might be nice). Other future items would be dynamic additional/omitting of any pkg based on add-on hooks for the tool... I've found this type of functionality pretty powerful (in supporting things like different OS's where you don't want to bring in a bunch of pkg's used only for windows development)... other options here are allowing OS, arch and such to be specified in the manifest but having both would be the best of both worlds (as I don't think we can really fore see when someone wants to include or exclude a given package). Yeah, getting ahead of ourselves a bit perhaps but I think hooks will be key to flexibility over time for more complex codebases.

As far as moving things about (renaming or even using two different names on different lines of development as architectural changes to the codebase are propagated through multiple lines of development for the project/codebase). I've found it to be fairly common in large installations at least. Can you get away without it for the 80% rule? Perhaps but you'll have grumbling if it's too hard or impactful on developers.

I would like to see a story around import path re-writing... as I'm not a fan. If I define my codebase/project with enough meta-data about the canonical vendor name then I can "fool" my workspace structure in various ways: a) symlinks (perhaps with requests to improve symlink handling when needed)

I use them now and they "mostly" work with many of the Go tools but one needs to always import from the same path (if you want Go to treat it as the same pkg data/etc) b) bring things in from my repo's (github.com/me/pkgx) while writing the vendors canonical name into the workspace (github.com/spf13/pkgx or whatever) so imports are "fooled". c) eventual ability (or a fork) of the Go tools so they can use the project/codebase definition and if this data is in that file (vendor info, aliases, and/or canonical paths, whatever) then the go build tool can look at it and resolve it even when I bring it in under my own path * this would be my preference frankly but (a) and (b) in the interim * again, by having one project/codebase definition file I can maintain this far more easily than keeping that data up to date in however many development-line/manifest lines of development I have going in my overall project (more of an issue for larger code bases but that's where my history lies and I like to scale) d) instead of (c) use something like 'gb' that replaces the go tool and knows how to read this file/etc... once proved out and takes over the world then perhaps (c) happens.

I've started playing with prototyping something here and we'll see how the workflow feels with some of the above (if I succeed). What could go wrong? ;) Figure it'll hurt my head since I'm fairly new to Go but will help me ramp up further.

My preference is for a package management system to have flexibility in bringing in packages direct from VCS's and work in a distributed fashion if/when desired (not precluding a central "hub" or "enterprisey" or org local hub in the future)... and not precluding use over time of different VCS "formats" for a given package. I might want VCS/source (git/hg/svn/..), I might want PkgSys/src (rpm/deb/..) or perhaps I want something else... build recipes or a form of the pkg (or nested codebase) that is "binary" (pre-built) that perhaps I can copy in or even link into my workspace if it's on a local mount/dir (heading very enterprise-y here but having this kind of flexibility gives a fair amount of power). Yes, this goes beyond what Go needs and targeted more towards a generic package management system... which is more my goal (with perhaps some "plugins" to support easy/dynamic Go development workflows within such a system). Anyhow, that same system can be used with a C or C++ codebase with monolith or independent "pkg's" if needed ... they would, of course, need a fully specific codebase/project definition file as it could not be generated like it might be for Go). Yes, I digress... but I think taking something along this lines would allow for Go use and get more traction even in other communities allowing for a fairly poewrful/pluggable system for use with any language that wanted to use it (and it could work with polyglot workspaces of different code types and manage them effectively... even with dependent/nested codebases and such). More contributors adding more flexibility, etc.

Anyhow... I agree new meta-data layouts or system styles might add a bit of a barrier but, if well designed, ideally they have a chance of working effectively and perhaps being successful. Could be smokin' something though. We'll see. ;)

Going back to, well, Go... I think godoc type functionality where the code is examined and things like licenses and descriptions and such are gleaned from files might work in some cases and could use some focus. The package descriptions are there (or can be if someone puts them there)... godoc uses them but the project/codebase level would need to be added and the tool would need to be able to parse that from the code (or from the meat-data file).

Cheers, Erik

@ebrady thanks for putting so much into sharing your thoughts.

If you've not already ready it, you might want to read "Don't Condemn Go To Repeat Past Mistakes". It speaks to project management.

I have some questions:

How would you handle a project in a private repo? For example, using a package name like github.com/example/foo doesn't provide enough information to know who to access it if it were private?
Are you familiar with the vendor experiment that came with Go 1.5?
Why do you think of gb for this space? There is the add on gb-vendor to do vendor management but the core of gb is a build tool. Why does a different build tool matter to package management?

And some thoughts:

We don't have a story around import path rewriting because it's not needed. This is essentially covered by the working with forks story. We do this in Glide today.
I agree that riding the latest commit on a branch is useful. It's documented here that a version can be set to a branch name. In Glide if you set a branch to be tracked and use the command glide update it will set you to the latest commit on that branch.
I'm familiar with the need to work numerous VCS. I assumed, and maybe it should be documented, that the same VCS that go get supports should be supported here. Those are git, hg, bzr, and svn. I even wrote a vcs to help with these because golang.org/x/tools/go/vcs doesn't support things like branch tracking.
I'm not sure working with binary packages is a problem for package management tools. There are some rather hackish ways to distribute pkg files to a $GOPATH and Go 1.5 came with the ability to do shared libraries. With these things is there a reason to a package management solution for the masses to approach this?

Hi Matt,

Thanks for the pointer to that article... good one. Yes, I generally agree with the thrust of it in terms of: a) KISS b) learn from what's out there c) keep in mind the Go community (purists, practical, try to learn from the best & improve, etc)

The question then boils down to do we mimic the "key" parts of those systems (some are flat and some are nested and perhaps we allow for both as you have attempted to). Do we put all the meta-data into this management system or should that meta-data remain in the code (eg: godoc-like) where possible? Can we determine the license be examining the LICENSE file and perhaps automatically setting that up,. Can we get pkg descriptions from the code and project descriptions perhaps as well (document something on how to put godoc in the file for this)... yet other items like key contacts, issue system pointers, homepage pointers, images, etc. What is required as base info for a project (or pkg) and where is that data located? Other points... do we use VCS's directly? Or a rpm/deb/etc pkg manager for the code? Or allow either over time. Should a central "hub" be set up in the cloud (with allowances for site-local hubs as well to augment or replace that)? If so, is that required?

On these and other questions... that's where things get a little more "interesting"... how will the dots be connected to make this work smoothly on the transition from now to the future vision. Hence the various discussions and fun. ;)

Let me see if I can address your questions:

1) How would you handle a project in a private repo? For example, using a package name like github.com/example/foo doesn't provide enough information to know who to access it if it were private?

Let me try and clarify a bit what I touched on above, the two key concepts being the codebase/project definition and the manifest/development-line definition. In the case of fuller import paths (or paths that one did not want to alias in the workspace) then one could auto-generate a basic codebase definition (perhaps cleaning some info from the files in the pkg/project data like license, description, etc dynamically). As touched on above.

In the case where one wants or needs to do more then one would define ones own codebase/project file and version it as part of ones codebase/project (allow it with existing packages but can also exist in it's own separate codebase repo/pkg, preferred for bigger installations):

{ "name" : "dvln", "desc" : "Multi-package and workspace management tool", "contacts" : { "authors" : [ "Erik Brady brady@dvln.org" ] }, "attrs" : { "linkalias" : "True", "jobs" : "4", "key2" : "val2", "key3" : "True" }, "pathing": { "wkspc_pfx_dir" : "{{if .GoPkg}}src{{end}}" }, "vars" : { "dvln": "http://github.com/dvln", "joe": "http://github.com/joe", "spf13": "http://github.com/spf13", "dhowlett": "http://github.com/dhowlett" }, "pkgs" : [ { "id" : 22, "name" : "dvln/lib/3rd/viper", "desc" : "dvln project copy of spf13/viper package", "ws" : "src/dvln/lib/3rd/viper", "aliases" : { "dvln/lib/olddir/viper": "src/dvln/lib/olddir/viper", "dvln/reallyolddir/viper": "src/dvln/reallyolddir/viper" }, "vcs" : [ { "type" : "git", "fmts" : [ "vcs", "src", "source" ], "repo" : { "rw": "{{.dvln}}/viper" }, "remotes" : { "vendor,spf13": { "r": "{{.spf13}}/viper" }, "joe": { "read": "{{.joe}}/viper" } } } ], "attrs" : { "GoPkg" : "True", "Vendor": "True", "Owners": "jessie@co.com", "Readers" : "anyone", "Committers" : "team1@co.com" }, "status" : "active" }, ...

This is a big hacked together but perhaps it conveys the basic idea for you. One could add licenses and individual pkg descriptions and other data to this manually but perhaps any "system" around it can glean that data instead (to be seen). In the above there is zero information about the versions of those packages, only what packages are available and where to find them (the read form and the write form and one could add in review if needed/etc).

With the above one can do various older names potentially (aliases), allowing one to use either the path or the name to reference the pkg in the workspace. The alias could point to the current path as well of course if one only wanted aliases vs actual multiple copies in the workspace (import could be bright enough to track either potentially if the build tool knew how to use the data there).

With the above one can also define multiple VCS systems per component if needed... so it is expandable to support source control systems like git,hg,bzr,svn,systemx or could also pull from a rpm,deb,etc type of system if one wished (and could switch between formats if desired). The system I previously worked on could do this and it was pretty nice... one could pull pre-built representations of many project/codebases (or pkg's if they were truly standalone... in this case depends on the language being versioned by the workspace/pkg manager since our system was used by various languages)... so we defined 'source', 'binary' and other formats ('recipe') that folks could pull (and the tool could be taught to pull things as efficiently as possible... of the 2500 repo's in this workspace I want everything in pre-built binary form that I can get (symlinked into the dir to avoid disk space use, pull time) and I want only those items I'm working on to be pulled in "source" form from the SCM system (source VCS). Yes, I know this may not all apply to Go in particular but for a generic package manage system it is pretty powerful.

I know, too much complexity is also the enemy of course (primarily of "use" by end users as they key goal, if it's easy for them and works well then it's a success)... assuming "easy" means it does what they need and handles their workflows and conveys key info and such.

With the above one has

handling renames
handling multiple code formats
handle multiple VCS/repo management mechanisms (even per pkg)
handle private repo's or public repo's
support semver or multiple version selection rules
with hooks it could give further dynamic control over workspace content (via omits, version requires coming from the hook mechanism, etc)
hooks can also had the ability to add subcommands or options dynamically for customer required features... and those hooks had access to data within the codebase definition (and manifest/devline/wkspc)

I know I'm getting carried away a bit... bit benefits can come of this. Can your system handle some of these scenarios? The other side of the coin is are they needed?... perhaps the 80% rule says not. ;) But I've found them useful in complex large environments.

2) Are you familiar with the vendor experiment that came with Go 1.5?

Yes... it looks promising. It allows one to "treat" something that is sucked in as a "standalone" item that is somewhat self contained. I can suck in my dependent pkg and if it sucks in 10 or 20 packages (on down the tree) the vendor/ structure under that comp I included would include all those dependencies as deep as needed. The "space" those pkg's run in is not "shared" in any way with duplicate copies of those packages (even if at same version)... but allows for versions of the same package in the workspace so that the pkg that requires a given version always has access to the exact version it wants (assuming one pins those dependencies at fixed versions as needed of course).

Now, as to if that extremely deep structure is a good thing with many duplicate package versions potentially at different versions across the workspace... that's a bit debatable. For most they likely won't care as long as their product gets built. For an enterprise or larger open product they'll probably lean more towards flattening that as much as possible since they will be likely be mucking around and adding their own changes to those vendored comps and want to share those changes across all users of that pkg and such... but, up to them.

My description here isn't grand but that's the jist of it I think.

3) Why do you think of gb for this space? There is the add on gb-vendor to do vendor management but the core of gb is a build tool. Why does a different build tool matter to package management?

I think of 'gb' like tool and probably should have clarified that. I believe that if/when a decent project/codebase mgmt mechanism comes into play that if/when the build system leverages that data it can do much more than it does today. One should be able to use 'go get' just as easily as one can without a manifest/project ... having those only affects what version that cmd gets (and perhaps what detailed info one has about those artifacts). One should be able to use 'go build'. Ideally one should not even need to structure the tree with the github.com/blah/repo/ paths for all the packages or with the deep vendor structure. I'm not a huge fan of jumping all over the tree with those as I like typing less so I prefer being able to lay my tree out like I want. I also prefer not having to re-write imports. If I have aliases set up and the build tool can use them it can see if I have "github.com/spf13/viper" in a file and even if I have only "github.com/erik/viper" in the workspace it can check the codebase defn and see that the upstream vendor is that import path and it can be bright enough to use my comp during the build.

Aside: I also like having separate workspaces for each project I'm working on vs a single workspace under a given GOPATH... and I like how 'gb' works with that. Yes, gb is a little too opinionated on the dir structure for my liking but a step at a time. Any code I work on I want to be able to modify and fork... and I think that's true for most engineers. I never use vendored code clean after weeks or months... well, maybe 40% of those packages I grab I'll start modifying. To me they are owned upstream and I like to recognize that and be able to fork and contribute back... but I don't want to lose the depth of a VCS to manage every pkg I'm working on, including those I've forked. But that's just my opinion.

Yeah, I have a few opinions. ;)

Regardless, anything is better than nothing and I think glide is probably better than most other solutions I have seen. I think more can be done and I have my own preferences but, regardless, I'm appreciative of your work and efforts here and applaud that. Ideally with sharing in the community we'll all keep moving towards something that is a fair amount better than what we have now. :) Thanks for your efforts.

Let me touch on your "thoughts" above:

"We don't have a story around import path rewriting because it's not needed. This is essentially covered by the working with forks story. We do this in Glide today."

Yes... this is essentially just having a repo location/info vs workspace path which is also possible in the structure I suggest (and aliases are available as well). I think this is needed. However, I do not think that this is a preferred long term solution. I want to see my repo's in my own directory (or under my own path) and I still don't want to re-write imports. I want the build tool to be smart enough (as touched on above). However, I agree this is a good interim step and is needed for other code bases, aliases and other needs (and can be used to avoid re-writing imports nearer term).

"I agree that riding the latest commit on a branch is useful. It's documented here that a version can be set to a branch name. In Glide if you set a branch to be tracked and use the command glide update it will set you to the latest commit on that branch."

I think that's good. However, how do you know the difference between a branch and a tag without hitting the VCS system? You may (or may not) be able to figure that out form a VCS but if you have 2500 pkg's in your dependency tree then it's time consuming to hit those repo's and figure out if you have a static or dynamic overall configuration (if your manifest or devline version is static or not). This is important to know if you can safely tag the version of the manifest or devline (the "configuration") ... and knowing this quickly is important to scaling effectively (IMHO). If you want reproducible builds this needs to be something that can be determined rapidly without hitting many VCS's. Hence the comments on adding something like branch/latest to differentiate vs tags. Another option would be to allow optional prefixes when it's indeterminate like branch:xyz and tag:xyz or dyntag if it's a dynamic tag. I'm fairly easy there but, regardless, quick determination is important (especially at scale). Again, MHO.

"I'm familiar with the need to work numerous VCS. I assumed, and maybe it should be documented, that the same VCS that go get supports should be supported here. Those are git, hg, bzr, and svn. I even wrote a vcs to help with these because golang.org/x/tools/go/vcs doesn't support things like branch tracking. I'm not sure working with binary packages is a problem for package management tools. There are some rather hackish ways to distribute pkg files to a $GOPATH and Go 1.5 came with the ability to do shared libraries. With these things is there a reason to a package management solution for the masses to approach this?"

I think other VCS's will be important down the road and your vcs work I think is a good start on that road (appreciate that work, I was checking that out a couple weeks ago since I was thinking of doing something there myself). Was thinking of splitting that into reader and writer and such (as it might be expanded further down the road so a RepoReader and RepoWriter type of interfaces to keep them smaller). That is an aside.

As far as binary and other systems... with Go it is less critical I would say but with other languages it can be pretty powerful. Since my target is tending towards more generic with "plugins" to allow working efficiently on something like a Go workspace... I probably have stronger opinions here than you might.

Anyhow, can follow up further later but need to run now. Thanks much for the thoughtful input and feedback. Cheers!

Erik

ps: sorry for the verbosity, I type fairly fast and am known to be a bit verbose at times. ;)

@ebrady Thanks for the long and verbose thoughts.

Without a whole lot of time, I can respond to a few things.

First, licenses present a problem for the tools. Some people include license files to identify the license. Others put it in the Readme. And still others put it in the headers of the code. In each of these places there are a variety of licenses and many of them don't self identify. For instance, the MIT license doesn't say it's the MIT license. So, a tool looking to figure out the licenses needs to look in all of these places, parse everything, and try to intelligently sort it out.

It's far easier (and less work for tooling) if it's specified. Especially if a directory listing (e.g., npmjs) wants to put it on display. I do think about the marketability and legal aspects of packages.

Second, when it comes to a manifest I'm more interested in json, yaml, and ini file formats simply because of developer experience. When something is recorded it needs to be easy for developers to figure out and use. They should just get it. Creating something special for Go puts up a barrier to use. It's also more difficult for non-Go tools to work with and we'd see that in the form of things like IDE add-ons.

Third, while I take some considerations to the implementing system into account I'm not looking to solve that here. There are lots of ways to solve things. For example, a tool could have a cache file it maintains from the packages it looks at. A file in an optimal format for the tool. Instead of reading from the VCS each time it can read from that tool. But, that's all really left up to the implementing tool.

A package spec can be implementing with more than one tool and in more than one way. For example, Glide manages your vendor/ directory. You'll likely never see a single application with 2,500 packages. In practice hitting the VCS each time (since they are mostly local) isn't that big of a deal.

Now, if someone were going to use a tool that managed their entire $GOPATH they'd want to do things differently. But, those differences are in tool implementation and tool specific needs.

The idea I'd like to drive home is one small-ish set of metadata that all the tools can use as a base.

I get the impression you're thinking of how you'd build a tool for your personal workflow. I'm thinking of the problem a little differently. There are a number of popular tools right now. All of them keeping and managing similar information in different formats. Why can't we have one format to handle it so the tools are more compatible.

I agree with you that licenses present problems today... and that, now, an entry in the pkg (or proj) meta-data is required (no problem with that). Ideally I would like to see a recommendation to use std license strings as "recommended" in one of a few places perhaps in a Go pkg (and, if one does, it'll be auto-determined, if you don't you need to enter it in meta-data). That sort of thing... perhaps pie in the sky, but with a goal that if a pkg/proj is fully defined "in source" (via recommended doc markup, added into go vet or whatever) that it already be "close" on key meta-data (and that the manifest focus on pkg/ver and proj/ver listings for the most part "eventually", in the simplest possible fashion).

As to the format and leaning towards JSON/etc... and can live with it... although my personal preference is still leaning towards even simpler structure (but I admit I have a bias here as I've used this tighter format for many years and engineers in that org picked it up very rapidly as it is so trivial, and IDE's are OK at editing simple text). Regardless of JSON (or YAML or whatever) or a simpler format I think the main thing to get agreement on initially might be:

What data is present (get agreement there first perhaps), and what formats & naming are used A. Format aside: a pkg like spf13's viper allows for JSON, YAML, TOML or Properties (expecting consistent naming), perhaps that is used for flexibility (?)... if not then likely JSON B. Where is the manifest versioned?... I'm assuming any of the supported VCS's? a. Must it be versioned with the code of the project or can it be in a standalone VCS? b. I've found that manifests with many versions can grow quite large over time and a separate repo can be quite useful to track revisions/configurations of a project/codebase (allowing either would be nice) C. Is the base revision data: project/codebase name w/optional revision and/or VCS package name and revision (?) a. For a "package" is that a Go package or a VCS package? (in the manifest)

a VCS package can contain Go packages of course
or are both supported (ie: it perhaps tries to determine the base VCS pkg and pulls that, if this is possible how does it handle conflicts of "sub-pkg's" at different versions needed, etc) D. What is the exhaustive listing of all data for the codebase/project, descriptions of each field E. What is the full listing of all data for each pkg or project we depend upon (is your file example everything?) F. I assume that the manifest can include just 1st order deps if they all have their own manifests (ie: any "project/codebase" level include would have that pkg's data alone, any "raw" pkg includes could be first order imports as well as imports of that import, etc... so might have many dep raw pkg's listed a. if it's a project/codebase include do I need to list meta-data details since I can go to that project and get it's data and don't need to duplicate it locally... what is the preference for the system?

Beyond that you likely want to touch on desired functionality for common operations A. Renames of pkgs (eg: new hosting service, refactor of code/pkg's with new naming, etc) B. On a specific manifest development line I need to rename a pkg, while the other working manifests for my project/codebase will not rename it (at least not until I merge in these changes I'm working on or they rebase to my work or whatever) C. Is there a need to support multiple formats for a pkg (get it from VCS/SCM, get it from VCS/pkg, get this one from a local tar/gz file in a ver named dir structure NFS mounted plus some local patches there)

What are the preferences to avoid import re-writing A. At touched on above... use canonical/vendor import path name but grab from your own local or forked clone (ie: hide the real local repos path and fake things up) B. Be able to put this information in meta-data as a 2nd option (?), eventually build tool could use (so simple mapping would be in the meta-data defined, what is the vendors repo name, so import paths don't need to be re-written but I can still use my local path to the repo) a. if this is allowed a way to "describe" the pkg by the vendor name or the current workspace name should show all names/aliases/vendor names and such (and accept any of them) C. Other solutions like symlinks or ???

What about version selection?
A. If semver 2 is always the default how does it work (NPM like support?) a. can I indicate my codebase/project doesn't use semver?
B. If I end up selecting two versions for the same workspace path how does it work a. if semver active or not, if dynamic v1.2.x versions used or static v1.3.2 versions used for colliding entries? C. Cover any variations based on "flat" workspace vs vendor experiment deep workspaces I suppose

I'm not covering all the details but it seems that any "spec" needs to agree on the basic meta-data per project/codebase and per package [+version potentially for each], how collisions are handled, how revisions are specified (eg: NPM like semver 2 support with Dave Cheney's proposal for semver 2 tag formats (?) + branch latest riding plus sha1 or tag or other valid version selection). If folks generally agree on the data and how things will work... then the structure of that data with any field names/etc would be the last buy-in item.

I guess I would feel warmer and fuzzier saying "that's close to what I'm thinking" on a spec if I kind of know the answers to these questions. In the end it may simply not be possible to get agreement and folks will start using the tool that works best for them and it'll get enough market share and can be refined/adapted from there (usually how things work). ;)

As to your third point that I'm thinking about a specific target or personal needs, ie: that 2500 pkg's is unreasonable and such... and that hitting the repo's shouldn't be that costly. I agree we see things differently from that end. I came from an environment that had this so any solution had to work within that context (and, again, I'm thinking towards a more generic pkg manager idea vs solely for Go). Something like camlistore has 269 packages (probably a couple hundred VCS's if one was to pull the tree as separate VCS's pkg's which is what you're looking at vs vendor code drops which is what camlistore uses currently as that sort of dependency management/safety is probably easiest mechanism today at least until it comes time to do new code drops and local changes have been made). If my workspace had something similar with independent VCS pkg's and perhaps even managed many other tools and packages then scaling might be hard. If the clones are local and I need to run some cmds, perhaps not too bad as you say, unless I have 1/2 SVN repo's or something then it might hurt a fair amount.

Regardless, I see that as not that critical as one could "expand" a spec later if needed to more clearly indicate dynamic or not on a version via conventions added later (and so clients could optimize using those if they desired at that time while not breaking earlier use which would remain slower but backwards compatible).

Anyhow, you asked for feedback so there ya go. ;) Appreciate your time with this discussion... perhaps some of the above could prove useful as you move forward. Again, I think glide is promising and I wish you the best in your approach. Thank you for your efforts. I believe we have a fair amount of overlap in opinions... but a few differences based on experience and such. Regardless, thanks much.

Cheers, Erik

@ebrady In my effort to really understand this space and the direction to go in I've been looking at how others solved these problems and what they learned along the way. I recently wrote up some of what I found so others can easily see it without reading all kinds of docs and code.

I appreciate you poking me on this. I have a much better understanding and appreciation for this space.

At this point I'm looking to make the developer experience easy to use and work with for people coming from other languages as well as solving all their use cases.

mattfarina / pkg

I'm concerned about tying manifest (pkg/ver) to meta-data about a pkg too closely #5