Add commands to generate initial .go.yaml and to "freeze" the current versions of dependencies

mediocregopher / goat

A simple, yaml-based go dependency manager

Apache License 2.0

68 stars 11 forks source link

Add commands to generate initial .go.yaml and to "freeze" the current versions of dependencies #19

Closed pib closed 8 years ago

pib commented 10 years ago

These changes make it easier for folks to start using goat and to update dependencies without having to hand-edit their .go.yaml file.

mediocregopher commented 10 years ago

Thanks for the PR!!! I'm sorry I'm being slow on this, but work is kicking my ass this week. I'll try to dive in either today or tomorrow though.

mediocregopher commented 10 years ago

So I'm playing with the goat gen command and I'm a bit confused how it's supposed to work. I basically copied the setup I have in the tutorial but never made the .go.yaml file. When I used goat gen it gave me the error:

Couldn't find github.com/mediocregopher/goatproject/foo in any of [/opt/go/src/pkg /home/mediocregopher/src]

Is it necessary that I already have the library installed on my box for the gen command to add it as a dependency? If so, can that not be the case? I want to avoid having to have the user do anything involving global dependencies when using goat. It should all be completely contained inside the goat project.

pib commented 10 years ago

The reason it looks for the package is because it looks for git or hg directories in those projects to add the specific version that is installed. I suppose it could leave the current version out if it can't find it, but it also wouldn't be able to identify multiple packages from the same repo in that case (it does that now and only adds one entry to the .yaml file).

"goat get" of the desired package will install it non locally... On Jan 14, 2014 10:28 PM, "Brian Picciano" notifications@github.com wrote:

So I'm playing with the goat gen command and I'm a bit confused how it's supposed to work. I basically copied the setup I have in the tutorialhttp://docs/tut.mdbut never made the .go.yaml file. When I used goat gen it gave me the error:

Couldn't find github.com/mediocregopher/goatproject/foo in any of [/opt/go/src/pkg /home/mediocregopher/src]

Is it necessary that I already have the library installed on my box for the gen command to add it as a dependency? If so, can that not be the case? I want to avoid having to have the user do anything involving global dependencies when using goat. It should all be completely contained inside the goat project.

— Reply to this email directly or view it on GitHubhttps://github.com/mediocregopher/goat/pull/19#issuecomment-32333949 .

pib commented 10 years ago

I mean "goat get" will install it non-globally..

pib commented 10 years ago

The reason it isn't able to right now is because it depends on being able to import packages (via "go/build".Import) to be able to detect which packages are nested within the same directory. Without the packages being installed, it can't really tell if they are nested under the same repository.

Perhaps a better option would be to have it try to "goat get" any dependencies that are missing? It would need to have the package path set first so it doesn't try to fetch any sub-packages which are imported.

mediocregopher commented 10 years ago

In my previous comment I didn't really understand the error I was seeing. I have a project with the path github.com/mediocregopher/goattest, with the .go.yaml file. Inside that there is a sub-package, github.com/mediocregopher/goattest/foo (this is the setup I use in the tutorial). I think the gen command is trying to generate an entry in .go.yaml for that sub-package when it shouldn't.

pib commented 10 years ago

Oh! Of course. I had been holding off on sending this pull request and I had forgotten why.. whoops.

There needs to be some solution for the initial setting of the "path" setting in the .go.yaml file, to avoid issues such as that.

Perhaps "goat gen" could be a more interactive command, prompting the user for answers to such things as "what is the import path of your package" and "do you want me to automatically install imported packages into this goat environment so the proper list of top-level packages can be determined?"

What do you think? It seems to me that it would useful as a one-time interactive start, then "goat gen -update" and "goat freeze" could be more automatic once the initial setup has been done. Of course there could also be command-line flags to specify the path and that it should or shouldn't auto-fetch packages.

mediocregopher commented 10 years ago

I think the only question which needs asking is "what is the import path of your package", because with that you can determine sub-packages and ignore them (as well as fill in the path field in .go.yaml). I'm not for sure, but I don't think it's necessary to actually download the packages in the gen step for any technical reason, and I'd rather keep the functionality between gen and deps non-overlapping.

Also, is it actually necessary to have an -update flag for gen? I guess it would prevent someone from overwriting their previous .go.yaml if they didn't know what they were doing, but that's what version control is for, and the vast majority of the time you're going to want to replace the existing one.

tummychow commented 10 years ago

Is this PR still alive @pib? It resolves a serious gap in goat functionality for large dependency graphs (namely, that any dependencies using the git/hg type have to have all their dependencies specified manually, which expands the size of .go.yaml by an order of magnitude). I'd love to see this get integrated or, failing that, take over it myself.

For whatever my opinion is worth, here's my thoughts on how goat gen should behave...

Scenario 1: if .go.yaml already exists, goat gen should expand it without modifying any of the already-specified dependencies. I think the following behavior is sane and least surprise-inducing:

if there is a dependency found in your source code that is not in .go.yaml, add it with the default type of goget.
if there is a type: git or type: hg package in .go.yaml that is not currently cloned to .goat/deps, it should be cloned now, before proceeding to the next bullet point. I don't really like this behavior (ideally, goat gen should only ever modify .go.yaml - maybe have a flag to toggle that strictness), but it's the only way to resolve that package's dependencies. And since you already set the type of the package, I think it's reasonable to assume that you have also set a suitable revision, and you are fine with whatever will be cloned.
if there is a type: git or type: hg package in .go.yaml that is cloned to .goat/deps, inspect its source code for dependencies, and add any missing ones to .go.yaml with type: goget. Same behavior as the first bullet point, except instead of inspecting your own package, you're inspecting a local version-controlled dependency.
ignore all dependencies in .go.yaml that have type: goget. go get will fill in the entire dependency subtree when it is invoked, so there is no need to do any further modification here.

What this means is that I can create a blank project with no source code, scribble out my top level dependencies and project path, and then invoke goat gen. It fills in all the second-tier dependencies for me automatically, and they will all be using type: goget. I can then inspect the newly added dependencies to decide which ones need to be fixed (change the type to git/hg) and which ones are safe to leave with go get. I change the types for those packages and invoke goat gen again. That adds another group of goget dependencies for the next tier. I can keep doing this until I have specified only the subset of my dependency tree that matters to me. goat gen would ensure that any other subtrees are being fully resolved by go get. And since it does not modify any already-specified deps, I can invoke it iteratively over and over until I'm satisfied.

Scenario 2: If .go.yaml does not exist, you have to specify the path of your current repo on the command line, otherwise goat gen fails with a suitable error. Eg goat gen github.com/foo/bar. It then scans the current package for any go source code and populates the top level imports of that source code in .go,yaml, using the default type: goget. This gives me a basic .go.yaml. I can then apply the same steps I took in the first scenario to refine the dependency tree as needed.

Side note: if you specify goat gen path/to/repo when your .go.yaml already contains a path, what happens? The behavior here would need to be defined. I say clobber the existing .go.yaml path. I find it difficult to believe you would invoke goat gen with the wrong path unless you intended to change it.

I agree with @mediocregopher that I don't see a use case for a --update flag. What would this do? A bundler update-esque command doesn't have meaning for Go since we don't have semantic versioning constraints like Ruby does. goat uses .go.yaml declaratively, so if you want to move a dependency forward, you should only have to edit your .go.yaml and then run goat deps again. If you've manually tweaked your dependencies and want to reset them to the originally specified versions in .go.yaml, it should, likewise, be enough to just run goat deps which would force the .goat/deps directory back into the specified state.

No opinion on the freeze command, I'm not entirely clear on what it does... is this similar to, say, godep save? Assuming it is, I think you would use this command if your project was using a global GOPATH and you wanted to transition to goat. Not a useless command, but the functionality is, in my opinion, a separate discussion from goat gen. goat gen should be responsible for adding missing entries to .go.yaml so that it specifies a complete dependency tree. goat freeze (or goat save or what have you) is responsible for modifying the specified revisions of existing dependencies. They might have similar code under the hood, but I think they should be taken up independently.

tummychow commented 10 years ago

A few other thoughts on goat freeze that came to me just now, while proofreading my comment:

I can see a use case where your project is using a global GOPATH, and controlling revisions by checking them out manually, where you would want to goat gen (to populate .go.yaml with dependency entries) and then goat freeze right afterwards (to migrate the global, manually set revisions into your new .go.yaml. I'm personally not convinced that goat freeze should just do this whole charade for you... suppose a .go.yaml does not exist and you invoke goat freeze. Does it just use VCS types for every single package in your dependency tree, and set the revision to whatever you have checked out globally? (Eh, well that's not a terrible behavior, but I'm not sure if it should be the default. Personally I'd think that if goat freeze is run on a project that doesn't contain .go.yaml, it should exit with an error.)
how does goat freeze know which deps to freeze? I'm skimming the code, looks like it only freezes the ones that use type: git and type: hg. That's a good default in my opinion, but it means that goat freeze does not follow up with the default behavior I described for goat gen in an empty repo. As far as I'm concerned, this is just another reason that freeze and gen should be kept strictly separated CLI-wise.
what's with the name freeze anyway? I never really understood this name. The only place I've seen it before is pip (not a pythonista, so I don't really know what it means in that context either). I will admit that I can't come up with a less ambiguous name at the moment. For example, godep has three delightfully ambiguous command names (save/update/restore), which don't really tell you anything about the commands unless you read the docs. I guess an unintuitive name is not the end of the world.

mediocregopher commented 10 years ago

I agree we should leave the freeze discussion for another time, and just focus on populating .go.yaml with missing dependencies.

@tummychow I like your plan for going forward. I have two reservations (neither of which I have solutions for, so I'm not really being helpful here :P)

I'm not sold on the name gen. To me that stands for "generate" which would imply creation from nothing, when that's only 50% of what it does. Maybe populate? That may be too long though. Would be interested in if you or @pib had any other ideas on that.
I mentioned above that I'm a huge fan of gen and deps having overlapping functionality. I don't see anyway out of it, since we do have to fetch dependencies to find out if they have any dependencies. It's also a bit weird that only git/hg deps would be actually fetched (if I read the proposal correctly). We might want to implicitly call deps at the end and tell the user we're doing it so they're completely good to go once everything is said and done.

Apart from that, I like the idea, although I won't have time to implement it myself for a while (still moving in to my new place, work, bleh). If you @tummychow want to take a stab at it be my guest. I have one implementation detail I'd like to go over if you do want to do it though:

While you broke the behavior up into two different scenarios, I'd like the code to basically only actually have one scenario. At the start of the call there's going to be a "state" variable of some sort, which contains all dependencies discovered and accounted for thus-far (initially empty). As you go through the code and its dependencies you'll be going back to this variable to check if the dependency has been found, and if not add it to the "state". At the end of everything this "state" variable will be used to generate the whole .go.yaml file, which will clobber the old one. If there is a .go.yaml file to start with, this would be used as the initial "state" variable, instead of an empty list. So the code-path would look like:

Initialize "state" to empty
If .go.yaml already exists, use that to generate "state" and continue with that
Check if we have a path for our project set in the "state", if not ask for one (or check for command-line argument)
Look at "state" for existing git/hg dependencies, deal with those
Look at rest of code for other dependencies, add to "state" as goget
Dump "state" to new .go.yaml file, overwriting the old one if it was there.
Maybe run deps, to get the goget ones if we didn't need to previously

Let me know if that didn't make any sense, I'm not the best at explaining <_<

Sorry to be dictatorial about this, and I'm definitely open to suggestions/improvements, I just didn't want to end up with two different codepaths for this command; that would make things more complex both in code and for the user using the command.

tummychow commented 10 years ago

Name's not important to me, I'm open to anything that makes a reasonable amount of sense. We can get around the issue of long names via prefix matching, so goat pop, po, popul, etc would be the same as goat populate.

And I agree there would really be only one codepath. From the user standpoint there are two different use cases, but internally, "no .go.yaml" is really just a special case for ".go.yaml that has no deps in it". We're on the same page there.

As for deps/gen overlap, I think you're right. You understood my suggestion correctly but yours is probably better. My idea was "fetch as few deps as possible to resolve a complete tree", but then you have an inconsistency, where your .go.yaml would be represent a complete dependency tree, but your filesystem is actually incomplete and only contains some of the deps. That's kind of confusing... Calling deps at the end to fill in any gaps is a more sensible default because it ensures that your .go.yaml actually reflects not only your code, but also the packages you have locally on disk. So yeah, I agree with you. gen should call deps at the end, maybe with flag to disable for the nitpickers.

I've got some end-of-term wrap up before my next semester starts but I'll see if I can get some code on the table for this command. @pib if you want to revive this PR, just let me know so we don't duplicate work.

pib commented 10 years ago

Oh hey, I've had the email notifications for this thread sitting in my inbox for a couple of days now, and I just haven't had a chance to even read through the comments. The Go project I was working on when I first submitted this pull request has been done for a while and I haven't had a chance to do much Go until recently at a new job.

I definitely think this is a feature goat should have, especially as a tool for getting people who haven't been using it up and running quickly.

I'll start from the oldest comment and work my way to the most recent, starting with @mediocregopher's last comment from January :P

The main reason for downloading the packages in the gen step is so that those packages' dependencies can also be determined if they aren't using goat. I can't remember the original reason I implemented the -update flag, but I'm assuming I ran into a use-case that made me think it was a good idea :)

@tummychow Looking at scenario 1, it sounds fairly reasonable. The one thing I think I disagree with is that it's ever really safe to leave a dependency as type goget, since that's just asking for something to break in the future. Also, I don't really like the idea of having to run goat gen multiple times just to get the full list of dependencies. My previous suggestion of having goat gen actually call goat get on missing dependencies would solve that issue and allow it to gather all the dependencies and nested dependencies in a single go. Then it can also detect what type of dependency it actually is (i.e. git or hg), and save it as that type. I really think that using a type of goget should be discouraged since it's basically counter to the entire point of goat in the first place.

As for scenario 2, I really like the idea of just passing in the path of the current package to goat gen. It's a nice elegant solution to that. I also agree that having it overwrite the package path if you specify a different one sounds reasonable.

As for the --update flag, yeah, all that did was prevent you from running goat gen again by accident and overwriting your .go.yaml file, but I'm not sure that's really needed anyway.

The idea behind the freeze command is that you might know you want to install a package, but not know exactly what version of that package you need, so you can add it to your .go.yaml file without specifying a version, and then you run goat deps and then once you've verified that the current version works with your code, you can run goat freeze and it will update the .go.yaml file with the current version, "freezing" it in place so that a later change to that project doesn't break your code when you run goat deps on a fresh checkout. This is also the reason I'm so against having goget type dependencies be the default, since that will result in similar situations.

The name freeze just came from pip freeze which gives you a list of the current versions of all the currently-installed packages.

Also, yes, the goat freeze command could come in handy when you are converting a non-goat project to a goat project, allowing you to take your whole set of currently known-good versions of dependencies and save them into your .go.yaml so other checkouts of the same code will be sure to get the exact same versions you currently have.

Also also, the reason that freeze only works for type: git and type: hg is that those are the only versioned types currently supported by goat. I don't think I've ever seen many (or perhaps any) Go packages that weren't either in git or hg, so once you've checked a package out, even via goat get, you will be able to detect the type of the package and goat freeze will work as expected.

@mediocregopher I agree that gen isn't the best choice. What about something like fill, since that doesn't imply that it is just initializing or just updating.

pib commented 10 years ago

I guess the one thing I left out is the case when a dep is neither a git nor an hg repo, in which case, a goget dep would have to be used.

So, with that in mind, I like the proposal, with two proposed modifications:

During the goat gen/fill/whatever command, if a dependency is not found on the path, a goat get of that dependency is done.
When a new dependency is found (and after it is gotten via goat get if needed) its type is detected and it is stored with the appropriate type and type-specific URL (I already implemented that in my pull request).

I do like the idea of freeze being a separate step from all this, but perhaps adding it as a flag would make sense instead. Something like goat gen --add-versions, and it would only add versions to those dependencies that don't already have one in .go.yaml. Again, this can be held off for a separate proposal if that makes more sense.

tummychow commented 10 years ago

I started typing this before you added that extra comment, but personally I'm not concerned about svn or bzr repos. For whatever it's worth, most of the competition only supports hg/git as well, and sometimes one of svn or bzr. There are a few managers (godep, gvp/gpm) that download entirely via go get so they have support for all four. But, as you said, the number of svn/bzr users pales in comparison to git or hg. For now we can write it off to a future feature.

My problem with resolving every package to type: git or type: hg (from here on out I'll just say git for brevity) is that you are now responsible for the version of every single package in your dependency tree, not just your immediate dependencies. For a large dep tree, that results in a massive .go.yaml. It's hard to tell where to begin when faced with a tree that has too many nodes in it, especially if I'm trying to isolate separate subtrees of my dependency graph.

The problem is intractable right now because Go has no de facto metadata for transitive dependencies, so if we want a clearly defined dependency graph then the only way to get it is by listing every single version of every single node. But I don't think all dependencies are equally demanding of attention. If I'm going to be in dependency hell anyway, then let me focus my efforts on the packages that are most fragile. Some things are like rails and they'll explode on every version bump, but other things are like go-yaml and they'll probably be backwards compatible forever. I can use type: goget for the dependencies that are more stable or less active.

For me, that is the heart of the issue. If I have a type: goget dependency and it's too loose for me, I can always change it to type: git and fix its version, then add each of its dependencies to the dep list as well. But if I have a lot of type: git dependencies, it's harder to go in the other direction: collapsing a group of type: git deps into a single type: goget subtree will require a lot of trial and error. I'd rather collapse (almost) fully by default and let the user expand, than expand fully by default and let the user collapse. I think the former direction is easier for the user than the latter.

Finally, I think the real solution to your concerns would be an implementation of #24, which would eliminate the looseness of type: goget. That would provide a balance between type: goget as it is right now (resolves the entire dep tree, but have fun keeping track of which version you're using) and type: git or type: hg (precise version fixing, but now you have to find the dep tree yourself). If type: goget supported ref, then you could have a fixed version of the top-level dep and still get (unversioned) resolution for the subtree. Until then, I'm prepared to compromise... either default is fine with me as long as we're open to a flag to switch to the other behavior.

Or we can take the third option - block on #24 and implement this as a followup, so that we can integrate the behavior of #24 from the ground up. But if you're waiting for me to implement #24 myself, I might be a while...

trans commented 10 years ago

I'm all for this idea, but I am against it overwriting the .go.yaml file. That should contain only the dependencies a program needs, put there by a human being. If goat overwrites it, adding all resolved dependencies, it's going to get computerized --order of entries will change, comments will be lost, etc. So instead it needs to generate a separate file. And seeing that there is no reason to be cluttering up a projects root directory --and goat is arguably already taking up one more path than it needs by having both .go.yaml and .goat/ there-- this separate file can go in the .goat/ directory.

pib commented 10 years ago

@tummychow that is a good point about generally stable packages, but I'm still worried about the case when something changes and it takes you a while to track down where it broke. I left a comment on #24 that could partially cover the issue of only depending on a sub-package and not wanting all the dependencies in the entire tree of packages in that repo.

@trans I get what you mean, and I agree that it is a bit overkill to have to specify the full dependency tree when really all you want is to specify the dependencies of your project.

This is the reason for bundler's Gemfile.lock and Berkshelf's Berksfile.lock, which is locking the dependencies and all sub-dependencies to specific known-good versions without filling the user-specified Gemfile or Berksfile with things they don't directly specify. That was the general idea behind goat freeze, but I wasn't familiar with bundler or berkshelf at that point, so I hadn't seen the idea of keeping them in a separate file that is also read by the dependency management system.

If a freeze/lock file went into the .goat directory, would it also make sense to put the .go.yaml file in there? The idea would be that you check in the freeze file as well, otherwise it would be pointless to generate it in the first place, so that would mean that .goat would have to be removed from people's .gitignore files (and .goat/deps added instead?).

tummychow commented 10 years ago

@trans you might find [goop][https://github.com/nitrous-io/goop) to be a better fit for your approach. Personally I prefer having a complete view in .go.yaml. If I don't want to specify dependencies-of-dependencies, then I have to either depend on #24 or hope for more people to adopt a version metadata format along the lines of .go.yaml.

mediocregopher commented 10 years ago

Man, I need to keep up with this thread better :P

I like fill. Unless there's a better suggestion let's stick with that (although we can keep saying gen if that makes the discussion easier)
I agree with @tummychow about filling in with goget unless the user decides otherwise. While it would be cleaner to determine that actual type and lock the version in all cases, this does end up creating a gigantic and unmaintanable .go.yaml file, which some people may want/need, but I'm pretty sure most are just trying to lock down one or two deps and say to-hell-with-it for the rest.
Having reread the freeze comments I think I understand a bit better why it might be useful. If all the dependencies are specified as goget, but someone wants to lock all the versions, essentially turning them into git types. So while I agree it's useful, I don't think it warrants a whole new command (which isn't to say we can't roll it in with fill), since like I mentioned I think it's a minority case that someone wants to lock every dependency.
I am pretty against changing the file/directory setup. .go.yaml is the user facing side of goat, and .goat is the magic black box that they never have to touch. Any changes make things more complex and cluttered, as well as probably not being backwards compatible. We should avoid modifying that setup except as an extremely last resort. I'd like to avoid adding new .go.yaml type files to the root of the project too, if we can.
I agree with @trans that overwriting the .go.yaml file is less than ideal, primarily (imo) because it gets rid of comments, which are definitely useful. I'm not as concerned about any ordering the user might have put the deps in that might get overwritten, but maybe others care more. I don't really have a good solution for this, and it's probably the primary problem we need to come up with a solution for. I don't suppose there's any yaml libraries that magically preserve comments?

So, after all that, here's my thinking: the fundamental problem/disagreement seems to be of whether to goget deps that we haven't specified and let fate take them as it will, or to lock them down so we can wrangle them in, or something in between. My thinking is that we make this an interactive command, where each step could be pre-answered using command-line flags. So the process, from the user's perspective would be something like:

Get asked for the path, if it wasn't provided on the cli
For each unmanaged dependency, ask if it should be put in with goget or locked to ref . There could be cli options to either auto-goget or auto-lock everything
At the end, output the generated .go.yaml to stdout, unless the cli flag to auto-write it is given. I'm not totally sold on this step, because if the user decides to bail on the one they generated then they are left with a .goat/deps full of dependencies they maybe don't actually want. So maybe fill should pull dependencies into a different directory in .goat? Again, not sold.

I think that covers everyone's use-cases. I like the idea of the interactive prompt (as long as it can be bypassed with cli flags if needed) because:

It's easy for noobs, we can add explanations and help where needed, as well as check their inputs for sanity if we feel really spry
We can change the prompts all day and not worry about backwards compatibility. We can never break backwards compat for .go.yaml, and should really really avoid it for the cli, so this is valuable.
It lets the user think about each dep individually if they need/want to

So let me know what you guys think, but that's the best solution I have so-far.

tummychow commented 10 years ago

Still think freeze should be deferred to a future discussion. This one is clearly contentious enough as it is... No further comment on the subject from me.

I've got no interest in the interactive CLI part of it, but as long as I can override it, it won't get in my way. I'd personally prefer to tie interactive to a flag and have the non-interactive side be the default behavior, but there's no point getting divisive over minor CLI arguments. It's all the same to me.

As for the issue of not overwriting .go.yaml, I think it's obvious that every time we try to avoid overwriting the existing contents, the problem gets significantly uglier. Install deps into a different directory temporarily unless the user agrees to overwrite their file? Append the new deps to the end of .go.yaml via a custom YAML generator? Write a comments-aware YAML parser? All these "solutions" are functionally incorrect, expensive to maintain, hard to justify, and introduce all kinds of dumb edge cases. Right now goat is simple enough that I can get over its lack of tests... introduce any of these behaviors and that will rapidly go out the window.

Do the simplest thing that can possibly work: overwrite the file every time. The command is supposed to modify .go.yaml programmatically; I don't see how you could expect comments to be preserved when the thing is going to be computer generated.

And if we're going to go with the interactive-by-default approach, we can just jam a warning into the start of the command to let you know that your existing .go.yaml will be rewritten. That way nobody should be surprised.

mediocregopher commented 10 years ago

The command is supposed to modify .go.yaml programmatically

Not really. The command is supposed to generate .go.yaml programatically. What it does with that generated file is yet to be determined. But I'd like to keep behavior consistent with the other go tools, like fmt and vet, which don't write anything to disk by default, just dump to stdout (with the option to write on the cli).

Goat's lack of tests is definitely something which needs to be addressed, probably before this is.

That being said, after talking with a friend of mine who also uses goat regularly, he pointed out that we'd have to put deps that are being pulled in by fill in a temporary directory regardless of whether .go.yaml gets written in the end, since the user could potentially ctrl-C in the middle, leaving their environment "dirty". So with that in mind, I think my idea to ask the user whether to dump to stdout or write would be the way to go, since we have to do the temporary directory thing no matter what.