ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Extract and rework commands package #3856

Closed - keks closed this 6 years ago

keks commented 7 years ago

This PR rips out commands/, reworks a lot of it, and puts it in go-ipfs-cmds. It is WIP and primarily here for review.

see #3524 and go-ipfs-cmds

Some remarks:

Backwards compatibility

We don't want to rewrite all of core/commands at once, so I built a shimming layer that allows us to use the current go-ipfs/commands.Command structs (with some limitations). All the shimming code is in go-ipfs-cmds/legacy.go. That is a lot of code and in the medium term it would be great to dump it. For that we need to use the new commands library in all of core/commands. That is not a short-term project though.

Shared code

There is quite a lot of shared code between go-ipfs/commands and go-ipfs-cmds. To reduce the number of type conversions, I moved most of it to go-ipfs-cmds/cmdsutil. I don't think that is a very good name though, and it might change in the future. $pkg/${pkg}util usually contains code that operates on types in $pkg, and here it's the other way around: cmdsutil contains a lot of basic tools (e.g. an error type) that are used by both go-ipfs-cmds and go-ipfs/commands. Maybe go-ipfs-cmds/core is better?

Basic model

Most of my changes affect the type Response, which I split up into Response and ResponseEmitter. A producer is given a ResponseEmitter on which it can call Emit(v) several times (essentially a channel write) - or just once if v is an io.Reader. These values are received by a consumer that holds the corresponding Response and calls v, err := res.Next(). These pairs can be chained at will - which is exactly what happens when there is a PostRun.
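
For illustration, here is a minimal, self-contained sketch of that producer/consumer pattern. The interfaces and the NewChanResponsePair constructor are made up for this example; the real go-ipfs-cmds types differ in detail.

package main

import (
    "fmt"
    "io"
)

// Illustrative stand-ins for the real go-ipfs-cmds types.
type ResponseEmitter interface {
    Emit(v interface{}) error
    Close() error
}

type Response interface {
    Next() (interface{}, error)
}

// A ResponseEmitter/Response pair backed by a single channel.
type chanEmitter struct{ ch chan interface{} }
type chanResponse struct{ ch chan interface{} }

func NewChanResponsePair() (ResponseEmitter, Response) {
    ch := make(chan interface{})
    return &chanEmitter{ch}, &chanResponse{ch}
}

func (re *chanEmitter) Emit(v interface{}) error { re.ch <- v; return nil }
func (re *chanEmitter) Close() error             { close(re.ch); return nil }

func (r *chanResponse) Next() (interface{}, error) {
    v, ok := <-r.ch
    if !ok {
        return nil, io.EOF // emitter was closed, no more values
    }
    return v, nil
}

func main() {
    re, res := NewChanResponsePair()

    // Producer side: Run is handed the emitter and calls Emit a few times.
    go func() {
        defer re.Close()
        for i := 0; i < 3; i++ {
            re.Emit(fmt.Sprintf("value %d", i))
        }
    }()

    // Consumer side: reads the values off the corresponding Response.
    for {
        v, err := res.Next()
        if err == io.EOF {
            break
        }
        fmt.Println(v)
    }
}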

Marshalers, Encoders, PostRun

When I started this project I complained about PostRun. I still think it's a bit weird, but it fulfills an important use case. In go-ipfs/commands, the Marshal function takes the value passed to SetOutput and does its magic to build a bytestream from that. In go-ipfs-cmds there is no singular value; instead we operate on streams. Encode() is very similar, but it operates on a single emitted value. It is called once per call to Emit. This means that no state is kept between emitted values. If you want to e.g. not abort on errors and print an error digest after everything completed, you need to use PostRun. PostRun takes a request and the actual ResponseEmitter and returns a new ResponseEmitter. Usually the first thing that happens in a PostRun is to create a ResponseEmitter/Response pair backed by a channel. In a goroutine we read from that Response, do our thing, and send the result to the underlying ResponseEmitter. After starting the goroutine we return the channel-backed ResponseEmitter. Usually the call to PostRun looks like this:

// wrap the final emitter (e.g. the CLI one) in the PostRun filter
re = cmd.PostRun[enc](req, re)
// ...
// Run emits into the wrapped emitter, so every value passes through PostRun
cmd.Run(req, re)

That way the returned ResponseEmitter will be used by the Run function and all emitted values will pass through PostRun before ending up at the final destination (usually a cmds/cli.ResponseEmitter). This allows building flexible pipelines.
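
To make that wiring concrete, here is a rough sketch of such a PostRun-style filter. It reuses the illustrative ResponseEmitter/Response types and NewChanResponsePair from the sketch above; postRunCLI is a hypothetical name, not part of the actual library.

// postRunCLI is a hypothetical PostRun-style filter: it wraps the real
// emitter and sees every emitted value before it reaches the destination.
func postRunCLI(re ResponseEmitter) ResponseEmitter {
    filterRe, filterRes := NewChanResponsePair()

    go func() {
        defer re.Close()
        count := 0
        for {
            v, err := filterRes.Next()
            if err != nil { // io.EOF once Run closes its emitter
                break
            }
            count++
            re.Emit(v) // pass the value through to the underlying emitter
        }
        // State kept across values - something a per-value Encoder cannot do.
        re.Emit(fmt.Sprintf("done, %d values passed through", count))
    }()

    // Run is handed this emitter, so everything it emits flows through
    // the goroutine above.
    return filterRe
}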

Changes to the sharness tests

I changed some small stuff in the sharness tests. I don't intend to keep these changes, but they are handy for now. They basically add a lot of debug output and disable a few tests, because my computer doesn't like tests that use nc. I don't know why, but they also fail on master for me. Maybe you can try to run them? Those are t0060, t0061 and t0235. Remove the test_done from the test's preamble.

I also wasn't able to test fuse and docker.


That was a lot of explanation. Let me know what you think!

Kubuxu commented 7 years ago

Also, go-ipfs-cmds has to be gxify'ed, and Codeclimate and CI should pass. I think you should be able to dynamically add signoffs to those commits. This might be a requirement.

keks commented 7 years ago

Looking really good. I left a few comments; mostly I will want all the excess debug statements and commented-out things removed before merge. Also I'm very skeptical of any changes to the sharness tests.

Thanks! Yes, the sharness changes are temporary - they let me see why the tests fail and skip those that don't work on my machine.

Also, go-ipfs-cmds has to be gxify'ed, and Codeclimate and CI should pass. I think you should be able to dynamically add signoffs to those commits. This might be a requirement.

Of course! Once it is gxified, CI should at least be able to test it. And I'll rework those commits again anyway; in the end there will be a clean, signed-off and licensed set of commits. Maybe I'll even squash it down to one commit so there are no commits that don't pass the tests. What is your preference on this?

Kubuxu commented 7 years ago

If there are commits that do not pass, better squash them.

whyrusleeping commented 7 years ago

@keks could you either sign those commits off or squash them? The gitcop notifications are murdering me

keks commented 7 years ago

Yeah sorry. I wanted to squash later and going through all of them is also a hassle. I'll just squash early.

Btw it seems I surfaced the race condition problem. Not sure yet how to tackle that one when it only fires on CI but not locally.

keks commented 7 years ago

ping @whyrusleeping @Kubuxu @lgierth @jbenet I think this is ready for shipping. The remaining CodeClimate stuff is from old code I moved around.

whyrusleeping commented 7 years ago

Oh crap, @Kubuxu we probably shouldn't have merged that dep update. It's gonna make this one annoying to fix merge conflicts on...

The best way to resolve this is probably to make a WIP commit here with the imports set to dvcs paths, then make a WIP commit locally on master with the same. Then the merge should be simple enough to do. Merge WIP-master into this branch, and then rewrite back to gx paths.

I think that's the easiest way to do this now... unless you have a better idea @Kubuxu ? It could be that just rebasing would be easy enough.

ghost commented 7 years ago

No more long lived feature branches! :)

Kubuxu commented 7 years ago

I have most of the merge done. I just need the deps introduced by this PR to have multihash and other deps consistent with master. @keks can you do that for me?

@whyrusleeping in short what I've done is:

gx-go uw                 # rewrite gx import paths back to plain dvcs (github) paths
sed -i 's|github.com/ipfs/go-ipld-node|github.com/ipfs/go-ipld-format|' **/*.go   # apply the node -> format rename
git checkout master package.json   # take package.json from master
# modify package.json to include both PRs
gx-go rw                 # rewrite imports back to gx paths
git merge -s ours master # merge master, keeping this branch's tree

The last step doesn't work well, as there are two versions of multihash and others, but it should work since what I essentially do is recreate the other PR. The other option would be reverting it and doing it once over.

keks commented 7 years ago

Just pushed.

I think something is wrong with go-ipld-format/go-ipld-node. The import path (both name and hash) is changed in all the .go files, but in package.json only the name is changed, not the hash. I fixed that in my commit.

keks commented 7 years ago

hang on, something's wrong. sigh

keks commented 7 years ago

I had to rip out go-ipfs-cmds/cmdsutil into go-ipfs-cmdkit. Should I just push that to github.com/ipfs/go-ipfs-cmdkit?

Kubuxu commented 7 years ago

@keks why did you have to do that?

keks commented 7 years ago

@Kubuxu

why did you have to do that?

go-ipfs-cmds/cmdsutil was imported by go-ipfs/commands and go-ipfs-cmds. gx/ipfs/$hash/go-ipfs-cmds cannot import gx/ipfs/$hash/go-ipfs-cmds/cmdsutil because I only get the hash I need to import after the fact. The alternative would be to import an older version, but then go-ipfs would need to import gx/ipfs/$newHash/go-ipfs-cmds and gx/ipfs/$oldHash/go-ipfs-cmds/cmdsutil, which is stupid.

I also tried using the relative import path ./cmdsutil in go-ipfs-cmds, but apparently this is (a) bad practice and (b) not supported anymore.

So I moved go-ipfs-cmds/cmdsutil to go-ipfs-cmdkit. But it's not on github yet.

Kubuxu commented 7 years ago

go-ipfs-cmds/cmdsutil was imported by go-ipfs/commands and go-ipfs-cmds.

This was enough. :D

whyrusleeping commented 7 years ago

Hey @keks I wrote a thing to help you out. To fix gx imports, install the latest master of gx-go and run gx rewrite --fix. It should more intelligently rewrite all deps (even ones no longer imported) back to their github imports, at which point running a normal gx-go rewrite should be using all the correct deps.

whyrusleeping commented 7 years ago

Let me know if you have any issues with that, or if it doesn't fix the issue (it probably won't be able to fix the node -> format rename, but for everything else it should do fine)

keks commented 7 years ago

@whyrusleeping I already fixed everything manually :) The merge doesn't work though, but I understood @Kubuxu wanted to do that.

keks commented 7 years ago

asking again:

I had to rip out go-ipfs-cmds/cmdsutil into go-ipfs-cmdkit. Should I just push that to github.com/ipfs/go-ipfs-cmdkit?

Kubuxu commented 7 years ago

Seems like a good option.

keks commented 7 years ago

https://github.com/ipfs/go-ipfs-cmdkit/

Should I just publish or does anyone want to take another look? I have a readme and license on it and the tests pass.

whyrusleeping commented 7 years ago

@keks that looks good to me, go ahead and publish it (and make it a non-private repo while you're at it :) )

keks commented 7 years ago

cmdkit is now public.

kevina commented 7 years ago

I am still trying to digest this so please bear with me.

Is there a standard way to output to both stdout and stderr now? I would like to fix the marshal function to return two streams at the very least, if this is not already done.

I would also like to standardize the described use case for post-run; streaming errors as they come is a very common thing for unix commands to do. A good unix command will not abort unless there is a fatal error or it is unsafe to continue. It seems to me that the marshal function can be expanded to support this also, so we don't have to use post-run all the time. In the future it is my hope that more ipfs commands will behave like "block rm" and I would hate for them all to have to use PostRun.

@whyrusleeping (and others) thoughts on this?

whyrusleeping commented 7 years ago

This changeset LGTM. I've reviewed it twice now, and all the sharness tests seem to pass without being changed, so that's a very good signal.

I think what we will want to do is wait until the 0.4.9 release (I'm trying to do this very soon), then rebase this on top, fix up the dependencies (so it doesn't have 248 changed files) and then go ahead and merge this out.

In the meantime, getting any more review from @Kubuxu @kevina @lgierth and @hsanjuan if they have time would be appreciated.

kevina commented 7 years ago

@whyrusleeping I left some comments; is this something (that I commented on) that you are concerned about? Is this something that can be done later, or just not worth it, etc.?

whyrusleeping commented 7 years ago

@kevina I think that with this changeset, postrun can be deprecated in favor of using emit to output error messages. Unsure on exactly how this would work right now though. Thoughts @keks ?

kevina commented 7 years ago

@whyrusleeping that is not how @keks is using postrun; see his comment in the description, around where he said:

If you want to e.g. not abort on errors and print an error digest after everything completed, you need to use PostRun.

He also modified ipfs block rm to use postrun, whereas before I was using the marshaller as we discussed.

More to the point, both Emit and PostRun are used when you want to stream errors as they happen.

I am still trying to understand all this, but right now it seems like an overcomplicated way to do things.

keks commented 7 years ago

@kevina Thanks for the feedback!

Is there a standard way to output to both stdout and stderr now? I would like to fix the marshal function to return two streams at the very least, if this is not already done.

Hmm, when I started working on this I collected things people were concerned about and I think this was not one of them. I'm a bit confused this is being brought up after it's done.

I would also like to standardize the described use case for post-run; streaming errors as they come is a very common thing for unix commands to do. A good unix command will not abort unless there is a fatal error or it is unsafe to continue. It seems to me that the marshal function can be expanded to support this also, so we don't have to use post-run all the time. In the future it is my hope that more ipfs commands will behave like "block rm" and I would hate for them all to have to use PostRun.

Well... Currently Emit is about moving response values along the data path to a function that renders them. There are two ways to do that: either you use an Encoder or you use PostRun. Encoders are instantiated with an io.Writer and are passed individual values; they don't keep state over the course of a command run. The alternative is PostRun, which is a filter: it takes the underlying ResponseEmitter as an argument and returns a new one. Code that runs later just uses the returned emitter.

Maybe the least intrusive change is to add a Renderer type and a corresponding Command field, where a Renderer is a func(Response) that keeps reading from res.Next(). If cmd.PostRun == nil && cmd.Renderer != nil we can construct a PostRun from the renderer and use that one. This should work, but I'm at the airport and didn't have a lot of sleep, so I'll think about it again later.
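
For illustration, here is one possible shape of that idea, again reusing the illustrative types from the sketches in the PR description; Renderer and postRunFromRenderer are hypothetical names, not actual API.

// Renderer is the hypothetical type sketched above: it keeps reading from
// res.Next() and renders each value (e.g. to stdout).
type Renderer func(Response)

// postRunFromRenderer builds a PostRun-style filter from a Renderer: Run
// emits into the returned emitter, the Renderer consumes the matching Response.
func postRunFromRenderer(render Renderer) func(ResponseEmitter) ResponseEmitter {
    return func(re ResponseEmitter) ResponseEmitter {
        filterRe, filterRes := NewChanResponsePair()
        go func() {
            defer re.Close()
            render(filterRes)
        }()
        return filterRe
    }
}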

@whyrusleeping I left some comments; is this something (that I commented on) that you are concerned about? Is this something that can be done later, or just not worth it, etc.?

Do you mean the comments I replied to above or comments to concrete lines of source code? Because I can't find those anywhere. Maybe you forgot to submit the review?

He also modified ipfs block rm to use postrun when before I was using the marshaller as we discussed.

Yes, marshalers don't exist anymore, only encoders. And they only process one value at a time.

@whyrusleeping

@kevina I think that with this changeset, postrun can be deprecated in favor of using emit to output error messages. Unsure on exactly how this would work right now though. Thoughts @keks ?

No, @kevina is right.

kevina commented 7 years ago

Is there a standard way to output to both stdout and stderr now? I would like to fix the marshal function to return two streams at the very least, if this is not already done.

Hmm, when I started working on this I collected things people were concerned about and I think this was not one of them. I'm a bit confused this is being brought up after it's done.

Partly because I was thinking out loud, and partly because I have a very hard time keeping up with all that goes on with IPFS. I probably should have been more involved earlier on; I apologize for that. Although, it would have been helpful if I had been mentioned in the original issue, if my early feedback was wanted.

kevina commented 7 years ago

Do you mean the comments I replied to above or comments to concrete lines of source code?

I meant this comment: https://github.com/ipfs/go-ipfs/pull/3856#issuecomment-298251797 the one that starts with "I am still trying to digest this..."

kevina commented 7 years ago

@keks, is there a reason SetError returns an error? How can it fail? In my opinion, code like this:

if err != nil {
    err2 := re.SetError(err, cmdsutil.ErrNormal)
    if err2 != nil {
        log.Error(err)
    }
    return
}

is rather excessive and will be a bit tedious to have to write all the time.

As far as I can tell, in the original proposal SetError didn't return an error. If the reason for the change is documented somewhere, I apologize, as I must have missed it.

whyrusleeping commented 7 years ago

@keks It seems like the encoder could detect if it was given an error and print that to stderr instead of stdout.

keks commented 7 years ago

@kevina

Although, it would have been helpful if I had been mentioned in the original issue, if my early feedback was wanted.

That makes sense, yeah.

is there a reason SetError returns an error? How can it fail?

SetError simply Emits an *cmdsutil.Error (formerly *commands.Error), and Emit returns an error. Before the rewrite, cmd.Run didn't send anything over the network; it just prepared the output data, so handling the errors was up to the code copying the prepared response to e.g. the http connection. I agree it's tedious and messy, but the only alternative I see is to panic and then recover after Run returns. What do you think about this @kevina @whyrusleeping?

@whyrusleeping

It seems like the encoder could detect if it was given an error and print that to stderr instead of stdout.

Well, currently if cli.ResponseEmitter receives an error, it passes it to its creator in main() so it can exit the program. In many cases this is the desired behaviour. If we instead just print the error but happily keep executing, many commands will break. That's why I suggested adding a special kind of error (either a Go type or something like ErrClient: ErrNonFatal); if cli.ResponseEmitter sees that one, it does what you described, but keeps in mind that errors have been thrown and passes a non-zero error to main() after the command finished.
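
A rough sketch of what that could look like on the CLI side; cliResponseEmitter and ErrNonFatal are hypothetical names, only the Message/Code fields of cmdsutil.Error are assumed, and the usual "fmt"/"os" imports are omitted.

// cliResponseEmitter is hypothetical: it prints non-fatal errors to stderr,
// remembers that one occurred, and lets main() exit non-zero afterwards.
type cliResponseEmitter struct {
    sawError bool
}

func (re *cliResponseEmitter) Emit(v interface{}) error {
    if e, ok := v.(*cmdsutil.Error); ok {
        fmt.Fprintln(os.Stderr, "Error:", e.Message)
        re.sawError = true
        if e.Code != ErrNonFatal { // ErrNonFatal is the hypothetical new error code
            return fmt.Errorf("fatal: %s", e.Message) // abort the command, as today
        }
        return nil // non-fatal: keep executing
    }
    fmt.Fprintln(os.Stdout, v) // regular output values still go to stdout
    return nil
}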

keks commented 7 years ago

Oh and also @kevina

Also either make use of ProcRmOutput or remove that function to avoid dead code.

yes, after this is merged or at least completely decided on.

kevina commented 7 years ago

@keks wrote:

SetError simply Emits an *cmdsutil.Error (formerly *commands.Error), and Emit returns an error. Before the rewrite, cmd.Run didn't send anything over the network; it just prepared the output data, so handling the errors was up to the code copying the prepared response to e.g. the http connection. I agree it's tedious and messy, but the only alternative I see is to panic and then recover after Run returns. What do you think about this @kevina @whyrusleeping?

Then could SetError do the necessary error handling itself? If required it can also take a log object so it can output to the correct log. Basically just move the:

if err2 != nil {
    log.Error(err)
}

to inside SetError so we don't have to repeat the same code again and again.

keks commented 7 years ago

I like putting the logging into SetError, but I'd still return the error. Maybe at some point there will be a function that is interested in the returned error. All other Runs can just drop it.
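
A minimal sketch of that compromise; the responseEmitter receiver is illustrative, and cmdsutil.ErrorType is assumed to be the type of codes like ErrNormal.

// SetError emits the error itself and logs a failed Emit internally, so a
// command's Run can just call it and return. (Receiver type is illustrative.)
func (re *responseEmitter) SetError(v error, code cmdsutil.ErrorType) error {
    err := re.Emit(&cmdsutil.Error{Message: v.Error(), Code: code})
    if err != nil {
        log.Error(err) // logged here instead of at every call site
    }
    return err // callers that care can still inspect it
}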

kevina commented 7 years ago

I like putting the logging into SetError, but I'd still return the error. Maybe at some point there will be a function that is interested in the returned error. All other Runs can just drop it.

I suppose returning an error will be okay if it is clearly documented that it is okay to drop it. Or, if we are interested in the error, we could just call Emit directly. @whyrusleeping thoughts?

whyrusleeping commented 7 years ago

I'm :+1: on having emit return an error. That should help make problems more apparent to the user (instead of being randomly logged in the commands lib somewhere)

kevina commented 7 years ago

@whyrusleeping just so we are clear, the issue is not Emit returning an error, but SetError (which just calls Emit). I don't like having to write this code just to return an error from a command:

if err != nil {
    err2 := re.SetError(err, cmdsutil.ErrNormal)
    if err2 != nil {
        log.Error(err)
    }
    return
}

When before all we had to do was:

if err != nil {
    re.SetError(err, cmdsutil.ErrNormal)
    return
}

What I propose instead is to have SetError take the logger, so we would do:

if err != nil {
    re.SetError(err, cmdsutil.ErrNormal, log)
    return
}

What @keks proposed was to do that but have SetError still return the error and allow the user to ignore it. I didn't think that was a good API decision (but I don't feel strongly on this). What I propose is that if the user wanted to make absolutely sure the error was handled, then they could do:

if err != nil {
    err2 := re.Emit(&cmdsutil.Error{Message: err.Error(), Code: cmdsutil.ErrNormal})
    if err2 != nil {
        // do something about it other then just logging, not sure what though
    }
}

That is, just call Emit directly so they have more control over the error handling.

On a related note, I think SetError is misnamed now; I would rename it to Fatal or Fail or something, to be clear that we are aborting due to the error. Before, I believe you could theoretically call SetError at any time, and now I believe it is the last thing that needs to be done.

keks commented 7 years ago

I suppose returning an error will be okay if it is clearly documented that it is okay to drop it. Or, if we are interested in the error, we could just call Emit directly. @whyrusleeping thoughts?

How about we say "it's okay to drop it if you return afterwards anyway"? Because if you keep doing stuff, you operate on unexpected state which is problematic.

What I propose instead is to have SetError take the logger, so we would do ...

Currently we use package-level logging, so there isn't really a need to pass it. If we start introducing a logger that is passed around, I'd just put it into the ResponseEmitter, because this error should be logged on the request-scoped logger. So I wouldn't change the function's signature to accommodate a logger.

What @keks proposed was to do that but have SetError still return the error and allow the user to ignore it. I didn't think that was a good API decision

But just dropping the error isn't better - it just hides the ugliness. Also, do you still feel the same about it if we say "it's okay to drop the error if and only if you return afterwards"?

On a related note, I think SetError is misnamed now; I would rename it to Fatal or Fail or something, to be clear that we are aborting due to the error.

Hmm, fair point. I would hold off on a decision about that until we have the semantics for non-fatal errors worked out, though.

@kevina @whyrusleeping Do you think we can do a high-throughput video chat today or tomorrow? This timezone ping-pong seems a bit inefficient to me ;)

kevina commented 7 years ago

@keks

How about we say "it's okay to drop it if you return afterwards anyway"? Because if you keep doing stuff, you operate on unexpected state which is problematic.

That could work, although this seems a little more complicated than it needs to be.

So I wouldn't change the function's signature to accommodate a logger.

OK

keks commented 7 years ago

That could work, although this seems a little more complicated than it needs to be.

I think the only safe and simpler option is calling panic inside SetError if an error occurs and recovering in Command.Call.

kevina commented 7 years ago

I think the only safe and simpler option is calling panic inside SetError if an error occurs and recovering in Command.Call.

I suppose that can work. How can Command.Call recover?

keks commented 7 years ago

cmd.Call calls cmd.Run which calls SetError. It can just defer func(){v := recover(); ...}() before calling Run.
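
A minimal sketch of that pattern; the Command, Request and ResponseEmitter shapes here are illustrative rather than the real API, and the "fmt" import is omitted.

// Call recovers the panic raised by a failed SetError/Emit and turns it
// back into an ordinary error for the caller.
func (c *Command) Call(req Request, re ResponseEmitter) (err error) {
    defer func() {
        if v := recover(); v != nil {
            err = fmt.Errorf("emit failed: %v", v) // e.g. the connection went away
        }
    }()
    c.Run(req, re)
    return nil
}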

kevina commented 7 years ago

@keks I meant: what can it do to recover? If Emit returns an error, that likely means there is a problem with the connection. After you recover, is there anything useful to do other than logging the error?

whyrusleeping commented 7 years ago

I think if Emit fails, the panic --> recover strategy makes sense. That way we can cleanly print out connection errors to the user across the board.

kevina commented 7 years ago

@whyrusleeping I was primarily concerned with SetError, not Emit (SetError calls Emit). Are you saying that you think Emit itself should not return an error and instead panic if it can't send the message?

kevina commented 7 years ago

@keks if a meeting becomes necessary/helpful I think sometime Monday after the standard calls would work better for me.

keks commented 7 years ago

If we go that way, we need to make sure all cleanup is done in defers. Code like

mutex.Lock()
SetError("error!", cmdsutil.ErrNormal) // if this panics, Unlock is never reached
mutex.Unlock()

...

would not be safe, because the mutex is not guaranteed to be unlocked. Since we want to unlock the mutex directly after calling SetError, we would need to do

func() {
  mutex.Lock()
  defer mutex.Unlock() // runs even if SetError panics
  SetError("error!", cmdsutil.ErrNormal)
}()

...

I think this could be a source of subtle bugs if we go down the panic/recover route. And mutexes are just an example - I expect there to be more complex issues like this.