Reduce data transfer between agent and scheduler

jvehent commented 8 years ago

As we start running very large actions for vulnerability management, it's becoming important to reduce the amount of data that goes through the messaging pipeline. This can be done in two places:

1. Compress commands at the scheduler level, right before publishing them into rabbitmq. On the agent side, we should look for a magic byte when parsing commands and decompress if necessary. The logic should avoid compressing commands that are already small enough (e.g., <5kB).
2. Prune action parameters when sending command results. This needs to be implemented in the agent, prior to sending the results. We just need to check that it doesn't break anything on the scheduler side (but it shouldn't).

Compression of command results could be implemented as well, similarly to 1.

ameihm0912 commented 8 years ago

Part of this was addressed in #176

Removing action parameters from responses still needs to be addressed.

VitorFalcao commented 7 years ago

Hello, I am just starting to contribute to the project. I would like to know what exactly these "action parameters" are. Is it the field "Action" in the Command struct here? And what would be the best approach to remove it, please?

ameihm0912 commented 7 years ago

That's the correct place to look.

Basically what happens is, when a new command comes into the agent to be run, it eventually ends up here:

https://github.com/mozilla/mig/blob/3f053d476aacbdfefb16823e1b745a75121ca703/mig-agent/agent.go#L596

and msg is turned into a mig.Command:

https://github.com/mozilla/mig/blob/3f053d476aacbdfefb16823e1b745a75121ca703/mig-agent/agent.go#L619-L624

The command's Action element describes what the agent should do. A single action can do more than one thing, so within it you will see a list of operations, where each operation describes the parameters that should be used when the module is run for that operation.

https://github.com/mozilla/mig/blob/3f053d476aacbdfefb16823e1b745a75121ca703/action.go#L39

Inside a given operation, Parameters is the data we are referring to here that we want to remove in the response.

https://github.com/mozilla/mig/blob/3f053d476aacbdfefb16823e1b745a75121ca703/action.go#L85

Once the agent is done executing modules for the operations, the results are added back to the same command type we started with at the beginning.

https://github.com/mozilla/mig/blob/3f053d476aacbdfefb16823e1b745a75121ca703/mig-agent/agent.go#L856

and then the completed command is sent to the Results channel, where the agent will forward it back to the scheduler.

https://github.com/mozilla/mig/blob/3f053d476aacbdfefb16823e1b745a75121ca703/mig-agent/agent.go#L868-L869

The issue here is that the command we are sending to the channel still contains all of the parameters that were supplied to run the module. In some cases, this can be quite a bit of data (like parameters for the scribe module). We want to remove that, since we only care about the results coming back and don't need the original action parameters.

At first glance I am thinking one way to do this would be to add a new method on the Command type, maybe something like PruneActionParameters. This would essentially iterate over all operations in Command.Action, and remove them.

So in the previous highlighted statement, it might end up looking something like:

err = cmd.PruneActionParameters()
if err != nil {
    panic(err)
}
ctx.Channels.Results <- cmd

ameihm0912 commented 7 years ago

Just to clarify my original comment: we wouldn't want to remove the operations themselves. What we'd likely want to do is just set the Parameters value in each operation to nil.
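
As a rough sketch of that idea, the method could look something like the following. The Command, Action and Operation types here are simplified stand-ins written for this example, not the real definitions from the mig package, and the field sets are assumptions.

```go
package main

// Operation is a simplified stand-in: one module invocation and the
// parameters it was run with.
type Operation struct {
	Module     string
	Parameters interface{}
}

// Action is a simplified stand-in holding the list of operations.
type Action struct {
	Name       string
	Operations []Operation
}

// Command is a simplified stand-in: the action that was run plus the
// results produced by the modules.
type Command struct {
	Action  Action
	Results []interface{}
}

// PruneActionParameters sets Parameters to nil on every operation.
// The operations themselves are kept, so Results still line up with
// the operation list.
func (cmd *Command) PruneActionParameters() {
	// Index into the slice so the stored elements are modified,
	// not copies of them.
	for i := range cmd.Action.Operations {
		cmd.Action.Operations[i].Parameters = nil
	}
}
```

Nothing in this version can fail, so the sketch returns no error. The call site shown earlier checks an error value, which could be kept if the method later needs to validate the command before pruning.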

VitorFalcao commented 7 years ago

@ameihm0912 Thank you very much for the clarification. Working on the fix.

VitorFalcao commented 7 years ago

@ameihm0912 I just have 3 questions:

1. Should I change the "command.go" file at the project's root? I ask because the agent looks for the Command type in mig.ninja.
2. What would count as an error in the case of PruneActionParameters?
3. Should I write a test?

Thank you

Reduce data transfer between agent and scheduler #170