tcpexmachina / remy

403 stars 78 forks source link

Changing Input to be Vector of Points Not Range #34

Closed deeptir18 closed 8 years ago

deeptir18 commented 8 years ago

-Allows us to input in sets of networks to Remy that might not specifically follow the input range object (or if the set of networks need to have some other structure) -Modified the configuration tool and inspect tool to work with this change -Modified rat-runner and scoring rat to use protobufs (same protobuf format that is an input into Remy) instead of using hard coded or command line arguments

(In the earlier pull request I closed, I had forgotten to merge with master).

czlee commented 8 years ago

A few questions/notes:

  1. My plotting scripts currently rely on being able to pass in arbitrary settings to rat-runner via the command line. I could rewrite the scripts to regenerate the config file for every point on the graph, but this seems cumbersome. I might be mistaken, but I thought rat-runner's purpose was to test single networks. Did we specifically want to generalize this? Unless I'm mistaken, without further modification to rat-runner, I can't just put all of the networks I want evaluated into a single config file—rat-runner evaluates a single score for all networks specified.
  2. There shouldn't really be any need to change inspect-configrange.cc, other than to change the type declaration on the old line 24 (and maybe update the name of the file). It relied on the DebugString() intentionally, the idea being that it'll print out whatever it has (as a debugging tool) rather than assume it conforms to Remy's internal specification.
  3. This presumably means that in current use-cases there will be a lot of redundant information in the configuration file, right? What's the impact on the size of the files, or how cumbersome would it be to support both ranges and vectors?
  4. I assume ConfigRange is just being kept around to allow the conversion of old config files to new ones. If so, can we make input-configrange.cc accept the new style of config files as well, for the same purpose (being able to make slight modifications to existing configs)?
deeptir18 commented 8 years ago
  1. I wasn't sure if in the future we ever wanted to use rat-runner to score on multiple networks. This is why I let it have the same input as remy. For your plotting scripts, maybe you can just input a protobuf that has one network, and plot with those? Or we can go back to command line options - whatever is best.
  2. Sure, that could be simpler for inspect config if it's just a debug tool! I didn't realize DebugString() didn't depend on the internal specification.
  3. I just kept in the Range object as I wasn't sure who might be using them. I'm not sure about the impact of the sizes of the files. The idea is that if the user can specify the same arguments to_ _inspect-configrange.cc__ now, and instead of producing a protobuf that specifically encodes a range of networks - it is just a vector of networks. You can also create different tools (instead of inspect-configrange) to create these protobufs if your set of networks to train on doesn't follow the configrange object.
  4. We can probably eventually delete configrange.hh - no one is using it - we would just need to modify inspect-configrange a little.

Let me know if you have more questions!

czlee commented 8 years ago

Okay, thanks.

On (1): Unless we know that we want to run rat-runner to evaluate over a collection of network configurations, it seems simpler to me to leave functionality as it was, i.e. for rat-runner to generate a ConfigVector containing just the one configuration it was given in command line options. A script can be written to do anything if I'm sufficiently determined, but on the question of what's better: To me the benefit of (the added complexity of) configuration files is being able to run the same complex configuration without relying on command history, or to switch between multiple complex configurations repeatedly. This benefit doesn't really apply when it's a script that's running lots of different configurations, or when a human's looking to run just one network.

On (3): Sorry, when I said "support both ranges and vectors" I meant for Remy to do the work of interpreting both. The argument for it is just convenience—config files would be smaller and easier to modify if, e.g., you only cared about min/max/incr and you wanted to keep everything as-is except to widen the range of link speeds. The argument against it is code complexity in Remy, since we'd then need to handle both cases there (as opposed to handling both cases elsewhere).

anirudhSK commented 8 years ago

(1) I agree with @czlee. I think it's useful to pass in only a single network to rat-runner. This could be a ConfigVector protobuf with exactly one entry, and rat-runner should throw an exception if there is more than one entry in the supplied ConfigVector. Also, we should probably remove scoring-example; rat-runner serves the same purpose.

(2) My preference would be to support vectors alone, so that the remy code base is simpler. All the complexity can be in the tools that generate these configuration vectors. You can probably write these tools in Python or something else more convenient as and when you need to generate specific configuration vectors for specific experiments.

@keithw : Any preferences?

keithw commented 8 years ago

I think I agree with you, Anirudh.

deeptir18 commented 8 years ago

When I can in the next week, I'll modify the code to do the following, and redo a pull request: (1) only support the vectors (2) change rat-runner to just take in a configvector of length 1 - and have it throw an error if it's not (3) remove scoring-example (4) Right now we can keep a c++ executable file (input-configrange.cc) to be the tool to generate the input config vector - but as we need them, we can probably add python scripts.

Keith, does this sound good?

czlee commented 8 years ago

If rat-runner is only really about a single network at a time, it seems cumbersome to have to regenerate a new config file every time you want to test a new network—it would then be two commands instead of one. Can we keep the command line options there, as they were originally? It can support both options and files if there is a good use-case for using config files instead.

deeptir18 commented 8 years ago

I'm good with that! @keithw just let us know what I listed in my above comment ( and using command line options for rat-runner) is fine - and I'll go ahead and make a new pull request.

anirudhSK commented 8 years ago

We can have both cmdline options and config files in rat-runner for now, but we'll probably want to remove it eventually, especially if the network configuration vector grows (such as supporting more complicated traffic workloads).

@deeptir18: Go ahead with your changes when you find the time and one of us can look at it before merging.

keithw commented 8 years ago

@deeptir18, I think your first suggestion ("When I can in the next week...") is the right path. It seems to me that these programs should all strive to do just one thing. If rat-runner is going to take a protobuf, that's the only way it should be able to be run (no command-line options). And, in fact, every program should just support one way of running it.

If our scripts have to use the configuration helper that takes the command-line options to produce that protobuf, that doesn't seem very difficult to arrange. (We should fix the UI to make it easier to specify a single-network protobuf in this program.)

I don't really care if rat-runner throws an error if its input protobuf has more than one network. or not. It could just run on every network in the vector.

keithw commented 8 years ago

(Also, please try to avoid including merge commits in the pull request.)

deeptir18 commented 8 years ago

@keithw Sounds good, will do sometime in the next few days - and sorry, didn't realize about not including merging.

czlee commented 8 years ago

I just realized this will conflict with #31, which adds more fields to ConfigRange (namely, how long the simulation will run for).

That needn't by itself prevent merging this, this is mainly a heads-up for what might come after it, especially with a view to backwards compatibility with existing config files. I think we'll need to extend what is currently ConfigVector into a more general class that encapsulates all parameters that may be passed to an evaluator. I explain over at #31 but the gist of it is that some new EvaluatorParameters message would contain a ConfigVector message alongside other evaluator parameters. The salient note is that the top-level message of a config file would change (again).

deeptir18 commented 8 years ago

Hey all, I just wanted to confirm the changes I am going to make (so it doesn't conflict with #31 above): This should be done by tomorrow.

  1. Modify the ConfigRange object to contain: a vector of points (instead of ranges of the network variables), the simulation ticks object (and we can add more parameters later if need be)
  2. -Modify the protobuf accordinglly
  3. -Modify the input-configrange.cc to make a protobuf that fits with this format
  4. -Make sure "sender-runner" (the replacement for rat-runner) can work with this format
  5. -Make sure "remy" will parse the new input format correctly
  6. -Modify the evaluator to work with this range of points accordingly

**I'm not going to rename the file which makes things simpler, unless you want me to.

czlee commented 8 years ago

I did my rebase today and will have a pull request in tonight, if you haven't already started 😕

deeptir18 commented 8 years ago

What does your pull request have?

czlee commented 8 years ago

It'll be my master branch, https://github.com/czlee/remy/tree/master. I'm just in a class now, but when I get home I'll run some tests to make sure it's all in order then submit.

keithw commented 8 years ago

@deeptir18 , that still sounds reasonable to me. If sender-runner is going to ingest this protobuf, I would like it to only ingest this protobuf. Right now the tests are running sender-runner with the command-line arguments; you'll have to teach them to run configuration to generate the scenario and then feed that to sender-runner.

deeptir18 commented 8 years ago

@keithw - so you would like then sender-runner to have command line arguments? Or have it run like remy with an input protobuf? Or not? I'm confused specifically about "teach[ing] them to run configuration to generate the scenario and then feed that to sender-runner" - what is the point of having them run configuration internally if they already have the information about the config from command line?

Sorry for all the questions, just want to make sure I understand everything correctly.

keithw commented 8 years ago

My suggestion: just make the decision one way or the other. Either sender-runner takes command-line arguments, or it takes a protobuf like remy. Honestly, I don't really care either way, but I think it's simplest to have it just do one thing.

If you're going to change sender-runner to take a protobuf, then the tests will also need to be updated (since they run sender-runner). That's all I'm saying.