opennetworkinglab / flowvisor

FlowVisor - A network hypervisor

Flow stats reply sent by FlowVisor doesn't set the "more replies" flag #244

Closed gth828r closed 11 years ago

gth828r commented 11 years ago

In the past, I have used the Floodlight REST API to query for flow stats, and that has always returned the flow stats as expected. We recently upgraded FV to 1.0.8 in our lab, and as a result, I am no longer seeing any flow stats replies returned via the floodlight REST API, even if the flows fall within my slice's flowspace.

I have noticed that flowvisor sends empty flow stats replies for flows that don't fall within my slice, and it sends a normal stats reply for flows that do fall within my slice. However, I noticed that the "flags" for the ofp_stats_reply is always set to 0, including on the empty replies, and the xid for all of the stats_replies is the same.

According to the spec:

The only value defined for flags in a reply is whether more replies will follow this one - this has the value 0x0001. To ease implementation, the switch is allowed to send replies with no additional entries. However, it must always send another reply following a message with the more flag set. The transaction ids (xid) of replies must always match the request that prompted them.

I think the current behavior of always having the flags set to 0 leads to floodlight dropping all of the flow stats replies with the initial xid following the first one.
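
To illustrate the suspected failure mode, here is a minimal Python sketch of a controller that closes out a transaction as soon as it sees a reply without the "more" flag. This is not Floodlight's actual code: the StatsCollector class and its method names are hypothetical, and only the 0x0001 flag value comes from the spec text quoted above.

```python
OFPSF_REPLY_MORE = 0x0001  # the only flag the spec defines for stats replies


class StatsCollector:
    """Hypothetical controller-side reassembly of multi-part stats replies."""

    def __init__(self):
        self.pending = {}    # xid -> entries received so far
        self.completed = {}  # xid -> full entry list once the last part arrives

    def expect(self, xid):
        self.pending[xid] = []

    def on_reply(self, xid, flags, body):
        """Return True if the reply was accepted, False if it was dropped."""
        if xid not in self.pending:
            return False  # no outstanding request with this xid
        self.pending[xid].extend(body)
        if not flags & OFPSF_REPLY_MORE:
            # "more" flag clear: this part is treated as the last one
            self.completed[xid] = self.pending.pop(xid)
        return True


# Replies as described in this issue: empty parts first, the slice's own flow
# last, and every part carrying flags == 0.
c = StatsCollector()
c.expect(7)
results = [c.on_reply(7, 0, []), c.on_reply(7, 0, []), c.on_reply(7, 0, ["my-flow"])]

# The same parts with the "more" flag set on all but the last one.
good = StatsCollector()
good.expect(8)
for flags, body in [(1, []), (1, []), (0, ["my-flow"])]:
    good.on_reply(8, flags, body)
```

In the first case the empty first part completes the transaction, so the later part that carries the real flow is dropped. In the second case the flow is collected as expected.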

alshabib commented 11 years ago

I am trying to understand your issue. Did you see the correct behaviour in an FV version prior to 1.0.8 but post 1.0? The correct behaviour is that the flow stats are sliced according to your flowspace.

Are your switches sending flowstat replies with the flags set to 1? If yes, could you provide a trace for this?

gth828r commented 11 years ago

alshabib wrote:

Did you see the correct behaviour in an FV version prior to 1.0.8 but post 1.0?

Yes. The behavior I saw when using 0.8.17 was that Floodlight would display all of the flows (including the ones in my flowspace). I understand that has changed so that FlowVisor should only pass along flows that fall within my flowspace. I think that is working properly (as evidenced by packet captures), but the problem is that Floodlight does not display any flows now, even if they are in my flowspace.

alshabib wrote:

Are your switches sending flowstat replies with the flags set to 1? If yes, could you provide a trace for this?

When I query for flow stats via FlowVisor 0.8.17, the flags field of each stats reply is set to 1 up until the last stats reply from a given datapath for a given transaction (with a single xid), where it is set to 0. If I understand the spec correctly, that is the expected behavior. With FV 1.0.8-1, all stats replies from a given datapath for a given transaction (with a single xid) have the flags value set to 0.

I have some packet captures and some files showing the output of the floodlight REST API for a switch that is being queried via an FV 0.8.17 instance and for a switch that is being queried via an FV 1.0.8-1 instance to show the behavior I am seeing, and what it looks like in terms of the flags. It looks like I can't attach those here, so I'll send them out of band. For decoding the capture files, note that my OF controller is running on port 33012.

gth828r commented 11 years ago

For the files, please see:

gth828r commented 11 years ago

Any word on this? Any thoughts on how hard this would be to fix? Would it be possible to put it into a 1.0 release?

alshabib commented 11 years ago

Sumanth is working on it now. We'll include it in a 1.0.9 release.

Ali Al-Shabibi

alshabib commented 11 years ago

And it's not all that difficult to implement; it's really mostly an oversight.

Ali Al-Shabibi

mssumanth commented 11 years ago

Hi Tim,

I tried to reproduce this issue in Mininet, but even without any fix applied, I was getting the correct stats at Floodlight.

I did the following.

FlowVisor 1.0.8 is connected to the Floodlight controller and to Mininet, which has a topology of two connected switches, each in turn connected to 50 hosts. I did a pingall in this topology so that all the flows get populated on the two switches, and when I queried for switch flow stats from Floodlight with the /wm/core/switch///json REST API command, I got all the stats in one shot (more than 3 or 4 pages).

In your attached packet captures, I did not quite understand where Floodlight had queried FV about the stats and FV returns null.

Thanks Sumanth

gth828r commented 11 years ago

mssumanth wrote:

I did the following.

1.0.8 FlowVisor is connected to the floodlight controller and connected to mininet which has a topology of two switches connected and each switch in turn is connected to 50 hosts. I did a pingall in this topology such that all the flows get populated on the two switches and when I queried for switch flowstats from floodlight, with the cmd: /wm/core/switch///json REST API cmd, I got all the stats in one shot(like more than 3 or 4 pages).

Ah, interesting. So what's different between what you and I are doing? I guess one difference is that my stats query goes to a switch where there are a lot of flows that don't belong to my slice. Can you try the test again and see if that makes a difference? I think in particular, you should make sure that some of the stats replies that come back in the beginning are outside of your slice's flowspace, and then later ones correspond to flows that are in your slice's flowspace. That is basically what is happening in my test, so I am thinking that might be key.

I don't have any places that I can easily test this, otherwise I'd offer to try a test with all flows on a switch falling within a slice's flowspace myself and see if the behavior is the same as what you are seeing.

mssumanth wrote:

In your attached packet captures, I did not quite understand where Floodlight had queried FV about the stats and FV returns null.

By this, do you mean that it is surprising that there are a bunch of stats replies with no body? I guess one thing I never explained is what is actually going on in the fv_1.0.8-1_stats.pcap file. I sent a query to the GPO lab switch, poblano, which is heavily used by GENI experimenters. I had only pushed down one flow, and I immediately queried for flow stats afterwards. My flow stats show up in the final stats reply message of the packet capture. All of the stats replies before that presumably belong to different slices, and FV cut the bodies of those messages out.
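
As a rough illustration of the slicing described here, the sketch below filters each part of a multi-part reply on its own while leaving the xid and flags exactly as the switch sent them. Everything in it is an assumption made for illustration: the dict layout, the nw_dst field name, and the example subnet are hypothetical, and this is not how FlowVisor represents messages internally.

```python
import ipaddress


def slice_reply(reply, subnet):
    """Drop entries outside the slice's subnet; keep xid and flags as received."""
    net = ipaddress.ip_network(subnet)
    kept = [e for e in reply["entries"]
            if ipaddress.ip_address(e["nw_dst"]) in net]
    return {"xid": reply["xid"], "flags": reply["flags"], "entries": kept}


# Two parts from the switch: the first holds another slice's flow, the second
# holds a flow inside this slice's (hypothetical) subnet.
from_switch = [
    {"xid": 7, "flags": 1, "entries": [{"nw_dst": "192.0.2.10"}]},
    {"xid": 7, "flags": 0, "entries": [{"nw_dst": "10.0.5.20"}]},
]
to_controller = [slice_reply(r, "10.0.5.0/24") for r in from_switch]
```

The first part reaches the controller with an empty body but still carries flags == 1, so the controller knows to keep waiting for the part that contains its own flow.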

FWIW, poblano is an old NEC IP8800 running the product firmware, but querying that switch for flow stats always had the right behavior up until the point when we upgraded to FV 1.0.8-1, so I am not suspicious of the switch itself.

mssumanth commented 11 years ago

I am still not able to reproduce this problem over here. I tried the following. There are 3 slices which share the same pair of switches. One slice, which is connected to the Floodlight controller, has some 40 hosts in its flowspace (so that I get a lot of stats replies when I query). The other two slices have just two hosts connected to the two respective switches. One of those other slices is connected to a POX controller. I started ping traffic from the other slice (controlled by POX) and queried for the stats reply from the Floodlight controller, and I get a blank stats reply message, which is correct since there are no statistics falling in its slice. Then I started ping traffic from the other 40 hosts, which fall in the flowspace of the slice controlled by Floodlight, and I get all the statistics messages corresponding to the traffic from all the hosts.

Tim, what I did not understand in your comment was how Floodlight can get stats from other slices which do not fall in its flowspace. Isn't something wrong there?

gth828r commented 11 years ago

mssumanth wrote:

I get all the statistics message corresponding to the traffic from all the hosts.

I'm not sure what to say... perhaps the issue is a config value in our FV? I'll try to look at that, and I'll try to grab a packet capture at the FV facing the switch along with a snippet of the FV logs tomorrow when I try another flow stats query.

If that doesn't turn anything up, we can try pointing a GENI sliver at a controller that you are running so you can query the same thing that I am querying.

mssumanth wrote:

Tim, in your comment what I did not understand was how can floodlight get stats from other slices which do not fall in its flowspace. Isn't something wrong there?

The only time I see that behavior is in the packet captures from 0.8.17, at which point I don't think FV actually filtered which flow stats got through to a controller. You can see this behavior today in GENI for all of the deployments that are still at FV 0.8.17.

mssumanth commented 11 years ago

gth828r wrote:

I'll try to look at that, and I'll try to grab a packet capture at the FV facing the switch along with a snippet of the FV logs tomorrow when I try another flow stats query.

If that doesn't turn anything up, we can try pointing a GENI sliver at a controller that you are running so you can query the same thing that I am querying.

Ok, let's try it! If a part or the entire topology is shared by more than one slice, i.e. say there are two hosts, each connected to a switch, that belong to two slices, so that there is no distinction between the slices' flowspace rules at all, then there might be no stats reply sent back by FV to the controller.

gth828r wrote:

The only time I see that behavior is in the packet captures from 0.8.17, at which point I don't think FV actually filtered which flow stats got through to a controller. You can see this behavior today in GENI for all of the deployments that are still at FV 0.8.17.

If you want the behavior of 0.8.17, wherein you should be able to get the entire flow table of the switch at FlowVisor, there is a new feature in FV 1.2 wherein you can register for the flow table from FV, which then returns the entire flow table of that particular switch regardless of the slice. But it is only supported in JSON!

alshabib commented 11 years ago

So correct me if I am wrong but I figure you are seeing this problem because you are creating a slice which has access to the entire flowspace and generating stats queries from that controller. Am I correct?

If the above is correct, then you would not see any flowspace entries because no flowmod came from that global slice. I actually think that is the correct behaviour, since one slice should not be able to see that another slice even exists. I can see two solutions to this problem:

  1. As Sumanth said, FV1.2 has an API call that allows you to register for a callback whenever FV receives a new flow stats reply. This is nice because now you don't need to poll and you don't need a controller.
  2. If you define a read-only slice, then a case can be made that a read-only slice should obtain whatever flowtables match its flowspace irrespective of whether it pushed those rules or not.

Thoughts?

gth828r commented 11 years ago

alshabib wrote:

So correct me if I am wrong but I figure you are seeing this problem because you are creating a slice which has access to the entire flowspace and generating stats queries from that controller. Am I correct?

Nope, this is just a plain old slice that is constrained to a specific flowspace (in this case, IP/ARP ethertype + a specific subnet). Any experimenter in GENI would have this same issue. For the problem reported in this issue, pretend that I am not the person running the FV, and I don't have access to it, but someone has just assigned some flowspace to me and pointed it at my controller.

The other things mentioned regarding flow stats seem like good things that we should investigate (as operators) once we move to 1.2, and we will do so when we get to that point.

gth828r commented 11 years ago

Ok the data I promised:

FYI for your wireshark decoding, my controller is on port 33012, and the FV is listening for inbound switch connections on port 6633. For the capture taken on the FV, I used the following filter:

(tcp.port == 6633 or tcp.port == 33012) and (of.type == 16 or of.type == 17)

On the controller pcap, I just used:

of.type == 16 or of.type == 17

From these packet captures, it is apparent that the "flags" portion of the stats reply that I mentioned before is getting changed from a value of "1" to a value of "0" somewhere on the FV processing on our lab FV host, even though the transaction ID remains constant.
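
For anyone checking their own captures outside of Wireshark, the flags field can also be read directly from the raw message bytes. The following is a minimal sketch assuming the OpenFlow 1.0 wire layout (an 8-byte ofp_header followed by a 16-bit stats type and 16-bit flags); the parse_stats_reply function name and the returned dict are hypothetical.

```python
import struct

OFPT_STATS_REQUEST = 16    # same message types as in the display filters above
OFPT_STATS_REPLY = 17
OFPSF_REPLY_MORE = 0x0001

# ofp_header (version, type, length, xid) followed by the stats type and flags,
# all in network byte order.
STATS_REPLY_HEADER = struct.Struct("!BBHIHH")


def parse_stats_reply(data):
    """Extract the xid and "more" flag from a raw OpenFlow 1.0 stats reply."""
    version, msg_type, length, xid, stats_type, flags = \
        STATS_REPLY_HEADER.unpack_from(data)
    if msg_type != OFPT_STATS_REPLY:
        raise ValueError("not a stats reply: type %d" % msg_type)
    return {
        "xid": xid,
        "stats_type": stats_type,
        "more": bool(flags & OFPSF_REPLY_MORE),
        "body_len": length - STATS_REPLY_HEADER.size,
    }


# A hand-built empty flow stats reply (stats type 1) with the "more" flag set.
raw = struct.pack("!BBHIHH", 1, OFPT_STATS_REPLY, 12, 42, 1, OFPSF_REPLY_MORE)
parsed = parse_stats_reply(raw)
```

Running this over the same reply as seen on the switch-facing side and on the controller-facing side of the FV would show whether the "more" value changes in between.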

What information about our setup can I provide to help determine what the difference is between your test setup and our lab FV's setup?

gth828r commented 11 years ago

FYI, here is the config from our FV:

  "flowvisor": [
    {    
      "api_webserver_port": 8080,
      "db_version": 2,
      "host": "localhost",
      "log_ident": "flowvisor",
      "checkpointing": false,
      "listen_port": 6633,
      "logging": "NOTE",
      "run_topology_server": false,
      "log_facility": "LOG_LOCAL7",
      "version": "flowvisor-0.8.13",
      "config_name": "default",
      "api_jetty_webserver_port": 8081,
      "default_flood_perm": "fvadmin",
      "track_flows": false,
      "stats_desc_hack": false
    }    
  ],

I'm not sure if that is relevant or not. If you would like us to point an FV slice here in our lab at a controller that you all are running, please let us know. Note the version of FV reported in this output is 0.8.13, but the version reported by dpkg is 1.0.8-1:

$ dpkg -s flowvisor | grep Version
Version: 1.0.8-1

jbsbbn commented 11 years ago

This has been stuck for two weeks now. Is that just because you guys haven't been able to reproduce it? If so, it should be trivial to reproduce in GENI, even if for some reason you can't reproduce it in your own virtual environments. Let us know if you need a hand creating a GENI sliver and doing the same thing Tim's doing.

mssumanth commented 11 years ago

Josh,

We were able to reproduce this using our FlowVisor test framework (by manually setting the flag to one in the replies from our fake switch).

Earlier, the problem was that whenever I tried to reproduce it, even with large stats replies, OVS never set the flag in the Mininet simulation, and the entire (big) stats response would appear at the Floodlight controller in one shot (with the flag set to zero).

I am now working on the fix. Earlier we had thought of a simple fix, but some additional work is needed on top of it. I am analysing that part now and will get in touch with you shortly.

Best regards Sumanth

mssumanth commented 11 years ago

A HashMap, flowStats (which holds the stats reply that FV sends to the controller), was being cleared for every stats reply message from the switch. This was corrected by introducing another HashMap, which copies all the information from flowStats and is sent to the controller. flowStats is now cleared only when there are no more replies coming from the switch (i.e. the flag is set to zero).

Fixed in the commit: 0be0f477a280abdbf7a079f5cf5d6ec82903d818
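
The idea behind the fix can be sketched in a few lines of Python. This is only an illustration of the accumulate-then-clear pattern described above, not the code from the commit: FlowVisor itself is written in Java, and the FlowStatsBuffer class, its method names, and the entry layout are all hypothetical.

```python
OFPSF_REPLY_MORE = 0x0001


class FlowStatsBuffer:
    """Accumulates a slice's entries across the parts of one stats transaction."""

    def __init__(self, in_slice):
        self.in_slice = in_slice  # predicate: does this entry belong to the slice?
        self.flow_stats = {}      # stand-in for the flowStats map; survives across parts

    def on_switch_reply(self, flags, entries):
        """Return the merged entries after the final part, otherwise None."""
        for entry in entries:
            if self.in_slice(entry):
                self.flow_stats[entry["match"]] = entry
        if flags & OFPSF_REPLY_MORE:
            return None                   # more parts coming: keep accumulating
        snapshot = dict(self.flow_stats)  # the copy that goes to the controller
        self.flow_stats.clear()           # cleared only once the flag is zero
        return snapshot


buf = FlowStatsBuffer(in_slice=lambda e: e["slice"] == "mine")
first = buf.on_switch_reply(1, [{"match": "a", "slice": "mine"}])
second = buf.on_switch_reply(1, [{"match": "b", "slice": "other"}])
final = buf.on_switch_reply(0, [])
```

Had the map been cleared after every part, the entry from the first part would be gone by the time the final, empty part arrived.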