gwu-cs-iot / collaboration

Spring '20 IoT - systems and security class. This is the collaborative half of the class.
https://www2.seas.gwu.edu/~gparmer/classes/2020-01-01-Internet-of-Things-Systems-Security.html
MIT License

Paper Discussion 16a: Hash-routing Schemes For Information Centric Networking #109

Open ratnadeepb opened 4 years ago

ratnadeepb commented 4 years ago

Please provide critique and review for the paper.

I would appreciate it if you could also add a note about the parts you found rather dense (or say so if you thought the paper was simple enough).

A video link for the presentation by the author can be found here

Edits

Some common questions most people had:

  1. What's meant by a regular network?
  2. What is ICN?

@anguyen0204 commented on how ISP bias would increase content popularity skewness. @anguyen0204 and @s-hanna15 had questions regarding security. @mralexjacobson was dubious about what the authors are trying to do here. @s-hanna15 also asked which scheme was preferable. @bushidocodes asked if these schemes would work in the presence of encryption or streaming protocols like HTTP/2.

anguyen0204 commented 4 years ago

Reviewer: Andrew Nguyen Review Type: Critical

Problem Being Solved

Hash-routing methods are not new; however, this paper revisits the concept in the context of Information-Centric Networking (ICN). The aim is to see whether hash routing is efficient outside of enterprise networks but still within the boundaries of a domain. There are several challenges in setting up such ubiquitous in-network caching systems: cache placement, content placement, and request-to-cache routing.

Main Contributions

The authors summarize the crux of the research as follows: "if a content exists within one of the domain’s caches, then it will be in the cache calculated by the hash function." The paper proposes several algorithms and reports an increase in cache hits and a reduction of domain traffic by nearly 31%. Two strategies discussed are on-path content placement with opportunistic request-to-cache routing and off-path content placement with coordinated request-to-cache routing. The paper then delves into the design and modeling of the hash-routing algorithms and architecture, including multicast, followed by the setup and analysis.
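
The quoted property, that a content item can only ever sit in the cache its hash points to, can be illustrated with a minimal sketch. It assumes a SHA-256 digest reduced modulo the number of caches; the function name `responsible_cache`, the cache labels, and the content name are hypothetical, and the hash function used in the paper may differ.

```python
import hashlib


def responsible_cache(content_name, cache_nodes):
    """Map a content name to the single cache responsible for it.

    Assumption: SHA-256 reduced modulo the number of caches stands in for
    whatever hash function a real deployment would use.
    """
    digest = hashlib.sha256(content_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(cache_nodes)
    return cache_nodes[index]


# Hypothetical cache labels for one domain.
caches = ["cache-0", "cache-1", "cache-2", "cache-3"]

# Every edge router computes the same answer for the same name, so a request
# can be forwarded to exactly one cache without a lookup service.
first = responsible_cache("/videos/lecture-16a.mp4", caches)
second = responsible_cache("/videos/lecture-16a.mp4", caches)
print(first == second)  # → True
```

Because the mapping depends only on the name and the list of caches, a miss at the responsible cache implies the content is not cached anywhere in the domain, which is the point of the quoted sentence.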

Questions

  1. The authors expected that traffic would not be fully hash-routed, because service level agreements between content providers and ISPs will result in favorable treatment of a fraction of traffic. How did this affect the results? They did not mention it again, but would it change the percentage results for highly skewed versus less skewed traffic, and so forth?
  2. What are the security implications? The paper deals with enterprise networks and the tests were run in simulation, but security was barely mentioned.

Critiques

The paper explained its HR concept concisely and well. Because it succinctly outlined each topic, laid out the scheme, and then evaluated it, I was able to follow along. However, although it is explained well, it did not go into much depth on how the functionality would work at a larger scale. It focused mostly on node-to-node/router-to-router and smaller-scale communication, and it would have been nice to see it touch on the bigger picture a bit. Next, as in my question, it didn't bring up security much, and I feel that is important. The evaluation was done in a

mralexjacobson commented 4 years ago

Reviewer: Alex Jacobson Type: Critical

Problem Being Solved:

The paper aims to reduce latency in information-centric networking environments by using hash routing to determine content placement and retrieval. It investigates whether hash routing is an efficient and viable caching approach when applied outside enterprise networks, but within the boundaries of a domain.

Main contributions:

There are several challenges that the authors had to solve, including the cache placement challenge, the content placement challenge, and the request-to-cache routing challenge. To solve these issues, two hybrid hashing schemes are presented with the objective of reducing the path stretch introduced by content packet detouring, by selecting the most appropriate content forwarding strategy based on the location of the source, cache, and receiver nodes. They found that hash routing can reduce inter-domain traffic by up to 31% as a result of increased cache hits.

Questions:

What is the difference between an enterprise network and a regular network? What is information-centric networking? What is the goal? The paper states that their work can increase cache hits by up to 31% with minimal impact on traffic dynamics of intra-domain links. What is the significance of the authors’ work? Will people simply be able to download stuff faster or is there more to it?

Critiques:

One critique I have is that the paper introduces vocabulary that I have never heard before, such as Information Centric Networking, but does not define those terms. This makes it significantly harder to understand what the authors are talking about without doing outside research. Another critique I have regards their testing. They test using Icarus, which they explain is a simulator based on the Fast Network Simulation Setup. I do not know what any of that is. They do not discuss any limitations of the system that they are introducing. How do I know it is trustworthy? Given that I have never taken a networking class, I found the concepts, vocabulary, and problems/solutions a bit difficult to understand and grasp. Perhaps a bit more background would have been beneficial.

s-hanna15 commented 4 years ago

Reviewer: Sam Hanna Review Type: Critical

Problem Being Solved: This paper talked about the challenges of Information-Centric Networks (ICN) and in-network caching. They found three main challenges: cache placement, content placement, and request to cache routing. They look to use an enterprise network technique, hash-routing, in order to deal with these problems.

Important Areas: This paper focuses on how to make hash-routing work for ICN. In order to do this, they propose two approaches. Both are hybrids with multicast: one combines it with asymmetric routing and the other with symmetric routing. They found that the behavior of each technique depends on what it is applied to, but overall, they were able to get a 31% reduction of traffic with their techniques.

Questions:

  1. In what cases is using their HR hybrid Asymmetric preferable to HR hybrid Symmetric and vice versa?
  2. They say that they were able to get a 31% reduction in traffic, but that it comes with an increase in network load. What is that trade-off realistically?
  3. Does this technique come with any security considerations?

Critiques:

  1. They don’t really explain ICN or how it differs from an enterprise network, so it left me a bit confused about how this helps and why it is different.
  2. I found the graphs in the evaluation confusing. I wasn’t sure what should be happening, so seeing the lines didn’t help me understand whether it was working or not.
  3. I don’t really get a sense of how they are implementing their approach and how it is significantly different from before. This could just be me having a hard time wrapping my head around all of this, but what already exists seems to be explained in detail (or at least I understood that part), and what they are creating much less so.

bushidocodes commented 4 years ago

Reviewer: Sean McBride

Review Type: Critical Review

Problem Being Solved:

Assuming a static placement of caches on an existing topology and simple off-path caching using an overlay network composed of equal buckets of content defined by a simple hash algorithm, how do different routing strategies affect the tradeoff between cache hit rates and average link load? Additionally, how do these different techniques compare to established on-path caching strategies?

Main Contributions:

  1. Summarizes existing off-path hash-routing techniques for caching
  2. Defines two new hybrid hash-routing techniques:
     2a. Hybrid Symmetric-Multicast, which opportunistically uses the symmetric path if the cache happens to lie on the shortest path and otherwise uses multicast to send the response to both the client and the cache
     2b. Hybrid Asymmetric-Multicast, which always responds using the shortest path and opportunistically decides to multicast to the cache location depending on a tunable parameter (path stretch factor) that is # hops to cache / diameter of the network (the longest shortest path between two routers/caches in the domain)
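
The two decisions described in 2a and 2b can be sketched as follows. This is a rough illustration on an unweighted graph, not the authors' implementation: the function names, the toy topology, the `max_stretch` threshold, and in particular the reading of "# hops to cache" as the fewest hops from any node on the delivery path to the cache are all assumptions.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first shortest path in a connected, unweighted graph."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def diameter(adj):
    """Longest shortest path, in hops, between any two nodes."""
    return max(len(shortest_path(adj, a, b)) - 1 for a in adj for b in adj)


def hybrid_sm(adj, source, cache, receiver):
    """2a: skip multicast when the cache already lies on the shortest path."""
    on_path = cache in shortest_path(adj, source, receiver)
    return "symmetric" if on_path else "multicast"


def hybrid_am(adj, source, cache, receiver, max_stretch):
    """2b: deliver on the shortest path; also copy to the cache only if the
    detour is small relative to the network diameter (assumed definition)."""
    path = shortest_path(adj, source, receiver)
    hops_to_cache = min(len(shortest_path(adj, n, cache)) - 1 for n in path)
    stretch = hops_to_cache / diameter(adj)
    return "multicast" if stretch <= max_stretch else "asymmetric"


# Toy topology: a line a-b-c-d-e with one extra node f hanging off c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "f"],
       "d": ["c", "e"], "e": ["d"], "f": ["c"]}

print(hybrid_sm(adj, "a", "c", "e"))       # → symmetric
print(hybrid_sm(adj, "a", "f", "e"))       # → multicast
print(hybrid_am(adj, "a", "f", "e", 0.5))  # → multicast
print(hybrid_am(adj, "a", "f", "e", 0.1))  # → asymmetric
```

In the last call the content is delivered without being cached at all, which is the case where the asymmetric hybrid trades a possible future cache hit for lower link load.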

Questions:

  1. The paper says a hash function supports both flat and hierarchical content naming, but the example hashing algorithm (modulo) doesn't seem to use hierarchy in any meaningful way. How would using a hierarchically-aware hashing function impact the routing techniques explored in this paper?
  2. The paper discusses using these techniques with an ISP, but increasingly consumers do not trust their ISPs and use strong encryption of both HTTP and DNS. A good example of this is Firefox by default using an encrypted tunnel to pipe DNS to CloudFlare. Does that sort of development render this sort of approach impossible?
  3. Do these techniques only work for traditional HTTP/1.1 request response cycles? How might this work with HTTP/2 Push and more recent streaming-style interfaces?

Criticisms:

  1. The paper mentioned that all of the off-path caching techniques had outsized impact on links to caches with extremely popular content, and that the impact of this depended on topology. However, the evaluation charts only expressed average link load. I think this merited additional testing and perhaps a breakout by the different simulated topologies.
  2. The Hybrid Symmetric-Multicast technique was described using mathematical notation that sort of obfuscated that all this was doing was forgoing multicast if the cache was on the shortest path. Delta SM is only a boolean (delta_sum > 0 : multicast : symmetric). This felt like dressing up a simple idea in mathematical notation to make it seem more significant.
  3. There was very little discussion around the selection of hashing techniques or how the overlay network formed by the off-path caches actually worked. I feel that the decision to hold this constant should have discussed what was assumed as the constant, why, and cite research justifying this choice.
  4. The term domain was used several times, but this seemingly was stretched to include a pan-European ISP. The use of this sort of terminology should have been defined more precisely in the context of this paper.
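
The first criticism, that an average can hide a hot link, can be shown with a small calculation. The per-link loads below are made-up numbers in arbitrary units, chosen only so that both hypothetical topologies have the same mean; they are not taken from the paper.

```python
from statistics import mean

# Made-up per-link loads (arbitrary units) for two hypothetical topologies.
link_loads = {
    "balanced": [10, 10, 10, 10, 10],
    "hot-link": [4, 4, 4, 4, 34],
}


def summarize(loads):
    """Report the average next to the worst link so hot spots stay visible."""
    avg = mean(loads)
    return {"mean": avg, "max": max(loads), "max_over_mean": max(loads) / avg}


for name, loads in link_loads.items():
    print(name, summarize(loads))
```

Both rows report a mean of 10, but the second one has a single link carrying more than three times the average, which an average-only chart would not reveal.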

ericwendt commented 4 years ago

Reviewer: Eric Wendt Review Type: Skim

Problem

With the expansion of IoT devices, the idea of a standing network for them is being explored. The problem with traditional networking hardware is that it is typically not cost-effective and requires more power than IoT devices can supply. LoRaWAN is introduced here as a means of long-distance communication that operates at a low frequency.

Contributions

lrshpak commented 4 years ago

Reviewer: Lily Shpak Review Type: Skim

Problem Being Solved

One challenge that is faced in networking is finding a suitable way to store information, especially when the network is information centric. It is important to find a way to store information that is scalable, efficient and cost effective. One of the main issues is the cache placement challenge, deciding which addresses take precedence over other addresses.

Main Contributions

The authors of this paper decide to test multiple hashing schemes. This way they can find one that works the best for their environment. Different hashing methods will put data in different positions, so it is important they find a scheme that works for them.

ratnadeepb commented 4 years ago

Very good questions. Regarding your first question, remember that skewness is a measure of how far the mean of the distribution is from the median or mode in units of the distribution's standard deviation. I think that if ISPs treat some data more favorably (bias) then that increases skewness of content popularity. This typically means that cache hit ratio will increase and link load will decrease. The way to think about it is that there is a certain subsection of all content that is being requested more than others. So a certain subset of the caches and the associated links will be busy but there will be less traffic on other links. On the other hand, if content popularity skewness is low then traffic will be more equally distributed on all links increasing load throughout.
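
The link between popularity skew and cache hit ratio can be checked with a short calculation. This is only a sketch under stated assumptions: content popularity follows a Zipf law with exponent `alpha`, and an idealized cache holds exactly the most popular items. The function name and the parameter values are hypothetical and unrelated to the paper's simulator.

```python
def top_k_request_share(num_items, cache_size, alpha):
    """Fraction of requests that target the `cache_size` most popular items
    when popularity follows a Zipf law with exponent `alpha`."""
    weights = [1.0 / rank ** alpha for rank in range(1, num_items + 1)]
    return sum(weights[:cache_size]) / sum(weights)


# alpha = 0 means uniform popularity; larger alpha means more skew.
for alpha in (0.0, 0.6, 1.2):
    share = top_k_request_share(num_items=10_000, cache_size=100, alpha=alpha)
    print(f"alpha={alpha}: {share:.1%} of requests target cached items")
```

With no skew the cache can serve only the fraction of the catalogue it holds; as `alpha` grows, the same cache covers a much larger share of requests, so fewer requests travel on to the origin.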

ratnadeepb commented 4 years ago

Regular network in this case means the Internet or some sections of it (as appropriate). ICN is a scheme for redefining networking, especially for the Internet. The idea is that we are looking for data on the Internet, not for some physical node; typically, we don't care which physical node serves it. So yes, we will be able to download things faster.

ratnadeepb commented 4 years ago

The first question (when the hybrid asymmetric scheme is preferable to the hybrid symmetric one) is excellent. Given their performance data, HR Hybrid Asymmetric does seem like the better scheme: it improves cache hits as much as the symmetric schemes but puts much less load on the network. This is a somewhat surprising result, given that the hybrid asymmetric scheme can at times not cache data at all.

Regarding the second question, all caching schemes increase link load, but these schemes (especially the hybrid AM) do so to a lesser extent.

ratnadeepb commented 4 years ago

On the first question: I don't think they use hierarchical naming in any substantial manner. They more or less just mention that it will work. As far as I understand, once data enters that particular network, it's immaterial where it came from.

The second question, about encryption, is a very good one. My understanding is that these schemes work similarly to HTTP routing: my ISP can't see the exact request, but the URL has to be visible. Obviously this goes much deeper, and there are other considerations that would make such schemes unusable. And of course, you can't use caching for writing data.

The third is a good question too, and I frankly don't know. I feel the tests do not go that deep.

ratnadeepb commented 4 years ago

Reviewer: Eric Wendt Review Type: Skim

Problem

With the expansion of IoT devices, the idea of a standing network for them is being explored. The problem with traditional networking hardware is that it is typically not cost-effective and requires more power than IoT devices can supply. LoRaWAN is introduced here as a means of long-distance communication that operates at a low frequency.

Contributions

* OpenChirp: this allows multiple users to manage battery-operated transducers across large areas using LoRa radios

* Sigfox: operates at 868MHz which doesn't require a license. These devices can communicate over many kilometers

* Discussion of communication techniques such as REST and an OpenChirp API

Eric, wrong issue, my friend!

RyanFisk2 commented 4 years ago

Reviewer: Ryan Fisk

Review Type: Comprehension

Problem Being Solved

Finding a way to store and quickly access data in an information-centric network has been a challenge in networking and particularly in IoT devices. Some of the issues that this paper explores are how to place the in-network caches to allow as many devices as possible to quickly access this information. There is also the challenge of how to spread out data among the cache nodes, and the request-to-cache routing challenge, which deals with which nodes resolve content requests. This paper looks at multiple hashing solutions for these problems in information-centric networks.

Contributions

This paper evaluates 5 different hashing schemes for their cache hits, link load, and performance on multiple topologies. These tests showed the benefits and faults of each of these systems. While the cache hit ratio was below optimal on all 5 systems, they did find that some of the caching schemes improved the average link load.

Questions

1) How many devices were acting as caches during their tests? Would having more of these improve the performance?

2) How much can these cache nodes store at once? It seems like they would need a good number of them for this system to make any difference.

3) How do these caching solutions compare with having edge devices?