Opened by b5 5 years ago.
@b5 the third point under the "Related Github Issues"
heading is just a link to the GitHub home page.
A useful starter for network topology creation: https://laraget.com/blog/ovs-mesh-script-generator
An extremely useful extension of Mininet: https://github.com/oliviertilmans/ipmininet, which adds routers, IPv6 support, and some other goodies.
And some example use cases here and accompanying blog posts here and here
These are notes from a session at the libp2p developer meetings in Berlin, 12-07-2018.
Designing distributed algorithms is hard.
Simulation Driven Development
P2P Simulation for Problem Solving
@kubuxu - KAD DHT gives us problems, but before that we had interesting
The DHT is probably the most complex system in IPFS when it comes to network topology and interaction. In most other parts of the system, interactions are 1-to-1, whereas DHT interactions involve the whole network.
Not too long ago we discovered that for a very long time we were shipping a beta DHT to peers, causing them
The current problem is that 40-60% of peers can't be dialed into, so when the DHT tried to dial a peer because it was close in the DHT space, the requests were timing out.
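To make the dialability problem concrete, here is a minimal sketch, assuming Kademlia-style XOR distance over SHA-256 IDs and a random set of undialable peers. The helper names (`peer_id`, `k_closest`, `undialable_fraction`) are hypothetical and are not go-libp2p-kad-dht APIs. Because "closeness" ignores whether a peer accepts inbound dials, a lookup's closest peers are undialable at roughly the same rate as the network as a whole.

```python
import hashlib
import random

# Assumption: peer IDs and keys are SHA-256 digests compared as 256-bit
# integers, and "closeness" is Kademlia's XOR distance.


def peer_id(name: str) -> int:
    """Hash a (hypothetical) peer or key name into a 256-bit ID."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def k_closest(target: int, peers: list[int], k: int) -> list[int]:
    """Return the k peers with the smallest XOR distance to target."""
    return sorted(peers, key=lambda p: p ^ target)[:k]


def undialable_fraction(n_peers: int, undialable_ratio: float, k: int, seed: int = 0) -> float:
    """Share of the k closest peers to a random key that cannot be dialed.

    Dialability is assigned independently of the ID, so a lookup that picks
    peers purely by closeness has no way to steer around undialable ones.
    """
    rng = random.Random(seed)
    peers = [peer_id(f"peer-{i}") for i in range(n_peers)]
    undialable = set(rng.sample(peers, int(n_peers * undialable_ratio)))
    key = peer_id(f"key-{seed}")
    closest = k_closest(key, peers, k)
    return sum(p in undialable for p in closest) / k
```

Sweeping `undialable_ratio` between 0.4 and 0.6 across many seeds gives a rough idea of how many dials per lookup would end in a timeout.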
Metrics & Request Tracing
OpenCensus
Basically OpenTracing + stats: it can include opt-in metrics in OpenTracing spans. @lanzafame has adapted the libp2p/rpc library in cluster to include method calls and arbitrary stats. OpenCensus separates the creation of a metric from its aggregation: recorded measurements aren't aggregated in-band.
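The record/aggregate split can be illustrated with a small sketch. Every name here (`Recorder`, `record`, `aggregate`) is hypothetical and is not the OpenCensus API; the point is only that the hot path appends raw measurements, while grouping and aggregation happen later, out-of-band.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Hypothetical recorder: stores raw (name, value, tags) tuples only."""
    measurements: list = field(default_factory=list)

    def record(self, name: str, value: float, **tags: str) -> None:
        # Cheap on the request path: no aggregation happens here.
        self.measurements.append((name, value, tags))


def aggregate(measurements, name: str, by_tag: str, agg=sum) -> dict:
    """Out-of-band "view": group one metric by a tag, then aggregate each group."""
    groups = defaultdict(list)
    for m_name, value, tags in measurements:
        if m_name == name:
            groups[tags.get(by_tag, "")].append(value)
    return {tag: agg(values) for tag, values in groups.items()}
```

Because aggregation is a separate step, the same raw measurements can feed several views (for example `sum` and `max` per RPC method) without changing the instrumented code.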
A note on tracing & performance
It's super expensive: when cluster is running with full verbose tracing on, it can't even get anything out. There's a difference between doing simulation to see emergent behaviours and debugging a specific issue; visualizations are better for seeing emergent behaviour.
Steps for looking for emergent behaviour:
Logging is a primitive form of visualization
Visualization
Hive Plots
The idea is to take advantage of a "grid" to create plots that are easier to reason about and comparable with each other. When networks are plotted in arbitrary space, it's difficult to tell when two networks share the same topology.
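A minimal sketch of a hive-plot layout follows, under illustrative assumptions: three radial axes, nodes assigned to an axis by their degree relative to the maximum degree, and placed along the axis by that same normalised degree. Placement is a pure function of the topology, so relabelling nodes does not move anything. Note that this only encodes degree structure: identical layouts do not prove two networks are isomorphic.

```python
import math

# Hypothetical rules: three axes; a node's axis is chosen by its degree
# relative to the maximum degree, and its distance along the axis is that
# same normalised degree.
N_AXES = 3


def degrees(edges):
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def axis_for(ratio):
    """0 = low-degree nodes, 1 = mid-degree, 2 = hubs (illustrative thresholds)."""
    if ratio < 1 / 3:
        return 0
    return 1 if ratio < 2 / 3 else 2


def hive_layout(edges):
    """Map node -> (axis, radius, (x, y)); depends only on node degrees."""
    deg = degrees(edges)
    max_deg = max(deg.values())
    layout = {}
    for node, d in deg.items():
        ratio = d / max_deg
        axis = axis_for(ratio)
        angle = 2 * math.pi * axis / N_AXES
        layout[node] = (axis, ratio, (ratio * math.cos(angle), ratio * math.sin(angle)))
    return layout
```

Edges would then be drawn as curves between these fixed positions; any plotting library can render the result.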
simulation vs production testing
network topologies & algorithm interaction
Simulation should help you:
- iterate through many network topologies ("fuzz" them)
- account for nodes that can only dial out; this isn't something that is (or can be) taken into consideration when designing an algorithm today
- specify failure conditions
- add in bad actors
Using TDD as an analogy, we should build a simulator before we even write any networking code.
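Fuzzing topologies might look like the sketch below, assuming that dial-out-only nodes (for example, nodes behind NAT) never accept inbound dials and that bad actors are merely flagged for the simulator to act on. `Node`, `random_topology` and `fuzz` are hypothetical names; each seed reproduces exactly one topology, so a failing case can be replayed.

```python
import random
from dataclasses import dataclass

# Assumptions: dial-out-only nodes never accept inbound dials, and bad
# actors are only flagged here; their behaviour is left to the simulator.


@dataclass
class Node:
    nid: int
    dialable: bool   # accepts inbound connections
    malicious: bool  # a bad actor


def random_topology(n, dial_out_only_ratio, bad_actor_ratio, degree, rng):
    """Build one random topology; A may only dial B if B is dialable."""
    nodes = [
        Node(i, rng.random() >= dial_out_only_ratio, rng.random() < bad_actor_ratio)
        for i in range(n)
    ]
    edges = set()
    for a in nodes:
        targets = [b for b in nodes if b.nid != a.nid and b.dialable]
        for b in rng.sample(targets, min(degree, len(targets))):
            edges.add(frozenset((a.nid, b.nid)))
    return nodes, edges


def fuzz(runs, **params):
    """Yield (seed, topology) pairs; each seed reproduces its topology."""
    for seed in range(runs):
        yield seed, random_topology(rng=random.Random(seed), **params)
```

A test run would iterate over `fuzz(...)`, run the algorithm on each topology, and report the seeds whose failure conditions fired.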
iterating on performance
In regard to simulation: before you simulate, you'd like to know whether the algorithm works in a "best-case" context. It's hard to know how the optimizations you're aiming to implement will affect the algorithm's failure conditions in practice.
"Simulation driven Design"
expansion on IPTB
Two use cases:
Network Topology Specification
Using the OpenConfig YANG tooling, we should be able to specify network configurations and get an AST out, which we then use to construct the simulation network. This would require adding 'testing' aspects to YANG, e.g. adding random variation to the interface speed.
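As a rough sketch of the spec-to-network step: a plain dict stands in for the AST that YANG tooling would produce (this is not OpenConfig output), and the `jitter` field is a hypothetical example of the proposed 'testing' aspect, randomly varying each link's speed when the simulation network is built.

```python
import random

# Hypothetical spec format; "jitter" is the fractional random variation
# applied to a link's nominal speed each time the network is built.
SPEC = {
    "nodes": ["a", "b", "c"],
    "links": [
        {"ends": ("a", "b"), "speed_mbps": 100, "jitter": 0.2},
        {"ends": ("b", "c"), "speed_mbps": 10, "jitter": 0.0},
    ],
}


def build_network(spec, rng):
    """Turn a topology spec into an adjacency map of link speeds (Mbps)."""
    net = {n: {} for n in spec["nodes"]}
    for link in spec["links"]:
        a, b = link["ends"]
        j = link.get("jitter", 0.0)
        speed = link["speed_mbps"] * rng.uniform(1 - j, 1 + j)
        net[a][b] = net[b][a] = speed  # links are symmetric here
    return net
```

Building the network with different seeds (`build_network(SPEC, random.Random(seed))`) gives variant runs from one specification.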
Starting conditions
Flux conditions (over ? network)
variant peer properties
YANG example
Tests & Actions
These should be arbitrary functions
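If tests and actions are arbitrary functions, a scenario can be just a list of each. The sketch below assumes a hypothetical network model (node name to the set of keys it stores); a real harness, such as an IPTB extension, would pass handles to running nodes instead.

```python
from typing import Callable

# Hypothetical model: the network is a dict of node name -> set of keys
# that node stores. Actions mutate it; tests inspect it and return a bool.
Network = dict[str, set[str]]
Action = Callable[[Network], None]
Test = Callable[[Network], bool]


def run_scenario(net: Network, actions: list[Action], tests: list[Test]) -> list[bool]:
    """Apply every action in order, then evaluate every test."""
    for act in actions:
        act(net)
    return [test(net) for test in tests]


def put(node: str, key: str) -> Action:
    """Example action: store key on one node."""
    def action(net: Network) -> None:
        net[node].add(key)
    return action


def stores(node: str, key: str) -> Test:
    """Example test: does node hold key?"""
    return lambda net: key in net[node]
```

For example, `run_scenario({"a": set(), "b": set()}, [put("a", "k")], [stores("a", "k"), stores("b", "k")])` returns `[True, False]`.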
Failure Conditions
To turn
After some thought we came to the conclusion that it's
Two types of failure conditions:
1. A state in which we know a failure has occurred
- "If Peer A ever has x, we know that we've failed"
- "If Peer B ever has no connections, we've failed"
- "If Peers A & B ever have no route to each other, we've failed"
This hints that the ability to attach state checks to individual nodes would be helpful here: you need to specify the behaviour of the nodes.
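State-based failure conditions can be modelled as predicates checked after every simulation step: if any of them is ever true, the run has failed. This sketch assumes each step is a snapshot of a hypothetical adjacency dict (peer to set of connected peers) and implements two of the examples above; "no route" is a breadth-first search.

```python
from collections import deque


def has_no_connections(peer):
    """Failure predicate: peer currently has no connections."""
    return lambda graph: len(graph.get(peer, ())) == 0


def no_route(a, b):
    """Failure predicate: b is unreachable from a (breadth-first search)."""
    def check(graph):
        seen, queue = {a}, deque([a])
        while queue:
            cur = queue.popleft()
            if cur == b:
                return False
            for nxt in graph.get(cur, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return True
    return check


def first_failure(snapshots, conditions):
    """Return (step, condition name) of the first failure, or None."""
    for step, graph in enumerate(snapshots):
        for name, cond in conditions.items():
            if cond(graph):
                return step, name
    return None
```

Attaching a predicate to a specific node is then just a matter of closing over that node's name, as `has_no_connections("B")` does.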
Pros
Cons
2. A measurement-based failure occurs
- "If this takes more than 5 minutes, we've failed"
- "If there are more than 200 connections across the network, we've failed"
- "No activity has happened for x amount of time"
- "If each node on the network has"
Pros
Cons
Being able to inspect the state of the global network is useful here. We can measure global state by having all nodes in the simulation report to a central location.
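A central collector for measurement-based failures might look like this sketch. The 5-minute and 200-connection limits mirror the examples above; the idle limit, field names, and the assumption that every connection is reported by both of its endpoints are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Hypothetical central sink that every simulated node reports to."""
    max_duration_s: float = 300.0      # "more than 5 minutes"
    max_total_connections: int = 200   # "more than 200 connections"
    max_idle_s: float = 60.0           # hypothetical value for "x amount of time"
    connections: dict = field(default_factory=dict)  # node -> current count
    last_activity_s: float = 0.0

    def report(self, node, now_s, connections, active=False):
        self.connections[node] = connections
        if active:
            self.last_activity_s = now_s

    def failures(self, now_s):
        out = []
        if now_s > self.max_duration_s:
            out.append("took too long")
        # Assumption: each connection is reported by both endpoints.
        if sum(self.connections.values()) // 2 > self.max_total_connections:
            out.append("too many connections")
        if now_s - self.last_activity_s > self.max_idle_s:
            out.append("no activity")
        return out
```

The simulator would call `failures(now)` periodically and stop the run as soon as the list is non-empty.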
Specifying Failure conditions
Assigning tests to nodes: you're trying to ask a question of the network. Frequently you're trying to initiate an action on a node and evaluate the effect it has on the network.
Time
The most important thing to think about here is that time-bound measurement metrics should be configurable.
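One way to keep time-bound metrics configurable is to gather them in a single configuration object rather than hard-coding them in each check. In this sketch the field names, defaults, and the `scale` factor (for an accelerated or slowed simulated clock) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeBounds:
    """Hypothetical config holding every time-bound limit in one place."""
    lookup_timeout_s: float = 60.0
    max_run_s: float = 300.0
    scale: float = 1.0  # e.g. < 1.0 when the simulated clock runs faster

    def limit(self, name: str) -> float:
        return getattr(self, name) * self.scale


def within(bounds: TimeBounds, name: str, started_s: float, finished_s: float) -> bool:
    """True if the measured duration respects the configured (scaled) bound."""
    return finished_s - started_s <= bounds.limit(name)
```

The same failure checks can then run unchanged against a fast simulation (`TimeBounds(scale=0.5)`) or a slower real network (default bounds).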
Related Github Issues:
Papers
Existing Tooling & Useful Links
Physical Networks with Testing Support
Initial Session Abstract:
A discussion of best practices for presenting & iterating on networking solutions, with the goal of refining a document that attempts to distinguish, enumerate, and label different approaches to p2p network simulation, outlining strengths, weaknesses, and examples.
Points of context for this discussion may include: