onflow / flow-go

A fast, secure, and developer-friendly blockchain built to support the next generation of games, apps, and the digital assets that power them.
GNU Affero General Public License v3.0

[Networking] QUIC transport layer in the libp2p framework #4280

Closed franklywatson closed 3 months ago

franklywatson commented 1 year ago

Context

The network performance is crucial for maintaining efficient communication between nodes in the libp2p framework. As a result, we are continuously exploring methods to improve the speed and reliability of the network.

Problem Definition

The current transport layer in the libp2p framework may not provide the optimal performance necessary for the network to function at its fastest potential.

Proposed Solution

The QUIC transport layer is an available component that can be integrated into the libp2p framework, potentially improving network performance. We propose evaluating the QUIC transport layer to determine its suitability for our use case.

For more background information and a QUIC implementation for libp2p, refer to the following resources:

  1. QUIC in libp2p - YouTube
  2. go-libp2p-quic-transport - GitHub

Benefits

Evaluating and potentially integrating the QUIC transport layer into the libp2p framework can provide the following benefits:

  1. Improved network performance, resulting in faster communication between nodes.
  2. Enhanced reliability and connection stability.
  3. Reduced latency and resource consumption.

Next Steps

  1. Research the QUIC transport layer and its compatibility with the libp2p framework.
  2. Implement the QUIC transport layer within a testing environment to evaluate its performance.
  3. Benchmark and profile the QUIC transport layer and compare its performance against the current transport layer.
  4. Analyze the benchmarking results and make an informed decision on whether to integrate the QUIC transport layer into the libp2p framework.
  5. If the QUIC transport layer is deemed suitable, integrate it into the libp2p node and update the documentation to reflect the changes made in the codebase.
  6. Ensure that all tests pass and the code adheres to the Flow coding standards.
  7. Review and finalize the integration of the QUIC transport layer into the libp2p framework.

Definition of Done

  1. The QUIC transport layer has been evaluated for suitability within the libp2p framework.
  2. The performance of the QUIC transport layer has been compared against the current transport layer through benchmarking and profiling.
  3. The code should be appropriately documented and conform to the Flow coding standards.
  4. All tests related to the transport layer should pass.
  5. The evaluation should not break any existing functionality or use cases of the libp2p node.
  6. QUIC should be added as an alternative transport layer without replacing the current ones. The libp2p nodes should be able to negotiate and agree upon a mutually available transport layer to communicate. The changes should not cause backward compatibility issues.

yhassanzadeh13 commented 1 year ago

Note: QUIC must be supported as one of the streaming protocols available for communication between nodes. The nodes should exchange their respective sets of supported protocols and then settle on a single protocol that is mutually agreeable and best fits the requirements of the communication channel. This dynamic selection keeps the system flexible across varying networking environments and performance considerations. We have already equipped the nodes with the capability to examine each other's available streaming protocols and find a mutually agreeable one, though it can be optimized further: https://github.com/onflow/flow-go/tree/master/network/p2p/unicast

AndriiDiachuk commented 1 year ago

Here are the results of the tests and benchmarks: https://instinctive-hellebore-ee8.notion.site/Networking-QUIC-transport-layer-in-the-libp2p-framework-457426adf20b451ba76411f1e2adcef0

The results show some differences between QUIC and TCP in terms of time, performance, and number of tx in the make load scenario. But as the documentation says: "Whenever possible, QUIC should be preferred over TCP. Not only is it faster, it also increases the chances of a successful holepunch in case of firewalls."

The key question is whether these results provide a sufficient basis for integrating QUIC as an alternative transport layer. If the decision leans towards implementation, I would like to hear your vision on this. As I understand it, we can make QUIC a parameter when configuring the transport layer, so that a node can run with it while keeping the ability to accept both options. Lmk if that’s the idea, would be glad to discuss it.

yhassanzadeh13 commented 1 year ago

@AndriiDiachuk, thanks for sharing the benchmarking results. A few comments:

  1. The Google Drive files you've shared are inaccessible to me, and likely to other members of the Flow team as well. Kindly upload the raw file directly to Notion.
  2. In your report, please clearly indicate the GitHub branch where your benchmarking code is located, along with the steps required to reproduce your results. For the sake of transparency and verification, it's essential that the report includes comprehensive steps to allow for effortless replication of the results.

AndriiDiachuk commented 1 year ago

@yhassanzadeh13, thanks for the comments. I updated the Notion doc with the raw files and the steps to reproduce the results. Waiting for your review before taking further action. Thanks in advance.

yhassanzadeh13 commented 1 year ago

@AndriiDiachuk thank you for your efforts and the updates made to the Notion document. While it gives preliminary insight into how QUIC and TCP compare, I find that the analysis remains superficial. I was hoping for a comprehensive and systematic analysis that provides aggregated summaries and details the quantitative advantages of one over the other, for example, expressed as percentage gains.

The data from the execution nodes on QUIC seems somewhat limited; fewer than 100 logged entries do not offer a robust foundation for a decisive judgment. Therefore, while the initial findings indicate potential for QUIC, I am reluctant to label the results as thoroughly reliable or conclusive at this stage.

Given your current timeline constraints with the grants, I am not insisting on an extended research phase at your end. We envisage taking this forward through detailed internal research once you are done integrating QUIC.

Moving forward, I would like you to commence work on item 6 of the Definition of Done, which covers introducing QUIC as an alternative transport layer that libp2p nodes can negotiate without causing any backward compatibility issues. What matters here is a structured implementation that ensures a node-to-node agreement on the protocol choice based on mutual preference, with TCP as the fallback option.

Your focus should be aligned with ensuring:

  1. TCP (the default unicast protocol in our codebase) is the chosen protocol if there is a preference mismatch (see a similar test).
  2. A mutual agreement is reached if both nodes have a common preference for either TCP or QUIC (see a similar test).

The implementation should leverage the functionality of the unicast manager to streamline this process effectively. It would entail setting up a system where nodes can register their protocol preferences, initiating a negotiation from the most preferred to the least preferred (default being TCP), and finalizing the first mutually agreeable protocol.
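
The negotiation described above can be sketched roughly as follows. This is a minimal, self-contained model and not the flow-go unicast manager API: the `ProtocolID` type, the protocol constants, and the `Negotiate` function are hypothetical names introduced only for illustration. It walks the local preference list from most to least preferred and settles on the first protocol the remote also supports, with TCP as the default.

```go
package main

import "fmt"

// ProtocolID is a hypothetical identifier for a unicast protocol.
// It is not a type from flow-go or libp2p.
type ProtocolID string

const (
	ProtocolTCP     ProtocolID = "tcp" // the default, always-available protocol
	ProtocolTCPGzip ProtocolID = "tcp+gzip"
	ProtocolQUIC    ProtocolID = "quic"
)

// Negotiate walks localPrefs from most to least preferred and returns the
// first protocol that the remote node also supports. If there is no common
// preference, it falls back to TCP.
func Negotiate(localPrefs, remoteSupported []ProtocolID) ProtocolID {
	supported := make(map[ProtocolID]struct{}, len(remoteSupported))
	for _, p := range remoteSupported {
		supported[p] = struct{}{}
	}
	for _, p := range localPrefs {
		if _, ok := supported[p]; ok {
			return p
		}
	}
	return ProtocolTCP
}

func main() {
	// Scenario 1: preference mismatch, so the nodes fall back to TCP.
	fmt.Println(Negotiate(
		[]ProtocolID{ProtocolQUIC},
		[]ProtocolID{ProtocolTCPGzip, ProtocolTCP},
	))

	// Scenario 2: both nodes share a preference, so it is selected.
	fmt.Println(Negotiate(
		[]ProtocolID{ProtocolQUIC, ProtocolTCPGzip},
		[]ProtocolID{ProtocolQUIC, ProtocolTCP},
	))
}
```

The two calls in `main` mirror the two scenarios listed above: the first prints `tcp`, the second prints `quic`.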

For a comprehensive validation of the implementation, please include tests covering the two scenarios mentioned earlier, supplemented with an integration test. For integrating any new unicast protocol such as QUIC, in addition to the two scenarios above, we need a third integration test, similar to this, which checks that two nodes automatically settle on a preferred protocol for communicating with each other while falling back to the default protocol for the rest of the network. The existing integration test runs a Flow network where verification nodes and execution nodes communicate over TCP+Gzip with each other, while they communicate over TCP with the rest of the network. The test asserts that the network continues to work in a healthy manner with no breaking change.

Below are my answers to the questions you asked over Discord:

  1. Should this flag configure just sourceMultiAddr inside the defaultLibP2POptions function while creating both transports?

Yes, if a node chooses to have QUIC, it should have two transports: the current default TCP one as well as the QUIC one, and both should be created at the time of building the LibP2PNode. However, as QUIC is not deemed a default transport for now, it should not be directly added to defaultLibP2POptions. Rather, it should be appended to the options if the node is configured to have QUIC.
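
As a rough illustration of "append only when configured", the sketch below builds the listen multiaddresses for a node. It is an assumption-laden model, not flow-go code: `TransportConfig`, its `EnableQUIC` flag, and `ListenAddrs` are hypothetical names, and the port is arbitrary. The TCP address is always present, and the QUIC address (in the `/udp/<port>/quic-v1` multiaddr form) is appended only if the flag is set. A real node would additionally have to register the QUIC transport itself with the libp2p host, which is omitted here.

```go
package main

import "fmt"

// TransportConfig is a hypothetical configuration struct for illustration.
type TransportConfig struct {
	Host       string // IPv4 address to listen on (assumption: IPv4 only)
	Port       int
	EnableQUIC bool // hypothetical flag; QUIC is opt-in, TCP is always on
}

// ListenAddrs returns the multiaddress strings the node should listen on.
// The default TCP address always comes first; the QUIC address is appended
// only when the node is configured to have QUIC.
func ListenAddrs(cfg TransportConfig) []string {
	addrs := []string{fmt.Sprintf("/ip4/%s/tcp/%d", cfg.Host, cfg.Port)}
	if cfg.EnableQUIC {
		addrs = append(addrs, fmt.Sprintf("/ip4/%s/udp/%d/quic-v1", cfg.Host, cfg.Port))
	}
	return addrs
}

func main() {
	fmt.Println(ListenAddrs(TransportConfig{Host: "0.0.0.0", Port: 3569}))
	fmt.Println(ListenAddrs(TransportConfig{Host: "0.0.0.0", Port: 3569, EnableQUIC: true}))
}
```

Keeping the QUIC entry out of the default list means nodes that never set the flag behave exactly as before.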

  2. Should we also configure transports via a flag on the client side?

Yes. By default, all nodes have TCP as their preferred unicast protocol. We have other unicast protocols such as TCP + GZIP; a node that prefers to communicate over one of them sets it here as a config parameter. Once QUIC is integrated, it should also be configurable here. The order in which the unicast protocols are set determines the preference. For example, if a node sets this parameter as [quic, gzip], the node prefers to communicate with other nodes over quic first. If the remote node does not support quic, the node tries tcp + gzip, and if that also fails, the node falls back on tcp as the default.
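
The ordering rule can be illustrated with a small sketch that turns a config value such as `quic, gzip` into an ordered preference list. The comma-separated format, the `ParsePreferences` function, and the `defaultProtocol` constant are assumptions made for this example and are not taken from the flow-go configuration code. The order given by the operator is preserved, and tcp is always placed last as the fallback.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultProtocol is the fallback that every node supports.
const defaultProtocol = "tcp"

// ParsePreferences converts a comma-separated list of protocol names into an
// ordered preference list. Order is preserved, blanks and duplicates are
// dropped, and the default protocol is always appended as the last entry.
func ParsePreferences(raw string) []string {
	// Mark the default as seen so an explicit "tcp" entry is not duplicated.
	seen := map[string]bool{defaultProtocol: true}
	var prefs []string
	for _, name := range strings.Split(raw, ",") {
		name = strings.ToLower(strings.TrimSpace(name))
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		prefs = append(prefs, name)
	}
	return append(prefs, defaultProtocol)
}

func main() {
	// A node configured with [quic, gzip] tries quic, then gzip, then tcp.
	fmt.Println(ParsePreferences("quic, gzip"))
	// A node with no explicit preference only uses the default.
	fmt.Println(ParsePreferences(""))
}
```

With the inputs above, the first call yields `[quic gzip tcp]` and the second yields `[tcp]`.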

In conclusion:

I appreciate your understanding and adherence to the meticulous details that this task involves, underscoring the importance of precise implementation to facilitate a seamless integration process. Looking forward to seeing this implemented with the right balance of efficiency and accuracy.

Guitarheroua commented 1 year ago

Investigation in Google Doc

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.