lnresearch / topology

Data about the past and current structure of the Lightning Network
MIT License

#+OPTIONS: toc:nil


[[https://zenodo.org/badge/DOI/10.5281/zenodo.4088530.svg]]

Payments in the Lightning network are source-routed, meaning that the sender of a payment is responsible for finding a route from itself to the payment recipient. This is necessary due to the use of onion routing, based on the Sphinx construction [sphinx2009], in which the data to be transferred, i.e., the payment, is sent with an associated routing packet that specifies the route the data should be transferred over. In Lightning each hop on a route must correspond to a channel that is used to forward the payment, in the form of an HTLC, along with the routing onion.

In order to enable nodes to compute a route to the payment recipient, the nodes exchange information about the topology of the network, with edges corresponding to the channels, and vertices corresponding to the nodes in the network. The exchange of information is specified in the gossip protocol [gossip-spec], and is based on the channel endpoints broadcasting three types of messages to the network:

- ~channel_announcement~: announces a new channel between two nodes, anchored in an on-chain funding transaction
- ~node_announcement~: announces a node's existence and metadata, such as its network addresses and supported features
- ~channel_update~: announces the forwarding parameters, such as fees and expiry deltas, for one direction of an existing channel

We have built a number of tools that track the ~gossip_store~ file and persist the messages in order to retain them even after compaction. From the raw messages it is then possible to generate a number of derivative formats that allow inspecting the state of the network at any point during the runtime of the collection.

** File Format

In order to minimize the size of the datasets we use a simple custom file format. It consists of a header followed by a stream of raw gossip messages as they were exchanged over the wire. The header consists of a 3-byte prefix with the value ~GSP~ followed by a single version byte. Currently only version ~0x01~ is defined.

Each message in the raw message stream is prefixed by its length, encoded as a [[https://btcinformation.org/en/developer-reference#compactsize-unsigned-integers][~CompactSize~]] integer.
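The ~CompactSize~ length prefix can also be decoded with a few lines of standard-library Python. The following is a minimal sketch of the standard Bitcoin encoding; the helper name ~read_compact_size~ is ours and not part of any dataset tooling:

#+begin_src python
import struct
from io import BytesIO


def read_compact_size(f):
    """Decode a Bitcoin CompactSize integer from a binary stream.

    Returns None once the stream is exhausted.
    """
    prefix = f.read(1)
    if len(prefix) == 0:
        return None
    p = prefix[0]
    if p < 0xFD:
        return p  # values 0..252 are encoded as a single byte
    elif p == 0xFD:
        return struct.unpack('<H', f.read(2))[0]  # 0xFD: uint16, little-endian
    elif p == 0xFE:
        return struct.unpack('<I', f.read(4))[0]  # 0xFE: uint32, little-endian
    else:
        return struct.unpack('<Q', f.read(8))[0]  # 0xFF: uint64, little-endian
#+end_src

For example, ~read_compact_size(BytesIO(b'\xfd\x00\x01'))~ yields 256.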

The following code snippet is based on the [[https://pypi.org/project/pyln-proto/][pyln-proto]] Python package and can be used to load and iterate through the messages in a BZ2-compressed dataset:

#+begin_src python
import bz2

from pyln.proto.primitives import varint_decode


def read_dataset(filename: str):
    """Iterate over the raw gossip messages in a BZ2-compressed dataset."""
    with bz2.open(filename, 'rb') as f:
        header = f.read(4)
        assert header[:3] == b'GSP' and header[3] == 1
        while True:
            length = varint_decode(f)
            if length is None:
                break  # Reached the end of the stream
            msg = f.read(length)
            if len(msg) != length:
                raise ValueError(f"Incomplete message read from {filename}")
            yield msg
#+end_src

For details on the gossip messages themselves please refer to the [[https://github.com/lightningnetwork/lightning-rfc/blob/master/07-routing-gossip.md][Lightning Network Specification]].
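A quick way to sanity-check a dataset is to tally the messages by type. Per the specification, each gossip message starts with a big-endian 16-bit type: 256 for ~channel_announcement~, 257 for ~node_announcement~, and 258 for ~channel_update~. The following sketch assumes an iterable of raw messages, such as the ~read_dataset~ generator above:

#+begin_src python
import struct
from collections import Counter

# Gossip message types as assigned in the Lightning specification
GOSSIP_TYPES = {
    256: 'channel_announcement',
    257: 'node_announcement',
    258: 'channel_update',
}


def tally_message_types(messages):
    """Count raw gossip messages by their 2-byte big-endian type prefix."""
    counts = Counter()
    for msg in messages:
        (msg_type,) = struct.unpack('>H', msg[:2])
        counts[GOSSIP_TYPES.get(msg_type, 'unknown')] += 1
    return counts
#+end_src

For example, ~tally_message_types(read_dataset('gossip-20230924.gsp.bz2'))~ returns the per-type counts for that dataset.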

** Available Datasets

The following table lists all available datasets, along with some information about each of them.

|-------------------------+------------------------------------------------------------------+------------|
| Link / Filename         | SHA256 Checksum                                                  | Messages   |
|-------------------------+------------------------------------------------------------------+------------|
| [[https://storage.googleapis.com/lnresearch/gossip-20201014.gsp.bz2][gossip-20201014.gsp.bz2]] | 8c507298d2d2e7f5577ae9484986fc05630ef0bd2b59da39a60b674fd743713c |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20201102.gsp.bz2][gossip-20201102.gsp.bz2]] | e6628e77907406288f476d5c86f02fb310474c430eb980e0232a520c98d390aa |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20201203.gsp.bz2][gossip-20201203.gsp.bz2]] | fa323aae6b1c4d3d659abab8ec42cbbe81dded2ed7b3c526d3bf85f03d7b93cc |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20210104.gsp.bz2][gossip-20210104.gsp.bz2]] | 992199372dfb5cb1fa5e305c5ef4f2604f591798d522fc0576dc8de32315c79b |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20210908.gsp.bz2][gossip-20210908.gsp.bz2]] | 0ba0b31c12c4aec7f1255866acef485e239d54dedde99f4905cf869ec57804c1 |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20220823.gsp.bz2][gossip-20220823.gsp.bz2]] | cb260b0d7d3633db3b267256e43b974d1ecbcd403ab559a80f5e80744578777d |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20230924.gsp.bz2][gossip-20230924.gsp.bz2]] | b6298fea4dd468e9f6857ab844993363143515b18f9e8c8278f33c601c058e78 | 35'984'848 |
|-------------------------+------------------------------------------------------------------+------------|
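Downloaded files should be verified against the checksums listed above before use. The following standard-library helper is a sketch; the function names are ours and not part of the repository's tooling:

#+begin_src python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Compute the hex SHA256 digest of a file, streaming in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def verify_dataset(path, expected):
    """Raise ValueError if the file's digest does not match the table entry."""
    digest = sha256_file(path)
    if digest != expected:
        raise ValueError(f"Checksum mismatch for {path}: got {digest}")
    return True
#+end_src

For example, ~verify_dataset('gossip-20230924.gsp.bz2', 'b6298fea...')~ with the full checksum from the table confirms the download is intact.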

** Data Coverage

We strive to provide the best possible datasets to researchers. The gossip mechanism in Lightning is, however, purposefully lossy, so no single vantage point can observe every message ever exchanged.

Most importantly, we have a unique vantage point, having collected this information from the very beginning of the mainnet deployment. However, the initial collection was rather coarse-grained and some information may have been missed.

While collecting the gossip information we changed formats and methods a number of times, resulting in datasets that do not all share the same format and coverage. Our current methodology ensures that we capture the information in its raw state, after applying only the deduplication filtering that c-lightning performs to protect against outdated data and spam from peers.

We are still working on updating and annotating the information collected before the current methodology was in place, in order to backfill the datasets. This should provide the most complete picture of the evolution of the Lightning network collected so far.

Our formats and methodologies changed a number of times over the years, moving from aggregate snapshots, such as JSON dumps of the node's network view, to the raw message capture described above.

Sadly, it is unlikely that the high-fidelity format can be recovered completely from the earlier formats; for example, signatures cannot be recovered from the stored information. However, it might be possible to recreate parts of the structural information from the JSON dumps and the timespans. We will eventually make this data public as well, once we have confirmed it is sufficiently free of errors.

The data collection is on a best-effort basis and we do not guarantee that the datasets are complete. We are happy to accept missing gossip messages to backfill the datasets: if you have found some, please open an issue or a PR on this repository.

If you found these datasets useful, or would like others to reproduce your research starting from the same dataset, please use the BibTeX entry below to reference this project or a specific dataset:

#+begin_src bibtex
@misc{lngossip,
  title        = {Lightning Network Research --- Topology Datasets},
  author       = {Decker, Christian},
  howpublished = {\url{https://github.com/lnresearch/topology}},
  note         = {Accessed: 2020-10-01},
  doi          = {10.5281/zenodo.4088530},
}
#+end_src

In case you'd like to reference a specific dataset, please add the URL fragment ~#dataset-2020-10-01~ to the ~howpublished~ URL. This ensures that visitors jump directly to the table above, from which they can download the dataset.