lnresearch / topology

Data about the past and current structure of the Lightning Network
MIT License

#+OPTIONS: toc:nil


[[https://zenodo.org/badge/DOI/10.5281/zenodo.4088530.svg]]

Payments in the Lightning network are source-routed, meaning that the sender of a payment is responsible for finding a route from itself to the payment recipient. This is necessary due to the use of onion routing, based on the Sphinx construction [sphinx2009], in which the data to be transferred, i.e., the payment, is sent with an associated routing packet that specifies the route the data should be transferred over. In Lightning each hop on a route must correspond to a channel that is used to forward the payment, in the form of an HTLC, along with the routing onion.

In order to enable nodes to compute a route to the payment recipient, the nodes exchange information about the topology of the network, with edges corresponding to the channels, and vertices corresponding to the nodes in the network. The exchange of information is specified in the gossip protocol [gossip-spec], and is based on the channel endpoints broadcasting three types of messages to the network:

- ~channel_announcement~: announces a new channel between two nodes, anchored in an on-chain funding transaction
- ~node_announcement~: announces a node's existence and metadata, such as its network addresses and supported features
- ~channel_update~: announces the forwarding parameters, such as fees and expiry deltas, for one direction of an existing channel

We have built a number of tools that track the ~gossip_store~ file and persist the messages in order to retain them even after compaction. From the raw messages it is then possible to generate a number of derivative formats that allow inspecting the state of the network at any point during the runtime of the collection.

** File Format

In order to minimize the size of the datasets we use a simple custom file format. It consists of a header followed by a stream of raw gossip messages as they were exchanged over the wire. The header consists of a 3-byte prefix with the value ~GSP~ followed by a single version byte. Currently only version ~0x01~ is defined.

Each message in the raw message stream is prefixed by its length, encoded as a [[https://btcinformation.org/en/developer-reference#compactsize-unsigned-integers][~CompactSize~]] integer.
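The ~CompactSize~ length prefix can also be decoded with a few lines of standard-library Python. The following is a minimal sketch of the standard Bitcoin encoding; the helper name ~read_compact_size~ is ours and not part of any dataset tooling:

#+begin_src python
import struct
from io import BytesIO


def read_compact_size(f):
    """Decode a Bitcoin CompactSize integer from a binary stream.

    Returns None once the stream is exhausted.
    """
    prefix = f.read(1)
    if len(prefix) == 0:
        return None
    p = prefix[0]
    if p < 0xFD:
        return p  # values 0..252 are encoded as a single byte
    elif p == 0xFD:
        return struct.unpack('<H', f.read(2))[0]  # 0xFD: uint16, little-endian
    elif p == 0xFE:
        return struct.unpack('<I', f.read(4))[0]  # 0xFE: uint32, little-endian
    else:
        return struct.unpack('<Q', f.read(8))[0]  # 0xFF: uint64, little-endian
#+end_src

For example, ~read_compact_size(BytesIO(b'\xfd\x00\x01'))~ yields 256.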

The following code snippet is based on the [[https://pypi.org/project/pyln-proto/][pyln-proto]] Python package and can be used to load and iterate through the messages in a BZ2-compressed dataset:

#+begin_src python
import bz2

from pyln.proto.primitives import varint_decode


def read_dataset(filename: str):
    """Iterate over the raw gossip messages in a BZ2-compressed dataset."""
    with bz2.open(filename, 'rb') as f:
        header = f.read(4)
        assert header[:3] == b'GSP' and header[3] == 1
        while True:
            length = varint_decode(f)
            if length is None:
                break  # Reached the end of the stream
            msg = f.read(length)
            if len(msg) != length:
                raise ValueError(f"Incomplete message read from {filename}")
            yield msg
#+end_src

For details on the gossip messages themselves please refer to the [[https://github.com/lightningnetwork/lightning-rfc/blob/master/07-routing-gossip.md][Lightning Network Specification]].
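A quick way to sanity-check a dataset is to tally the messages by type. Per the specification, each gossip message starts with a big-endian 16-bit type: 256 for ~channel_announcement~, 257 for ~node_announcement~, and 258 for ~channel_update~. The following sketch assumes an iterable of raw messages, such as the ~read_dataset~ generator above:

#+begin_src python
import struct
from collections import Counter

# Gossip message types as assigned in the Lightning specification
GOSSIP_TYPES = {
    256: 'channel_announcement',
    257: 'node_announcement',
    258: 'channel_update',
}


def tally_message_types(messages):
    """Count raw gossip messages by their 2-byte big-endian type prefix."""
    counts = Counter()
    for msg in messages:
        (msg_type,) = struct.unpack('>H', msg[:2])
        counts[GOSSIP_TYPES.get(msg_type, 'unknown')] += 1
    return counts
#+end_src

For example, ~tally_message_types(read_dataset('gossip-20230924.gsp.bz2'))~ returns the per-type counts for that dataset.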

** Available Datasets

The following table lists all available datasets, along with some information about each of them.

|-------------------------+------------------------------------------------------------------+------------|
| Link / Filename         | SHA256 Checksum                                                  | Messages   |
|-------------------------+------------------------------------------------------------------+------------|
| [[https://storage.googleapis.com/lnresearch/gossip-20201014.gsp.bz2][gossip-20201014.gsp.bz2]] | 8c507298d2d2e7f5577ae9484986fc05630ef0bd2b59da39a60b674fd743713c |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20201102.gsp.bz2][gossip-20201102.gsp.bz2]] | e6628e77907406288f476d5c86f02fb310474c430eb980e0232a520c98d390aa |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20201203.gsp.bz2][gossip-20201203.gsp.bz2]] | fa323aae6b1c4d3d659abab8ec42cbbe81dded2ed7b3c526d3bf85f03d7b93cc |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20210104.gsp.bz2][gossip-20210104.gsp.bz2]] | 992199372dfb5cb1fa5e305c5ef4f2604f591798d522fc0576dc8de32315c79b |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20210908.gsp.bz2][gossip-20210908.gsp.bz2]] | 0ba0b31c12c4aec7f1255866acef485e239d54dedde99f4905cf869ec57804c1 |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20220823.gsp.bz2][gossip-20220823.gsp.bz2]] | cb260b0d7d3633db3b267256e43b974d1ecbcd403ab559a80f5e80744578777d |            |
| [[https://storage.googleapis.com/lnresearch/gossip-20230924.gsp.bz2][gossip-20230924.gsp.bz2]] | b6298fea4dd468e9f6857ab844993363143515b18f9e8c8278f33c601c058e78 | 35'984'848 |
|-------------------------+------------------------------------------------------------------+------------|
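Downloaded files should be verified against the checksums listed above before use. The following standard-library helper is a sketch; the function names are ours and not part of the repository's tooling:

#+begin_src python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Compute the hex SHA256 digest of a file, streaming in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def verify_dataset(path, expected):
    """Raise ValueError if the file's digest does not match the table entry."""
    digest = sha256_file(path)
    if digest != expected:
        raise ValueError(f"Checksum mismatch for {path}: got {digest}")
    return True
#+end_src

For example, ~verify_dataset('gossip-20230924.gsp.bz2', 'b6298fea...')~ with the full checksum from the table confirms the download is intact.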

** Data Coverage

We strive to provide the best possible datasets to researchers. The gossip mechanism in Lightning is, however, purposefully lossy, so no single vantage point can observe every message ever exchanged.

Most importantly, we have a unique vantage point, having collected this information from the very beginning of the mainnet deployment. However, the initial collection was rather coarse-grained and some information may have been missed.

While collecting the gossip information we changed formats and methods a number of times, resulting in datasets that do not all share the same format and coverage. Our current methodology ensures that we capture the information in its raw state, after applying only the deduplication filtering that c-lightning performs to protect against outdated data and spam from peers.

We are still working on updating and annotating the information collected before the current methodology was in place, in order to backfill the datasets. This should provide the most complete picture of the evolution of the Lightning network collected so far.

Our formats and methodologies changed a number of times over the years, moving from aggregate snapshots, such as JSON dumps of the node's network view, to the raw message capture described above.

Sadly, it is unlikely that the high-fidelity format can be recovered completely from the earlier formats; for example, signatures cannot be recovered from the stored information. However, it might be possible to recreate parts of the structural information from the JSON dumps and the timespans. We will eventually make this data public as well, once we have confirmed it is sufficiently free of errors.

The data collection is on a best-effort basis and we do not guarantee that the datasets are complete. We are happy to accept missing gossip messages to backfill the datasets: if you have found some, please open an issue or a PR on this repository.

If you found these datasets useful, or would like others to reproduce your research starting from the same dataset, please use the BibTeX entry below to reference this project or a specific dataset:

#+begin_src bibtex
@misc{lngossip,
  title        = {Lightning Network Research --- Topology Datasets},
  author       = {Decker, Christian},
  howpublished = {\url{https://github.com/lnresearch/topology}},
  note         = {Accessed: 2020-10-01},
  doi          = {10.5281/zenodo.4088530},
}
#+end_src

In case you'd like to reference a specific dataset, please add the URL fragment ~#dataset-2020-10-01~ to the ~howpublished~ URL. This ensures that visitors jump directly to the table above, from which they can download the dataset.