Polkadot Runtime Metric Exporter V0.1 Features

kianenigma commented 2 years ago

What we have now is a live exporter that pushes its output to Prometheus. Both of these details should change: the liveness and the output.

About the output, we should build upon the current model, in which every logical unit (i.e. a pallet) is implemented as an IExporter. All of the current exporters should be renamed with a PromExporter suffix to show that they export Prometheus data, or even better, they should all inherit from a common class called PromExporter.

The new exporters, all under the umbrella of a new common class called TimescaleExporter, have a new field called version.
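A rough sketch of the hierarchy described above. Only the names IExporter, PromExporter, TimescaleExporter and the `version` field come from this discussion; the `palletIdentifier` member and the two concrete `System*` classes are assumptions made for illustration, not the actual code in the repo.

```typescript
// Hypothetical shape of the existing interface: one exporter per logical unit (pallet).
interface IExporter {
    readonly palletIdentifier: string;
}

// Common parent for everything that exports Prometheus data.
abstract class PromExporter implements IExporter {
    constructor(readonly palletIdentifier: string) {}
}

// Common parent for the new exporters; `version` is the new field.
abstract class TimescaleExporter implements IExporter {
    constructor(readonly palletIdentifier: string, readonly version: number) {}
}

// Illustrative concrete exporters for the system pallet.
class SystemPromExporter extends PromExporter {
    constructor() {
        super("system");
    }
}

class SystemTimescaleExporter extends TimescaleExporter {
    constructor() {
        super("system", 1);
    }
}

const exporters: IExporter[] = [new SystemPromExporter(), new SystemTimescaleExporter()];

// Only the Timescale exporters carry a version that needs tracking.
const versioned = exporters.filter(
    (e): e is TimescaleExporter => e instanceof TimescaleExporter,
);
```

With this layout the engine can hold a single list of IExporter and still pick out the versioned ones when it needs to.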

About the liveness: the core engine should be capable of being told to either:

- Scrape the chain's history
- Listen to incoming blocks (current behavior)

The final outcome should be able to handle both of the above. Some programming gymnastics are probably needed to achieve this.
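One possible way to express the two behaviours and their combination, as a minimal sketch. The `EngineMode` values, `EnginePlan` and `planRun` are hypothetical names, and the finalized head is passed in as a plain number instead of being fetched from a node.

```typescript
// Hypothetical modes: scrape history only, follow new blocks only, or both.
type EngineMode = "history" | "live" | "both";

interface EnginePlan {
    // Inclusive block range to backfill, or null when no history is wanted.
    backfill: { from: number; to: number } | null;
    // Whether to subscribe to incoming blocks.
    subscribe: boolean;
}

// Decide what the engine should do, given the head that is finalized at startup.
function planRun(mode: EngineMode, firstBlock: number, finalizedHead: number): EnginePlan {
    const backfill = mode === "live" ? null : { from: firstBlock, to: finalizedHead };
    return { backfill, subscribe: mode !== "history" };
}

const both = planRun("both", 0, 1000);
const liveOnly = planRun("live", 0, 1000);
```

In "both" mode, one option is to start the subscription first and then backfill up to the head seen at startup, so that no block falls between the two phases.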

Every TimescaleExporter would map to a table. The version and the highest and lowest scraped blocks are stored in the metadata of the table. Once a version change in a TimescaleExporter is detected, the old table can be wiped and replaced by the new one.
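The version check could look roughly like this. `TableMetadata`, `TableAction` and `decideTableAction` are made-up names for the sketch, and how the metadata is actually stored in the database is left open.

```typescript
// Hypothetical per-table metadata, as described above.
interface TableMetadata {
    version: number;
    lowestBlock: number;
    highestBlock: number;
}

type TableAction = "create" | "wipe_and_recreate" | "resume";

// Compare the stored metadata with the version declared by the exporter.
function decideTableAction(
    stored: TableMetadata | undefined,
    exporterVersion: number,
): TableAction {
    if (stored === undefined) {
        return "create"; // nothing scraped yet for this exporter
    }
    return stored.version === exporterVersion ? "resume" : "wipe_and_recreate";
}

const stored: TableMetadata = { version: 1, lowestBlock: 100, highestBlock: 250 };
const sameVersion = decideTableAction(stored, 1);
const bumpedVersion = decideTableAction(stored, 2);
```

When the action is "resume", the stored lowest and highest blocks tell the engine which ranges are already covered.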

kianenigma commented 2 years ago

(this plan has been discussed further offline, this is mainly to have the gist written down somewhere)

gilles437 commented 2 years ago

A few comments:

- Some additional tables will be added to the TimescaleDB metrics tables, in order to normalise the database and reduce the data size.
- For performance, and in order to load historical data from the genesis block of each parachain, the runtime exporter should use a multi-threaded architecture. This will help achieve reasonable times when loading the full history of a parachain, reducing the loading time from weeks to hours. Even if this is done only once per environment, the runtime exporter might be used externally as open source, and should provide acceptable loading times.
- As loading history can take a very long time, it is subject to disconnections from the RPC node, which means that history should be loaded iteratively with intermediate checks. This is particularly relevant with multi-threading.
- Versioning is a good idea. More generally, the runtime exporter should be able to run any segment (starting block and ending block) from a JSON configuration file, even with the same version, and execute it from scratch, meaning wiping the current data and replacing it with new data for the specified segment. The runtime exporter should read this configuration periodically, e.g. every 10 minutes, and execute the requested history loading, per parachain. This will allow the process to run continuously while adding historical data for a specific parachain. This provides some flexibility in usage, and makes it easy to load any parachain for any specific period of time.
Example:

{"parachains":[
{"chain":"wss://rpc.polkadot.io", "startingBlock":11330335, "endingBlock": 11265335 } ]}
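To make the segment idea concrete, here is a minimal sketch that parses a configuration like the one above and splits a segment into smaller chunks, which could then be handed to separate workers and checkpointed one by one. `SegmentConfig`, `ExporterConfig`, `splitSegment` and the chunk size are assumptions for illustration. The sketch orders the two bounds itself, so it does not rely on which of `startingBlock` and `endingBlock` is larger.

```typescript
// Field names follow the JSON example; the type names are hypothetical.
interface SegmentConfig {
    chain: string;
    startingBlock: number;
    endingBlock: number;
}

interface ExporterConfig {
    parachains: SegmentConfig[];
}

// Split a segment into inclusive [from, to] chunks of at most `chunkSize` blocks.
function splitSegment(seg: SegmentConfig, chunkSize: number): Array<[number, number]> {
    if (!Number.isInteger(chunkSize) || chunkSize < 1) {
        throw new RangeError("chunkSize must be a positive integer");
    }
    const low = Math.min(seg.startingBlock, seg.endingBlock);
    const high = Math.max(seg.startingBlock, seg.endingBlock);
    const chunks: Array<[number, number]> = [];
    for (let from = low; from <= high; from += chunkSize) {
        chunks.push([from, Math.min(from + chunkSize - 1, high)]);
    }
    return chunks;
}

const raw =
    '{"parachains":[{"chain":"wss://rpc.polkadot.io","startingBlock":11330335,"endingBlock":11265335}]}';
const config = JSON.parse(raw) as ExporterConfig;
const chunks = splitSegment(config.parachains[0], 10000);
```

Because every chunk is a self-contained range, a worker that loses its RPC connection only needs to redo its own chunk and not the whole segment.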

Running a new version of the runtime-exporter implies re-running all the historical data from scratch for all the parachains, for every TimescaleExporter that changed its version, before reading the configuration file described above (if specified), and for all the stored data of the same TimescaleExporter.

- As Prometheus and TimescaleDB will be supported simultaneously, the code should be structured so that the runtime exporter can use only one of the databases, or both of them.

kianenigma commented 2 years ago

Reading the current state of the code, I have some reservations about the direction that it is going in terms of complexity. Let's recap some objectives here: The goal of this project is to be a simple, lightweight and pluggable runtime inspector. We want to scrape some data from the runtime, and have an engine that can write it to some arbitrary backend, e.g. Prometheus or Timescale, or any other database in the future.

Looking at the current code around system, I am worried that we are strictly assuming the existence of certain backends, and that's not good. We have artifacts like withProm and withTs, and some classes like System extends CTimeScaleExporter have properties for both Prometheus and Timescale, and some are initialized and some are not. This is all really against good software practices.

Here's one suggestion about how you could improve this.

/// A container for un-opinionated code that fetches data from the system pallet, without caring
/// about which exporter we are using.
class BaseSystem {
    palletIdentifier: any;
    exporterIdentifier: string;
    exporterVersion: number;

    // The return type of this should not be ANY, but simplifying stuff here...
    getData(): any { }
}

/// All of the abstractions needed to deal with prometheus are here.
class SystemPromExporter extends BaseSystem {
    finalizedHead: any;
    blockWeight: any;
    blockLength: any;
    specVersioning: any;
    registry: PromClient.Registry;

    constructor() {
        super()
    }

    doStuff() {
        const data = super.getData();
        // output data to prometheus
    }
}

/// All of the abstractions needed to deal with timescale, such as threading etc are here.
class SystemSqlExporter extends BaseSystem {
    systemFinalized: typeof Sequelize;
    systemWeight: typeof Sequelize;
    systemBlockLength: typeof Sequelize;
    systemSpecVersion: typeof Sequelize;

    constructor() {
        super()
    }

    doStuff() {
        const data = super.getData();
        // output data to sql
    }
}

// NOTE: both `SystemSqlExporter` and `SystemPromExporter` can extend some other base class such as `CTimeScaleExporter` that helps with some shared functionality.

The current main function is also, admittedly, a mix of Timescale and Prometheus code, and I am similarly worried about the complexity and extensibility of the project as we go forward.
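As a rough illustration of what a backend-agnostic main loop could look like: `BlockData`, `Backend`, `InMemoryBackend` and `exportBlock` are all hypothetical names, and the in-memory backend only stands in for a real Prometheus or Timescale writer. A real version would most likely be async; it is kept synchronous here for brevity.

```typescript
// Hypothetical unit of scraped data, independent of any backend.
interface BlockData {
    blockNumber: number;
    weight: number;
}

// Anything that can receive scraped data: Prometheus, Timescale, or a future database.
interface Backend {
    readonly name: string;
    write(data: BlockData): void;
}

// Stand-in backend that just records what it was given.
class InMemoryBackend implements Backend {
    readonly written: BlockData[] = [];
    constructor(readonly name: string) {}
    write(data: BlockData): void {
        this.written.push(data);
    }
}

// The engine only sees a list of backends; one or many, it does not care which.
function exportBlock(data: BlockData, backends: Backend[]): void {
    for (const backend of backends) {
        backend.write(data);
    }
}

const prom = new InMemoryBackend("prometheus");
const sql = new InMemoryBackend("timescale");

exportBlock({ blockNumber: 1, weight: 42 }, [prom, sql]); // both backends enabled
exportBlock({ blockNumber: 2, weight: 43 }, [sql]); // only one backend enabled
```

Which backends are enabled then becomes a matter of what goes into the list, with no `withProm` or `withTs` style flags inside the exporters themselves.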

paritytech / polkadot-runtime-prom-exporter

Polkadot Runtime Metric Exporter V0.1 Features #18