textileio / textile

Textile hub services and buckets lib

Miner's index & deal calculator #494

jsign closed 3 years ago

jsign commented 3 years ago

This PR introduces a new daemon, mindexd, responsible for collecting external data (from Powergates and the Filecoin chain) and building a miner's index. The UI for the Miner's Index is still not defined, so this PR can be understood as the backend for this subsystem.

This PR stands on the shoulders of this Powergate PR. Reviewing this PR doesn't require understanding those changes, since I tried to be verbose enough in the comments to keep it self-contained.

I describe big ideas in different sections, but I give more details in PR comments!


Since this is a new daemon, all this work is independent of the other daemons (hubd, userd, etc.). This daemon is not meant to have a publicly accessible API, so once the UI is defined, we will need to expose the required APIs in some public-facing daemon, which would in turn call the APIs of this one.

The miner's index is a mixture of:

The main idea is to gather all this data, and build a useful miner index with it.


Importing Powergate-related records is done in two layers.

The first layer is a collector component, which asks registered Powergate instances for all storage/retrieval records created or modified since time X. This delta-style importing allows polling efficiently, with a small bandwidth cost.
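As a rough illustration of the delta-style polling idea (not the code in this PR), the sketch below keeps a checkpoint and only asks for records updated after it. The `Record` and `Source` types, the in-memory `memSource`, and the use of Unix seconds for timestamps are all assumptions made for the example; the real collector talks to Powergate instances.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical, simplified storage/retrieval record.
type Record struct {
	ID        string
	UpdatedAt int64 // Unix seconds; an assumption for this sketch
}

// Source is a hypothetical stand-in for a registered Powergate instance.
type Source interface {
	// UpdatedSince returns records created or modified strictly after t.
	UpdatedSince(t int64) []Record
}

// memSource is an in-memory Source used only for illustration.
type memSource struct{ records []Record }

func (m memSource) UpdatedSince(t int64) []Record {
	var out []Record
	for _, r := range m.records {
		if r.UpdatedAt > t {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].UpdatedAt < out[j].UpdatedAt })
	return out
}

// pollOnce fetches the delta since the checkpoint and returns it together
// with the advanced checkpoint (the newest UpdatedAt seen so far).
func pollOnce(src Source, checkpoint int64) ([]Record, int64) {
	recs := src.UpdatedSince(checkpoint)
	for _, r := range recs {
		if r.UpdatedAt > checkpoint {
			checkpoint = r.UpdatedAt
		}
	}
	return recs, checkpoint
}

func main() {
	src := memSource{records: []Record{
		{ID: "a", UpdatedAt: 10},
		{ID: "b", UpdatedAt: 20},
	}}
	recs, checkpoint := pollOnce(src, 0)
	fmt.Println("first poll:", len(recs), "records, checkpoint", checkpoint)
	recs, checkpoint = pollOnce(src, checkpoint)
	fmt.Println("second poll:", len(recs), "records, checkpoint", checkpoint)
}
```

Because each poll only transfers what changed since the last checkpoint, a poll with no new activity returns nothing, which is where the bandwidth saving comes from.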

All imported records are merged into a single collection, giving a complete view of all deals and retrievals made by multiple Powergates. This will be the source layer from which to construct more meaningful metrics for the index. Also, since we always have these raw records, we can keep experimenting with different metrics.

In summary, the collector maintains an up-to-date collection of all the deals and retrieval information from multiple Powergates. To quickly see what these records store, see this model.
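To make the merging step concrete, here is a minimal sketch of an upsert into one collection shared by several Powergates. The field names, the `(PowergateID, ID)` key, and the "newest update wins" rule are assumptions for illustration; the linked model defines what the real records contain.

```go
package main

import "fmt"

// Record is a hypothetical merged record; the real model has more fields.
type Record struct {
	PowergateID string // assumed: which Powergate instance reported it
	ID          string // assumed: record ID within that instance
	UpdatedAt   int64  // Unix seconds
	Failed      bool
}

type key struct{ powergateID, id string }

// Collection holds records from all Powergates in a single map.
type Collection struct{ byKey map[key]Record }

func NewCollection() *Collection { return &Collection{byKey: map[key]Record{}} }

// Upsert inserts new records and overwrites an existing one only when the
// incoming copy is strictly newer, so replayed or stale deltas are harmless.
func (c *Collection) Upsert(recs []Record) {
	for _, r := range recs {
		k := key{r.PowergateID, r.ID}
		if old, ok := c.byKey[k]; ok && old.UpdatedAt >= r.UpdatedAt {
			continue
		}
		c.byKey[k] = r
	}
}

// Get returns the stored record for a Powergate/record ID pair, if any.
func (c *Collection) Get(powergateID, id string) (Record, bool) {
	r, ok := c.byKey[key{powergateID, id}]
	return r, ok
}

// Len returns the number of distinct records in the collection.
func (c *Collection) Len() int { return len(c.byKey) }

func main() {
	c := NewCollection()
	c.Upsert([]Record{
		{PowergateID: "pg-1", ID: "r1", UpdatedAt: 10},
		{PowergateID: "pg-2", ID: "r1", UpdatedAt: 10},
	})
	// A later delta from pg-1 modifies its record in place.
	c.Upsert([]Record{{PowergateID: "pg-1", ID: "r1", UpdatedAt: 20, Failed: true}})
	r, _ := c.Get("pg-1", "r1")
	fmt.Println(c.Len(), r.Failed)
}
```

Keying by both the instance and the record ID keeps records from different Powergates apart even if their local IDs collide.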

On top of the collector, the second layer is the indexer component, which builds more meaningful metrics. For example:

This whole system allows importing external Powergate information and processing it to generate Textile-related metrics.

It is worth noting that all data imported from a Powergate instance is tagged with a Region field. Metrics can be computed per region, since it wouldn't be fair to mix metrics from a Powergate in the USA and a Powergate in China when some miners are in China. Most probably, a Chinese miner will have better metrics when measured from Chinese Powergates.
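A per-region metric can be sketched as a simple group-by over the imported records. The metric below (deal totals and failures per miner and region), the `Deal` fields, and the region names are hypothetical and only illustrate why the Region tag matters; they are not the metrics defined in this PR.

```go
package main

import "fmt"

// Deal is a hypothetical, trimmed-down imported storage record.
type Deal struct {
	Miner  string
	Region string // the Region tag of the Powergate that reported the deal
	Failed bool
}

// Stats is an illustrative metric: deal attempts and failures.
type Stats struct {
	Total    int
	Failures int
}

// GroupKey identifies one miner as seen from one region.
type GroupKey struct {
	Miner  string
	Region string
}

// ByMinerAndRegion aggregates deals so that a miner gets separate stats for
// every region it was used from, instead of one global mixed figure.
func ByMinerAndRegion(deals []Deal) map[GroupKey]Stats {
	out := map[GroupKey]Stats{}
	for _, d := range deals {
		k := GroupKey{Miner: d.Miner, Region: d.Region}
		s := out[k]
		s.Total++
		if d.Failed {
			s.Failures++
		}
		out[k] = s
	}
	return out
}

func main() {
	deals := []Deal{
		{Miner: "f01000", Region: "asia"},
		{Miner: "f01000", Region: "asia"},
		{Miner: "f01000", Region: "north_america", Failed: true},
	}
	for k, s := range ByMinerAndRegion(deals) {
		fmt.Printf("%s in %s: %d deals, %d failures\n", k.Miner, k.Region, s.Total, s.Failures)
	}
}
```

In this toy data the same miner looks reliable from one region and unreliable from another, which a single global figure would hide.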


Regarding on-chain metrics, such as miner's:

The indexer leverages the Hub's Powergate stack and the newly created Indices APIs to fetch this information from Powergate's indices.

The full model (with on-chain and Textile data) can be found here. This is the result that the indexer keeps up to date.


With a specified (tunable) frequency, the indexer will recalculate values for the index to consider new data from on-chain Powergate indices or newly imported records from external Powergates.

It also takes a daily (could be tunable) snapshot of the index data and stores it in a separate history collection. Keeping snapshots separate avoids bloating the current index collection, which could otherwise affect queries and indexes. Having a daily snapshot of the index values allows building history, plotting, and rate metrics. For example, we could plot a miner's storage price over time. Without this history, we would only have the latest known price (the current index state).
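The snapshot idea can be sketched as follows: copy the current index into a separate history at most once per UTC day, then read a per-miner series back out of it. The types, the `AskPrice` field, and the in-memory storage are assumptions for the example; the daemon itself persists this in a database collection.

```go
package main

import "fmt"

const secondsPerDay = 86400

// MinerEntry is a hypothetical slice of the current index for one miner.
type MinerEntry struct {
	Miner    string
	AskPrice uint64 // assumed field, used for the price-over-time example
}

// Snapshot is a frozen copy of the index for one UTC day.
type Snapshot struct {
	Day     int64 // days since the Unix epoch
	Entries []MinerEntry
}

// History is kept apart from the live index so the live data stays small.
type History struct{ snapshots []Snapshot }

// MaybeSnapshot stores a copy of the current index if no snapshot exists yet
// for the UTC day containing now (Unix seconds). It reports whether it did.
func (h *History) MaybeSnapshot(now int64, current []MinerEntry) bool {
	day := now / secondsPerDay
	if n := len(h.snapshots); n > 0 && h.snapshots[n-1].Day >= day {
		return false
	}
	entries := make([]MinerEntry, len(current))
	copy(entries, current) // copy so later index updates don't alter history
	h.snapshots = append(h.snapshots, Snapshot{Day: day, Entries: entries})
	return true
}

// PriceSeries returns one ask price per snapshot in which the miner appears.
func (h *History) PriceSeries(miner string) []uint64 {
	var series []uint64
	for _, s := range h.snapshots {
		for _, e := range s.Entries {
			if e.Miner == miner {
				series = append(series, e.AskPrice)
			}
		}
	}
	return series
}

func main() {
	var h History
	index := []MinerEntry{{Miner: "f01000", AskPrice: 100}}
	h.MaybeSnapshot(0, index)
	index[0].AskPrice = 120 // the live index keeps changing
	h.MaybeSnapshot(secondsPerDay, index)
	fmt.Println(h.PriceSeries("f01000"))
}
```

Turning the feature off, as discussed next, would amount to never calling the snapshot step; the live index does not depend on the history.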

Maybe it's a stretch, and we would never use this information. If that's the case, this feature could be turned off without problems.


This PR also includes an API that provides a basic but useful calculator for making deals with miners.

Easily explained by the gRPC definition:

```proto
// rpc CalculateDealPrice
message CalculateDealPriceRequest {
        string miner_address = 1;
        int64 data_size_bytes = 2;
        int64 duration_days = 3;
}

message CalculateDealPriceResponse {
        uint64 total_cost = 1;
        uint64 verified_total_cost = 2;
        uint64 padded_size = 3;
        uint64 duration_epochs = 4;
}

rpc CalculateDealPrice(CalculateDealPriceRequest) returns (CalculateDealPriceResponse) {}
```

So we're trying to translate user-domain inputs (a miner, a data size in bytes, a duration in days) into Filecoin values (padded size, duration in epochs, total cost). These calculated values can be presented to the user nicely or leveraged to auto-create CLI commands for Lotus with the correct values (usually a hairy thing to do manually).
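The translation can be sketched in Go as below. This is an approximation under stated assumptions, not the implementation in this PR: it assumes 30-second epochs (2880 per day), a padded piece size equal to the smallest power of two whose 127/128 usable capacity fits the data, and ask prices expressed in attoFIL per GiB per epoch and charged on the padded size. The function names and the idea of passing the miner's ask prices as arguments (instead of looking them up by `miner_address` in the index) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

const epochsPerDay = 2880 // assumes 30-second Filecoin epochs

// Result mirrors the fields of CalculateDealPriceResponse.
type Result struct {
	TotalCost         uint64 // attoFIL
	VerifiedTotalCost uint64 // attoFIL
	PaddedSize        uint64 // bytes
	DurationEpochs    uint64
}

// paddedPieceSize returns the smallest power of two (at least 128) whose
// usable capacity, 127 out of every 128 bytes, can hold dataSize bytes.
func paddedPieceSize(dataSize uint64) uint64 {
	p := uint64(128)
	for p-p/128 < dataSize {
		p *= 2
	}
	return p
}

// cost computes pricePerGiBEpoch * paddedSize * epochs / GiB using big
// integers, then checks that the result still fits in a uint64.
func cost(pricePerGiBEpoch, paddedSize, epochs uint64) (uint64, error) {
	c := new(big.Int).SetUint64(pricePerGiBEpoch)
	c.Mul(c, new(big.Int).SetUint64(paddedSize))
	c.Mul(c, new(big.Int).SetUint64(epochs))
	c.Rsh(c, 30) // divide by 1 GiB
	if !c.IsUint64() {
		return 0, errors.New("cost overflows uint64")
	}
	return c.Uint64(), nil
}

// CalculateDealPrice is a hypothetical helper; askPrice and verifiedAskPrice
// would come from the miner's entry in the index.
func CalculateDealPrice(askPrice, verifiedAskPrice uint64, dataSizeBytes, durationDays int64) (Result, error) {
	if dataSizeBytes <= 0 || dataSizeBytes > 1<<62 || durationDays <= 0 {
		return Result{}, errors.New("invalid data size or duration")
	}
	padded := paddedPieceSize(uint64(dataSizeBytes))
	epochs := uint64(durationDays) * epochsPerDay
	total, err := cost(askPrice, padded, epochs)
	if err != nil {
		return Result{}, err
	}
	verified, err := cost(verifiedAskPrice, padded, epochs)
	if err != nil {
		return Result{}, err
	}
	return Result{total, verified, padded, epochs}, nil
}

func main() {
	// 1 GiB of data for 180 days, with made-up ask prices.
	res, err := CalculateDealPrice(500_000_000, 50_000_000, 1<<30, 180)
	fmt.Println(res, err)
}
```

Note that 1 GiB of raw data does not fit in a 1 GiB piece once padding is accounted for, so the sketch rounds it up to a 2 GiB padded size. That kind of detail is exactly what makes doing this by hand error-prone.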

Ideally, I'd like to include potential real-time gas fee costs, but I couldn't fit that work into this PR. It would be a nice addition later.