makerdao / vdb-mcd-transformers

Generic Time Series Queries #191

Open hexonaut opened 4 years ago

hexonaut commented 4 years ago

Continuing on from here: https://github.com/makerdao/vdb-mcd-transformers/issues/179

I think a common use case for this data is going to be displaying time series of these values/events.

I think we should provide a common, generic API for querying stats with three parameters: a start datetime, an end datetime, and a granularity (the bucket size). I've added these three for collateral price, savings Dai and collateral locked:

-- headers.block_timestamp is filtered on in every bucket query below
create index if not exists headers_block_timestamp_idx ON public.headers (block_timestamp);

drop function if exists api.spot_poke_time;
drop type if exists api.spot_poke_time_result cascade;
create type api.spot_poke_time_result AS (
    date timestamptz,
    ilk_id int4,
    value numeric,
    block_number int8,
    block_timestamp numeric
);
create function api.spot_poke_time(start_time date, end_time date, granularity interval)
    returns setof api.spot_poke_time_result as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        ),
        -- latest spot_poke per ilk as of the end of each bucket
        -- (assumes header_id increases with block height)
        ilk_values AS (
            SELECT input.d AS date, ilk_id, MAX(s.header_id) AS header_id
            FROM maker.spot_poke s
                CROSS JOIN input
                LEFT JOIN public.headers h ON (h.id = s.header_id)
            WHERE h.block_timestamp < extract(epoch FROM (input.d + granularity))
            GROUP BY input.d, ilk_id
        )
        SELECT i.date, i.ilk_id, s.value, h.block_number, h.block_timestamp
        FROM ilk_values i
            LEFT JOIN maker.spot_poke s ON (s.header_id = i.header_id AND s.ilk_id = i.ilk_id)
            LEFT JOIN public.headers h ON (h.id = s.header_id)
    $$ language sql stable;

drop function if exists api.pot_pie_time;
drop type if exists api.pot_pie_time_result cascade;
create type api.pot_pie_time_result AS (
    date timestamptz,
    pie numeric,
    block_number int8,
    block_timestamp numeric
);
create function api.pot_pie_time(start_time date, end_time date, granularity interval)
    returns setof api.pot_pie_time_result as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        ),
        -- latest pot_pie as of the end of each bucket
        pie_values AS (
            SELECT input.d AS date, MAX(p.header_id) AS header_id
            FROM maker.pot_pie p
                CROSS JOIN input
                LEFT JOIN public.headers h ON (h.id = p.header_id)
            WHERE h.block_timestamp < extract(epoch FROM (input.d + granularity))
            GROUP BY input.d
        )
        SELECT i.date, p.pie, h.block_number, h.block_timestamp
        FROM pie_values i
            LEFT JOIN maker.pot_pie p ON (p.header_id = i.header_id)
            LEFT JOIN public.headers h ON (h.id = p.header_id)
    $$ language sql stable;

drop function if exists api.ilk_ink_time;
drop type if exists api.ilk_ink_time_result cascade;
create type api.ilk_ink_time_result AS (
    date timestamptz,
    ilk_id int4,
    ink numeric,
    block_number int8,
    block_timestamp numeric
);
create function api.ilk_ink_time(start_time date, end_time date, granularity interval)
    returns setof api.ilk_ink_time_result as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        ),
        -- latest ink per urn across all time
        latest_urns AS (
            SELECT DISTINCT ON (urn_id) ink, header_id, urn_id
            FROM maker.vat_urn_ink u
                LEFT JOIN public.headers h ON (h.id = u.header_id)
            ORDER BY urn_id, block_number DESC
        )
        SELECT input.d AS date, ilk_id, SUM(u.ink) AS ink,
               MAX(h.block_number) AS block_number,
               MAX(h.block_timestamp) AS block_timestamp
        FROM latest_urns u
            CROSS JOIN input
            LEFT JOIN maker.urns urns ON (urns.id = u.urn_id)
            LEFT JOIN public.headers h ON (h.id = u.header_id)
        WHERE h.block_timestamp < extract(epoch FROM (input.d + granularity))
        GROUP BY input.d, ilk_id
    $$ language sql stable;

I'm not the best with Postgres, so maybe there is a cleaner way to write those, but I think this pattern can be generalized to produce aggregations over most stats. For single-value stats we can provide the latest value in the bucket (as in the examples above). For event-based stats such as Draws/Wipes we can provide both SUM and AVERAGE aggregators, as in the sketch below.
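Here's a minimal sketch of what that could look like for events, assuming an event table shaped like maker.vat_frob with a dart column and a urn_id (the names are illustrative, not necessarily the real schema). Unlike the latest-value queries above, events are filtered on both sides of the bucket, since each event belongs to exactly one bucket rather than carrying forward:

create function api.vat_frob_dart_time(start_time date, end_time date, granularity interval)
    returns table (date timestamptz, ilk_id int4, total_dart numeric, avg_dart numeric) as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        )
        SELECT input.d AS date,
               urns.ilk_id,
               SUM(f.dart) AS total_dart, -- total drawn/wiped within the bucket
               AVG(f.dart) AS avg_dart    -- average event size within the bucket
        FROM maker.vat_frob f
            CROSS JOIN input
            LEFT JOIN public.headers h ON (h.id = f.header_id)
            LEFT JOIN maker.urns urns ON (urns.id = f.urn_id)
        WHERE h.block_timestamp >= extract(epoch FROM input.d)
          AND h.block_timestamp < extract(epoch FROM (input.d + granularity))
        GROUP BY input.d, urns.ilk_id
    $$ language sql stable;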

Also, does it make sense to have all these definitions in one giant schema.sql file? Maybe it makes sense to break these apart into more manageable pieces?

KentonPrescott commented 4 years ago

Hi Sam. For those who aren't as familiar with SQL but are still curious about what is proposed, could you provide a pseudo-example response for each query?

hexonaut commented 4 years ago

So each of these takes the form <table>_time(start, end, granularity). For example, spot_poke_time('2020-05-01'::DATE, '2020-06-01'::DATE, '1 day'::INTERVAL) returns the ilk price over the month of May in 1-day buckets.

Similarly, ilk_ink_time('2020-05-01'::DATE, '2020-06-01'::DATE, '1 day'::INTERVAL) returns the amount of collateral an ilk has over the month of May in 1-day buckets.

Here is an example in GraphQL form:

Query:

{
  ilkInkTime(startTime:"2020-05-01", endTime:"2020-06-01", granularity:{days:1}) {
    nodes {
      date,
      ilkId,
      ink,
      blockNumber,
      blockTimestamp
    }
  }
}

Response:

{
  "data": {
    "ilkInkTime": {
      "nodes": [
        {
          "date": "2020-05-28T00:00:00+00:00",
          "ilkId": 10,
          "ink": "14243787879208000000000000",
          "blockNumber": "10070473",
          "blockTimestamp": "1589542052"
        },
        {
          "date": "2020-05-11T00:00:00+00:00",
          "ilkId": 1,
          "ink": "648226674586864668452888",
          "blockNumber": "10047822",
          "blockTimestamp": "1589238672"
        },
        {
          "date": "2020-05-12T00:00:00+00:00",
          "ilkId": 10,
          "ink": "13323780487907000000000000",
          "blockNumber": "10052841",
          "blockTimestamp": "1589306670"
        },
        {
          "date": "2020-05-05T00:00:00+00:00",
          "ilkId": 9,
          "ink": "282783589266214677304525",
          "blockNumber": "10007975",
          "blockTimestamp": "1588705424"
        },
        {
          "date": "2020-05-10T00:00:00+00:00",
          "ilkId": 508622,
          "ink": "57743128310000000000",
          "blockNumber": "10041360",
          "blockTimestamp": "1589152472"
        },
        {
          "date": "2020-05-04T00:00:00+00:00",
          "ilkId": 1,
          "ink": "90936570123725291376599",
          "blockNumber": "10002705",
          "blockTimestamp": "1588634910"
        },
        ...
      ]
    }
  }
}

Does this explain it better?

It also probably makes sense to add an optional ilk filter and to order the result by date.
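
As a rough sketch, that extension could look like this (the ilk_filter parameter name is made up; everything else mirrors the earlier definition):

drop function if exists api.ilk_ink_time;
create function api.ilk_ink_time(start_time date, end_time date, granularity interval, ilk_filter int4 default null)
    returns setof api.ilk_ink_time_result as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        ),
        latest_urns AS (
            SELECT DISTINCT ON (urn_id) ink, header_id, urn_id
            FROM maker.vat_urn_ink u
                LEFT JOIN public.headers h ON (h.id = u.header_id)
            ORDER BY urn_id, block_number DESC
        )
        SELECT input.d AS date, ilk_id, SUM(u.ink) AS ink,
               MAX(h.block_number) AS block_number,
               MAX(h.block_timestamp) AS block_timestamp
        FROM latest_urns u
            CROSS JOIN input
            LEFT JOIN maker.urns urns ON (urns.id = u.urn_id)
            LEFT JOIN public.headers h ON (h.id = u.header_id)
        WHERE h.block_timestamp < extract(epoch FROM (input.d + granularity))
          AND (ilk_filter IS NULL OR urns.ilk_id = ilk_filter) -- optional ilk filter
        GROUP BY input.d, ilk_id
        ORDER BY date -- stable ordering for clients
    $$ language sql stable;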

rmulhol commented 4 years ago

Thanks for the input! Will put some thought into considering how we could support these kinds of queries 👍

hexonaut commented 4 years ago

Wrote a PR for ilk_snapshot: https://github.com/makerdao/vdb-mcd-transformers/pull/202

I think this format can be generalized to many of the other trigger/storage tables.

In particular, I am prefixing time to indicate that it is a bucketed time-series query, and requiring bucket_start, bucket_end and bucket_interval parameters.
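
So a call would look something like this (the function name is hypothetical, just following the convention described above):

SELECT * FROM api.time_ilk_snapshot(
    bucket_start := '2020-05-01'::timestamptz,
    bucket_end := '2020-06-01'::timestamptz,
    bucket_interval := '1 day'::interval
);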

hexonaut commented 4 years ago

PR for auctions: https://github.com/makerdao/vdb-mcd-transformers/pull/211