graphile / pg-aggregates

Aggregates for PostGraphile connections
83 stars 17 forks source link

TIMESTAMP_TRUNCATED_TO_MONTH & TIMESTAMP_TRUNCATED_TO_YEAR #20

Closed jeremybradbury closed 3 years ago

jeremybradbury commented 3 years ago

Feature description

Add more truncated timestamp group by options: TIMESTAMP_TRUNCATED_TO_WEEK TIMESTAMP_TRUNCATED_TO_MONTH TIMESTAMP_TRUNCATED_TO_YEAR

Motivating example

Clearly missing from the list: TIMESTAMP_TRUNCATED_TO_HOUR TIMESTAMP_TRUNCATED_TO_DAY

This is required for my reporting project. Until this is in, I'll have to manually build queries for these.

This is an incredible tool to use for reporting services on an isolated reporting replica... it only makes sense to complete this list so that we can group our data in all the meaningful ways, using a terse built-in method that already exists.

Breaking changes

I don't see any reason this would cause breaking changes, simply extending current features.

Supporting development

I [tick all that apply]:

jeremybradbury commented 3 years ago

Here is an example of truncating weekly for a chart query:

-- weeks
SELECT COUNT(id) AS races, 
     TIMESTAMP as start,
    TRUNCATE(UNIX_TIMESTAMP(TIMESTAMP) / (7 * 24 * 60 * 60), 0) AS week_group
    FROM racing_log 
WHERE TIMESTAMP < NOW() - INTERVAL 0 YEAR 
AND TIMESTAMP > NOW() - INTERVAL 1 YEAR 
GROUP BY week_group 
ORDER BY start ASC
;

monthly looks like: 364.25 = 12 months = 30.35415 or 364.25 = 28.019234 * 13

benjie commented 3 years ago

We demonstrate in the README how we added "truncated-to-hour": https://github.com/graphile/pg-aggregates#defining-your-own-grouping-derivatives and adding truncated to week/month/year is similar.

Though now I come to look at it it assumes you already know the plugin system; could you try this and let me know if it works for you please? If so I'll update the docs with this more full example.

const TIMESTAMP_OID = "1114";
const TIMESTAMPTZ_OID = "1184";

// Add this plugin to your `--append-plugins`/`appendPlugins` list:
const MyAggregateGroupSpecsPlugin = (builder) => {
  builder.hook("build", (build) => {
    const { pgSql: sql } = build;

    // Add a "truncated to year" aggregrate group-by:
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-year",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('year', ${sqlFrag})`,
    });

    // Push more aggregate group-bys here if you want, e.g. week, month, etc.

    return build;
  });
};

// For CommonJS:
module.exports = MyAggregateGroupSpecsPlugin;
// Or for ESM:
// export default MyAggregateGroupSpecsPlugin;
jeremybradbury commented 3 years ago

@benjie thanks so much for the quick reply.

This project is still a prototype, so I've started with a simple rc file. for simplicity i've added this plugin to the rc file rather than importing it.

Perhaps I am doing something wrong here as it doesn't seem to make any impact on the schema. Below is my entire .postgraphilerc file. I've added all the possible date truncations listed here for date_trunc.

Yes I know how locking micro/milliseconds can be, but so can many other parts of this plugin, and I know how useless millennium and even century are but seemed like we might as well extend the entire PG API on this, rather than require adding them?

Happy to help with a PR for that if we can get this working.... Am I gonna have to comment out year and hour? I did try that, but this plugin seems to do nothing, no matter what I try.

const isProd = process.env.NODE_ENV === "production";

const TIMESTAMP_OID = "1114";
const TIMESTAMPTZ_OID = "1184";

// Add this plugin to your `--append-plugins`/`appendPlugins` list:
const MyAggregateGroupSpecsPlugin = (builder) => {
  builder.hook("build", (build) => {
    const { pgSql: sql } = build;

    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-millennium",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('millennium', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-century",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('century', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-decade",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('decade', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-year",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('year', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-month",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('month', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-week",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('week', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-day",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('day', ${sqlFrag})`,
    });
    // WARNING: grouping by hour or less can easily create long running queries
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-hour",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('hour', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-minute",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('minute', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-second",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('second', ${sqlFrag})`,
    });
    // WARNING: don't use milliseconds unless you know what you're doing
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-milliseconds",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('milliseconds', ${sqlFrag})`,
    });
    // WARNING: don't use microseconds unless you know what you're doing
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-microseconds",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('microseconds', ${sqlFrag})`,
    });

    return build;
  });
};

module.exports = {
  options: {
    plugins: ["@graphile/operation-hooks", "@graphile/pg-pubsub"],
    appendPlugins: [
      "@graphile-contrib/pg-simplify-inflector",
      "postgraphile-plugin-connection-filter",
      "@graphile/pg-aggregates",
      MyAggregateGroupSpecsPlugin,
    ],
    connection: "",
    bodySizeLimit: "25KB",
    schema: ["public"],
    simpleSubscriptions: !isProd,
    retyOnInitFail: true,
    cors: !isProd,
    watch: !isProd,
    dynamicJson: true,
    showErrorStack: "json",
    extendedErrors: !isProd ? ["hint", "detail", "errcode"] : ["hint", "errcode"],
    showErrorStack: !isProd,
    exportSchemaGraphql: !isProd ? "schema.graphql" : undefined,
    enhanceGraphiql: true,
    allowExplain: !isProd,
    enableQueryBatching: !isProd,
  },
};
jeremybradbury commented 3 years ago

Stack notes, in case it helps:

benjie commented 3 years ago

You can’t add a plugin function via appendPlugins in the RC file; you have to pass a path to a file to be required (like you have for the other plugins). Make sure it’s an absolute path, e.g. use __dirname or similar as part of the path.

jeremybradbury commented 3 years ago

OIC that makes a lot of sense for CLI config file. Thanks for the hand holding through plugin development.

here is my groupByPlugin.js file, it's top level (sibling files no subfolder) RN:

const TIMESTAMP_OID = "1114";
const TIMESTAMPTZ_OID = "1184";

// Add this plugin to your `--append-plugins`/`appendPlugins` list:
const aggregateGroupByPlugin = (builder) => {
  builder.hook("build", (build) => {
    const { pgSql: sql } = build;

    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-millennium",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('millennium', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-century",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('century', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-decade",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('decade', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-year",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('year', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-month",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('month', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-week",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('week', ${sqlFrag})`,
    });
    // build.pgAggregateGroupBySpecs.push({
    //   id: "truncated-to-day",
    //   isSuitableType: (pgType) =>
    //     pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
    //   sqlWrap: (sqlFrag) => sql.fragment`date_trunc('day', ${sqlFrag})`,
    // });
    // WARNING: grouping by hour or less can easily create long running queries
    // build.pgAggregateGroupBySpecs.push({
    //   id: "truncated-to-hour",
    //   isSuitableType: (pgType) =>
    //     pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
    //   sqlWrap: (sqlFrag) => sql.fragment`date_trunc('hour', ${sqlFrag})`,
    // });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-minute",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('minute', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-second",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('second', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-milliseconds",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('milliseconds', ${sqlFrag})`,
    });
    build.pgAggregateGroupBySpecs.push({
      id: "truncated-to-microseconds",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('microseconds', ${sqlFrag})`,
    });

    return build;
  });
};

module.exports = aggregateGroupByPlugin;

Here is my rc file:

const isProd = process.env.NODE_ENV === "production";

module.exports = {
  options: {
    plugins: ["@graphile/operation-hooks", "@graphile/pg-pubsub"],
    appendPlugins: [
      "@graphile-contrib/pg-simplify-inflector",
      "postgraphile-plugin-connection-filter",
      "@graphile/pg-aggregates",
      __dirname + "/groupByPlugin.js",
    ],
    connection,
    bodySizeLimit: "25KB",
    schema: ["public"],
    simpleSubscriptions: !isProd,
    retyOnInitFail: true,
    cors: !isProd,
    watch: !isProd,
    dynamicJson: true,
    showErrorStack: "json",
    extendedErrors: !isProd ? ["hint", "detail", "errcode"] : ["hint", "errcode"],
    showErrorStack: !isProd,
    exportSchemaGraphql: !isProd ? "schema.graphql" : undefined,
    enhanceGraphiql: true,
    allowExplain: !isProd,
    enableQueryBatching: !isProd,
  },
};
jeremybradbury commented 3 years ago

If the plugin isn't found or causes an error, is something output? Because when I intentionally use an invalid path, everything runs without warnings.

jeremybradbury commented 3 years ago

I tried this variation too, based on the source document:

  let pgAggregateGroupBySpecs = build.pgAggregateGroupBySpecs || [];
    pgAggregateGroupBySpecs.push({
      id: "truncated-to-millennium",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('millennium', ${sqlFrag})`,
    });
...
    return build.extend(build, {
      pgAggregateGroupBySpecs,
    });
jeremybradbury commented 3 years ago

I made this, I haven't tested it yet, but after I do, I'll submit a PR.

There may be an opinionated reason not to accept it, but if that's the case, the docs do seem to be incorrect as they are... and tracking down append plugin errors (including fully missing files) currently seems fuzzy at best.

I'm just trying to help make things better for everyone, not just my own client/employer.

benjie commented 3 years ago

If the plugin isn't found or causes an error, is something output? Because when I intentionally use an invalid path, everything runs without warnings.

So it turns out you actually can pass the function directly! https://github.com/graphile/postgraphile/blob/e0eeb1a1f768f7c757d36a6fa91045add92f9169/src/postgraphile/cli.ts#L596 Sorry for the misinformation here, it's a long time since I've visited this code!

If the plugin isn't found or causes an error, is something output? Because when I intentionally use an invalid path, everything runs without warnings.

Should be!

Kinda indicates your config file is not actually being used? I wonder if you're editing the wrong file or running postgraphile in the wrong folder? Or maybe you're editing a .postgraphilerc.ts file and then not compiling it? Here's what happens when I run PostGraphile with a postgraphilerc with an appendPlugins that doesn't exist:

benjie@Benjie-Box:~/Dev/DELETEME/jeremybradbury$ cat .postgraphilerc.js 
module.exports = {
  options: {
    appendPlugins: [
      'does-not-exist'
    ]
  }
};
benjie@Benjie-Box:~/Dev/DELETEME/jeremybradbury$ postgraphile -c test
Failed to load plugin 'does-not-exist'
/home/benjie/.nvm/versions/node/v16.4.0/lib/node_modules/postgraphile/build/postgraphile/cli.js:287
            throw e;
            ^

Error: Cannot find module 'does-not-exist'
Require stack:
- /home/benjie/.nvm/versions/node/v16.4.0/lib/node_modules/postgraphile/build/postgraphile/cli.js
- /home/benjie/.nvm/versions/node/v16.4.0/lib/node_modules/postgraphile/cli.js
    at Function.Module._resolveFilename (node:internal/modules/cjs/loader:927:15)
    at Function.Module._load (node:internal/modules/cjs/loader:772:27)
    at Module.require (node:internal/modules/cjs/loader:999:19)
    at require (node:internal/modules/cjs/helpers:93:18)
    at /home/benjie/.nvm/versions/node/v16.4.0/lib/node_modules/postgraphile/build/postgraphile/cli.js:282:20
    at Array.map (<anonymous>)
    at loadPlugins (/home/benjie/.nvm/versions/node/v16.4.0/lib/node_modules/postgraphile/build/postgraphile/cli.js:274:18)
    at Object.<anonymous> (/home/benjie/.nvm/versions/node/v16.4.0/lib/node_modules/postgraphile/build/postgraphile/cli.js:342:23)
    at Module._compile (node:internal/modules/cjs/loader:1095:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1124:10) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/home/benjie/.nvm/versions/node/v16.4.0/lib/node_modules/postgraphile/build/postgraphile/cli.js',
    '/home/benjie/.nvm/versions/node/v16.4.0/lib/node_modules/postgraphile/cli.js'
  ]
}

and tracking down append plugin errors (including fully missing files) currently seems fuzzy at best.

I almost never use .postgraphilerc.js and it's being removed/replaced in the next version with something better, it's been deprecated for a long time. That said the code seems to indicate that an error should be thrown if requiring fails, or if the plugin is of the wrong type, and that certainly seems to be the case in the quick test I did above.

I generally recommend that people using a .postgraphilerc.js move to using PostGraphile in library mode, it's just a few more lines of code and makes debugging a lot more straightforward because it's much more explicit what you are/are not running.

I tried this variation too, based on the source document:

That one should definitely be throwing an error because there should be a conflict on the pgAggregateGroupBySpecs name.

Also if you just add throw new Error("TESTING"); to the root of the file you should see that error aborting startup and exiting node.

There may be an opinionated reason not to accept it

There's a couple:

Hopefully we can make it sufficiently easy to add your own group by (maybe even with your code as an example in the README) that we don't need to add it to core. And of course you could always create a graphile-aggregate-more-groups or similar plugin that adds more groupings so people can --append-plugins @graphile/pg-aggregates,graphile-aggregate-more-groups.

jeremybradbury commented 3 years ago

Kinda indicates your config file is not actually being used?

It's clearly using my url, and all the plugins I specified in .postgraphilerc.js so it's not missing the config file entirely. It's just not loading my plugin, or it doesn't work as expected.

When I get the filename wrong, I get no errors, it just runs as if it's not even trying to load it, yet aggregates and my db connection work.

I generally recommend that people using a .postgraphilerc.js move to using PostGraphile in library mode, it's just a few more lines of code and makes debugging a lot more straightforward because it's much more explicit what you are/are not running.

I guess that's my only option at this point. I don't need express/restify on all my services, so they usually start as minimal as possible. Rather than adding endpoints here, we have a primary rest API server... This is also still a prototype. That's my rationale: it still works and start simple. But I'm seeing, that now half of that is no longer true.

That one should definitely be throwing an error...

Yeah that's kinda what I was going for, just get any response.

I don't want to add too many enum values to people's schemas, especially if they don't need that feature.

First, I completely understand and respect your view. Second, I disagree. I feel like the goal of PostGraphile is to extend Postgres into GraphQL and modularity is key. However, plugins (the modular part) should be API complete, before being extendable. The purpose of a plugin for a library like this, doesn't seem to be making a plugin system for a plugin system. Although that's handy it doesn't seem like it should be required for typical usage.

I don't want to add "ambiguous" group by values, e.g. "week",

yes, which is why i just extended the postgres api and if ppl want something else, they can extend the plugin as you suggest i do.

adding hour, day and year seems weird without the others.

I agree, why did you add hour at all? If you had month, week, day and year (ideally quarter too) in there over hour & day, this thread would have never happened.

Thanks again for your time sharing with me and swapping opinions. I'll forgo the rc file and report back.

benjie commented 3 years ago

It's clearly using my url, and all the plugins I specified in .postgraphilerc.js so it's not missing the config file entirely. It's just not loading my plugin, or it doesn't work as expected. When I get the filename wrong, I get no errors, it just runs as if it's not even trying to load it, yet aggregates and my db connection work.

Try editing one of the things out of your config file that you know it's using, e.g. one of the plugins you rely on, and then restart PostGraphile and ensure that that functionality has disappeared. Because honestly, I think it's not using that particular config file with the bad path in it - if you can show me in the code how it's possible to not throw an error when the path is wrong I'd be very interested to see (and fix) it.

it doesn't seem like it should be required for typical usage.

You're the only person who's requested this so far, so I'd be careful inferring what you're doing is necessarily typical usage :wink:

I agree, why did you add hour at all? If you had month, week, day and year (ideally quarter too) in there over hour & day, this thread would have never happened.

I added the fields that I needed for my usecase at the time which was per-day and per-hour. I didn't need the others and I saw week and month as problematic (as previously mentioned) so I wanted to avoid the complexity of adding them and then having people complain they work the wrong way - instead add them yourself and have them work the way you design. I considered not even adding day/hour to core but figured it's necessary for feature discovery. People might not even need time-based groupings but groupings on some other metric altogether; adding more groupings "for completeness" makes it harder to discover the other values of the enum because they get lost in a longer list.

Thanks again for your time sharing with me and swapping opinions.

Always good to have me surface the reasoning behind my decisions :wink:

jeremybradbury commented 3 years ago

Try editing one of the things out of your config file that you know it's using, e.g. one of the plugins you rely on, and then restart PostGraphile and ensure that that functionality has disappeared. Because honestly, I think it's not using that particular config file with the bad path in it - if you can show me in the code how it's possible to not throw an error when the path is wrong I'd be very interested to see (and fix) it.

this was exactly right, it has somehow cached the data in this file and isn't reading further changes.... I have no idea how to fix it other than not using an rc/config/ini file to optionally hold envs/flags, as many other cli tools do.

For example, I was able to make 2 really long node scripts with all my flags (one for local, one for prod), as this wouldn't need to require any Node based RESTful libraries or any other bloat this project doesn't need. However, an absolute path inside a json file is cumbersome at best.

I was hoping to see some sort of replacement for the RC file that didn't require being inserted into Express or Restify that I clearly don't need. From the sounds of it, there isn't, and middleware-only is where this tool is headed. Having to import & -configure express, just for one middleware, with 2 routes seems rather silly.

When you need more it makes sense, but GraphQL and Microservices don't always have to be one giant schema (although that's all the rage). It also works well to have smaller focused GraphQL services too, like this example of an internal reporting server where the only auth is "are you on a whitelisted IP?" aka "are you in the office or on the VPN?" style private services.

You're the only person who's requested this so far, so I'd be careful inferring what you're doing is necessarily typical usage 😉

This is issue number 20 for this repo I see. So, I would repeat back the same remark to you "so I'd be careful inferring" about your lack of useful data, at least in the context of this project 😉

Let me share more context for what I meant "typical aggregation/reporting usage in Postgres (or similar RDBs), based on my own firsthand experience" (like you, I only have the sample group I have been given).

Many clients have asked me for reporting/charts with sum()s grouped by day, week, month, quarter & year. None of them have complained 52 weeks/year are the wrong grouping (no matter which day you start it always includes the same 7) or that 12 months are wrong. Some have asked me to instead use 28 day months & 13 month years (like credit card companies do). Which, above, I modeled a manual way to truncate dates both ways (12 & 13 months) without the use of the date_trunc function (with somewhat precise numbers).

For reporting purposes, most users don't seem to care about the details of groupings too much or if daylight savings was considered in the equation (I've never once seen it used for aggregation calculation). In fact, many will say 13 * 28 = 364 is good enough, when each year is actually 364.25 days (the 1/4 day is added to February every 4th year because that's how we decided to mark time), and that's 1:4 of being under by a whole day and 3:4 chance of being "right" according to how we measure time.

Asking someone for an exact average doesn't make sense, neither does getting too particular about time groupings. Even in your case with hour and day... 24 hours is not the length of every day. even if we claim it is exactly true on every day, some days have 23 and other have 25, at least in the US... this is a technicality and doesn't impact the data groupings unless we're comparing the same hour range, on every day of a year, even then it's only off twice for obvious reasons, and can be adjusted.

...I saw week and month as problematic (as previously mentioned)

Yeah I think that's the main difference in our opinions. I share/default all my opinions with Postgres. If they made the date_trunc function work this way and we just extended it, how can we be wrong? We cannot be, only "incomplete" by someone else's opinion (which is what started this thread, and is clearly easy to say no to).

I considered not even adding day/hour to core but figured it's necessary for feature discovery. People might not even need time-based groupings but groupings on some other metric altogether; adding more groupings "for completeness" makes it harder to discover the other values of the enum because they get lost in a longer list.

Yes developer UX is a concern here. I went overboard using all of them, kinda to make a point: there's a balance. I feel a handful of the most common (year, quarter, month, week, day... possibly even hour if used with paging) are better in a plugin like this, than "two I once needed for a project" (not that that's ever a bad reason to make it into open source).

Always good to have me surface the reasoning behind my decisions

You're a very talented and capable human. You don't rally have to justify any of your decisions to me. However, I'm glad to hear your rationale and possibly let it inform my future opinions.

Hopefully this will be closed soon. Still working on converting this to a middleware project to I can include a local plugin.

jeremybradbury commented 3 years ago

I was able to get this working:

const SevenAggregateGroupSpecsPlugin = (builder) => {
  builder.hook("build", (build) => {
    const { pgSql: sql } = build;
    build.pgAggregateGroupBySpecs = [{
      id: "truncated-to-year",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('year', ${sqlFrag})`,
    },{
      id: "truncated-to-quarter",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('quarter', ${sqlFrag})`,
    },{
      id: "truncated-to-month",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('month', ${sqlFrag})`,
    },{
      id: "truncated-to-week",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('week', ${sqlFrag})`,
    },{
      id: "truncated-to-day",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('day', ${sqlFrag})`,
    },{
      id: "truncated-to-hour",
      isSuitableType: (pgType) =>
        pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
      sqlWrap: (sqlFrag) => sql.fragment`date_trunc('hour', ${sqlFrag})`,
    }];
    return build;
  });
};

....

options: {
    plugins: ["@graphile/operation-hooks", "@graphile/pg-pubsub"],
    appendPlugins: [
      "@graphile-contrib/pg-simplify-inflector",
      "postgraphile-plugin-connection-filter",
      "@graphile/pg-aggregates",
      SevenAggregateGroupSpecsPlugin,
    ],

Looks like this with 7 options in descending order, per date. Perhaps too much for some, but it works perfect for charting, and nothing more costly was added.

seven

If anyone else might need this... it's here.

jeremybradbury commented 3 years ago

dang... ima leave this closed but that actually just listed queries that don't seem to work.

they only work alone and not along with the "having" clauses =[

the way of extending this is a little more complex than is shown in the docs, and perhaps not everything was taken into account, my fork seems to work closer to what I'd expect

benjie commented 3 years ago

this was exactly right, it has somehow cached the data in this file and isn't reading further changes.... I have no idea how to fix it other than not using an rc/config/ini file to optionally hold envs/flags, as many other cli tools do.

Pretty sure it's reading a different file - there's no caching.

I was hoping to see some sort of replacement for the RC file that didn't require being inserted into Express or Restify that I clearly don't need. From the sounds of it, there isn't, and middleware-only is where this tool is headed. Having to import & -configure express, just for one middleware, with 2 routes seems rather silly.

You don't have to use express/restify, you can just use the built in Node.js webserver which is exactly what the CLI uses and is also the second example code block on our library usage page. https://www.graphile.org/postgraphile/usage-library/ - zero bloat. In fact, because it require()s less code than the CLI does you're actually reducing bloat.

This is issue number 20 for this repo I see. So, I would repeat back the same remark to you "so I'd be careful inferring" about your lack of useful data, at least in the context of this project wink

One of the main reasons that I built PostGraphile as a plugin-based system is so that users could customise it to their needs without having to affect all the other users of the library. This plugin gets 600 downloads per week and you're the only person that's requested this feature so far.

Looks like this with 7 options in descending order, per date. Perhaps too much for some, but it works perfect for charting, and nothing more costly was added.

That's 7 options per timestamp field, per table though, right? This soon multiplies to making sizeable diffs to the schema.graphql file. With just 20 tables (less than the vast majority of schemas I've worked on) and an average of 3 timestamps per table (created_at and updated_at are very standard and then many tables will have other timestamp inputs including common ones such as archived_at, deleted_at, published_at, sent_at, etc and of course other meaningful timestamps) that's 420 new enum values. That's a lot of noise for the vast majority of users who won't use this.

I was able to get this working:

Excellent! I'd probably do something like:

const isTimestamp =  (pgType) =>
  pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID;

const tsTruncateSpec = (sql, interval) => ({
  id: `truncated-to-${interval}`,
  isSuitableType: isTimestamp,
  sqlWrap: (sqlFrag) => sql.fragment`date_trunc(${sql.literal(interval)}, ${sqlFrag})`,
});

const SevenAggregateGroupSpecsPlugin = (builder) => {
  builder.hook("build", (build) => {
    const { pgSql: sql } = build;

    build.pgAggregateGroupBySpecs = [
      // Copy existing specs, but remove the ones we're replacing
      ...build.pgAggregateGroupBySpecs.filter(
        spec => !["truncated-to-day", "truncated-to-hour"].includes(spec.id)
      ),

      // Add our timestamp specs
      tsTruncateSpec(sql, "year"),
      tsTruncateSpec(sql, "month"),
      tsTruncateSpec(sql, "week"),
      tsTruncateSpec(sql, "day"),
      tsTruncateSpec(sql, "hour"),
    ];

    return build;
  });
};

they only work alone and not along with the "having" clauses =[

The group derivatives aren't added to having clauses currently - I think only regular attributes and computed columns are. I'm not sure what value there is in adding the group derivatives in general, e.g. having ( average(date_trunc('month', timestamp)) > '2021-01-01') seems less useful than using the value directly: having (average(timestamp) > '2021-01-01')? Could you elaborate? The only aggregate that seems useful to me over truncated timestamps would be count, I think... I can't think of other useful ones right now.

Or maybe I've misinterpreted you and you mean there's an error?

jeremybradbury commented 3 years ago

Or maybe I've misinterpreted you and you mean there's an error?

kinda, i just get empty results combining having with group by using the "extend the plugin method" like it wasn't adding the "having" part to the query, to which I confirmed with the explain mode on, that it was not. it's fine, my fork of the plugin works for me. I just need to keep track of upstream changes which should rarely conflict.

I was most interested in simply paging data by year/week/month segments (sorta like you have above), but these filters need to work as well. here's the example query I got stuck on.

query AggRacesGroupMonth {
  racingLogs {
    groupedAggregates(
      groupBy: TIMESTAMP_TRUNCATED_TO_MONTH
      having: {average: {racers: {greaterThan: ""}}}
    ) {
      keys
      average {
        racers
      }
      sum {
        racers
      }
      max {
        viewers
      }
      min {
        viewers
      }
    }
  }
}

the above returns 4 chartable datasets, really 2 pairs sum/avg and min/max. they could be used like this for 2 views or we could add more field data, to make 4 views, with the same chart template.

For more context, I'm using this with ChartsJS and Svelte ( and code generator) to make a simple/reusable internal reporting app, that we pretty much just need to format how the charts look and how many show on each page etc and swap from imports to read replica when it's ready.

That's 7 options per timestamp field, per table though, right?

How is 3 for each helpful now? my only point was hour and day are an incomplete list.

if you're gonna complain about the list being too long, perhaps you don't actually need an aggregate plugin?

if you need these, why would you want to extend a plugin that's already made to do aggregations.

and further, as it is now, it seems more like schema bloat and attack surface than if it was API complete, because then even if it added more fields, they would have a purpose/use.

why not either:

the code is done, it's basically just your code pasted from this thread. I have no emotional attachment to it, other than it makes my project work. because of that last reason, i'll try to keep my fork in sync, since its public, perhaps it helps someone else.

I have nothing but respect for you. there was nothing personal or implied above. just some debate/discussion on how/why things are done. I understand your decisions/opinions and respect that you stand behind them.

benjie commented 3 years ago

When you say "your fork" are you referring to this PR? https://github.com/graphile/pg-aggregates/pull/21/files That should be identical to using the plugin... Or do you have a different fork?

jeremybradbury commented 3 years ago

yes, that's what i thought too but i kept getting empty arrays, only when combining with a having clause that returns results, from the same database with one folder using the extend-a-plugin method and the other using my fork worked fine.

however, its possible I may be running into a different bug.

If i can get my project working without a submodule from my fork, it makes my life easier. I'm working on getting this production ready and up on test server. If I can get it working, I will for sure update this thread.

jeremybradbury commented 3 years ago

it actually looks more like a timeout issue (perhaps cache invalidation too)... i have some long running queries on a table with several million rows. they take 8-12ish seconds when I run my integration tests locally. I'm using a simple jest runner and moved the test timeouts to 30 seconds, but ideally in production they'll be 0-5 or 1-10, worst case.

seems like occasionally the requests pass the 15 second barrier... perhaps something fails on a default timeout of 5 seconds with a 3x grace period or some such thing?

my hope is that I won't have the same issues, using Linux instead of Windows. Sometimes a C++ or even js dependency isn't well tested on all platforms and causes odd behavior with certain combinations x64/x86, nodejs & dep version numbers even the occasional WSL2 bug/issue.

This does indeed work, both ways will hit intermittent empties, and often when a query comes back empty it gets stuck that way until a service restart, even if removing the grouping part of the query, leaving the having part, still shows results.

This is what I ended up using, which is 6 total, including the timestamp itself, 4 added: year, quarter, week, month & 1 removed: hour, 1 kept: day.

const TIMESTAMP_OID = "1114";
const TIMESTAMPTZ_OID = "1184";

const options = {
    plugins: ["@graphile/operation-hooks", "@graphile/pg-pubsub"],
    appendPlugins: [
      "@graphile-contrib/pg-simplify-inflector",
      "postgraphile-plugin-connection-filter",
      "@graphile/pg-aggregates",
      (builder) => { // AggregateGroupSpecs Overrides
        builder.hook("build", (build) => {
          const { pgSql: sql } = build;
          build.pgAggregateGroupBySpecs = [{
            id: "truncated-to-year",
            isSuitableType: (pgType) =>
              pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
            sqlWrap: (sqlFrag) => sql.fragment`date_trunc('year', ${sqlFrag})`,
          },{
            id: "truncated-to-quarter",
            isSuitableType: (pgType) =>
              pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
            sqlWrap: (sqlFrag) => sql.fragment`date_trunc('quarter', ${sqlFrag})`,
          },{
            id: "truncated-to-month",
            isSuitableType: (pgType) =>
              pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
            sqlWrap: (sqlFrag) => sql.fragment`date_trunc('month', ${sqlFrag})`,
          },{
            id: "truncated-to-week",
            isSuitableType: (pgType) =>
              pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
            sqlWrap: (sqlFrag) => sql.fragment`date_trunc('week', ${sqlFrag})`,
          },{
            id: "truncated-to-day",
            isSuitableType: (pgType) =>
              pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,
            sqlWrap: (sqlFrag) => sql.fragment`date_trunc('day', ${sqlFrag})`,
          }];
          return build;
        })
     },
  ],
 connection...
};

I left the other plugins to show context and compatibility but only the function marked // AggregateGroupSpecs Overrides and the pg-aggregates are needed to get the desired result (the ask of this issue).

jeremybradbury commented 3 years ago

here is another example, since you mentioned, this feature isn't used/discussed much.

  AggregateGroupSpecsRemove(builder) { 
    builder.hook("build", (build) => {
      const { pgSql: sql } = build;
      build.pgAggregateGroupBySpecs = [];
      return build;
     });
  };
jeremybradbury commented 3 years ago

@benjie

Here is a fairly* sharable sample of what I was building, in case you still think this is a minority use case. I built a mock from PostGraphile and plugins, then randomize the data and change some small details.

* You'll have to add a space or line to trigger a render which loads my CDN scripts (extracted from a working project)

https://svelte.dev/repl/96fff1ea92ec46d2b465a0830d958782?version=3.42.1

benjie commented 3 years ago

Thanks for clarification - I've taken the example plugin code from above and tested it, added comments, and incorporated it into the documentation here: https://github.com/graphile/pg-aggregates/pull/24

Regarding svelte.dev: nice example! :raised_hands: