launchql / pgsql-parser

PostgreSQL Query Parser for Node.js
MIT License
129 stars 23 forks source link

transformArrays slow for large queries #97

Closed benasher44 closed 1 year ago

benasher44 commented 1 year ago

Hi there! Recently we have been using pgsql-parser on larger queries, and we've found a bottleneck in transformArrays. I've tried rewriting it a few different ways, and it hasn't made much difference. For example. I've tried something like this, which makes some tradeoffs that I'm not sure are suitable outside of our usecase (doesn't copy and reuses obj.List.items):


export const transformArrays = (obj, context) => {
  // Handle null, undefined, primitives, and Dates
  if (obj == null || typeof obj !== 'object' || obj instanceof Date) {
    return obj;
  }

  // must be a non-Date object at this point

  // Handle Array
  if (obj instanceof Array) {
    for (let i = 0, len = obj.length; i < len; i++) {
      obj[i] = transformArrays(obj[i], context);
    }
    return obj;
  }

  // it's a list, transform it and its items
  if (obj.List !== undefined) {
    for (let i = 0, len = obj.List.items.length; i < len; i++) {
      obj.List.items[i] = transformArrays(obj.List.items[i], context);
    }
    return obj.List.items;
  }

  // it's a regular object, recurse to find more Lists
  for (const attr in obj) {
    obj[attr] = transformArrays(obj[attr], context);
  }
  return obj;
};

Despite a small rewrite that creates fewer copies, CPU traces still show a lot of time here. I think ultimately, it comes down to time cost to traverse all object attributes of all objects in the AST for large ASTs.

So I thought I would start a thread to discuss. It seems like the goal is to erase Lists, which is a goal I definitely appreciate :). That said, Lists still leak through in RangeFunctions and SelectStmt.valuesLists (not sure how since transformArrays seems comprehensive enough). Given these issues, I wanted to ask if there was a way to push this closer to where the initial AST object is built, so then the tree doesn't have to be reprocessed here (and might somehow plug the holes along the way).

Thanks for your consideration!

benasher44 commented 1 year ago

Actually I just realized this is only used in deparse, in which case I don't really understand what this is for. Initially, I thought maybe the parse step adds lots from List from postgres C, which is nice to transform to arrays. But, this converts List before going back to SQL. When is that necessary (not implying it isn't necessary, just want to understand)? If it's not always needed, maybe we can add a flag to opt out of the preparse step (i'd be happy to open a PR)?

benasher44 commented 1 year ago

Oh I think I see now. It makes the deparsing code easier to write