jeremydaly / data-api-client

A "DocumentClient" for the Amazon Aurora Serverless Data API
MIT License
439 stars 61 forks source link

Dealing with the 1 MB limit #59

Open lastmjs opened 3 years ago

lastmjs commented 3 years ago

Amazing library!

As you know, there is a 1 MB limit for Data API results. I'm just working through how to deal with this generically for my queries (I'm implementing a GraphQL API, so I'm constructing SQL queries generically from GraphQL queries). I need to somehow split up any query given to the Data API into chunks <= 1 MB in size.

My question is, how are people dealing with this? And would it be appropriate to somehow build this chunking into the data-api-client? Not having to deal with this limitation directly would be an amazing feat of abstraction for this library.

Thanks!

ghost commented 3 years ago

I would love to see this abstracted as well! +1

gnestor commented 3 years ago

I just dealt with this by doing the following:

const queryDatabase = async (query, parameters = {}) => {
  try {
    const { records } = await db.query(query, parameters)
    return records
  } catch (error) {
    console.error(error.message)
    if (error.message.includes('Database returned more than the allowed response size limit')) {
      const getCount = async (_query) => {
        const query = _query.replace(/SELECT\s(.+?)\s(FROM|WHERE)/is, 'SELECT COUNT(*) $2')
        const {
          records: [{ count }],
        } = await db.query(query, parameters)
        return count
      }
      let result = []
      // Get result count
      const count = await getCount(query)
      // Define limit based on your average record size
      const limit = 50
      // Use limits and offsets to page through the query
      for (let page = 1; page <= Math.ceil(count / limit); page++) {
        const limitQuery = query.replace(/;?$/, `\nLIMIT ${limit} OFFSET ${(page - 1) * limit};`)
        const { records } = await db.query(limitQuery, parameters)
        result = [...result, ...records]
      }
      return result
    }
  }
}

It you're doing a batch query that passes parameters as nested arrays, then you will need to handle that for each page query.

Lane commented 3 years ago

thanks @gnestor! one small note in case someone else ends up here:

my query contained a GROUP BY clause, which doesn't return the correct number of rows with the snippet above, so i changed the following:

const getCount = async (_query) => {
  // old: const query = _query.replace(/SELECT\s(.+?)\s(FROM|WHERE)/is, 'SELECT COUNT(*) $2')
  const query = `SELECT COUNT(*) FROM (${_query}) sub`; // wrap the original query and count the number of results
  const {
    records: [{ count }],
  } = await db.query(query, parameters)
  return count
}

other than that, it works great!