neondatabase / serverless

Connect to Neon PostgreSQL from serverless/worker/edge functions
https://www.npmjs.com/package/@neondatabase/serverless
MIT License
321 stars 13 forks source link

Add a way to abort a hung connection when using experimental neon() API #29

Closed skybrian closed 11 months ago

skybrian commented 1 year ago

Sometimes the fetch() API used by the experimental neon() API just hangs. It would be nice to have a way to get it to abort.

For the fetch() API, an AbortController can be used to create an AbortSignal that's passed to fetch(). This would require a new option to the neon() function or the query object it creates to thread through the AbortSignal.

Steps to reproduce

Hanging is not easily reproducible I'm afraid.

See below for the code I'm running; it's pretty basic. I'm passing in a direct connection URL. Normally it works, but when I'm unlucky, it hangs. This started happening when my database server was shut down, and it didn't seem to be starting on its own. I used the Neon console run a query and start the server, but it made no difference.

Expected result

It seems like it should either time out or start working eventually.

Actual result

Hangs indefinitely. I had to kill the process and re-run it to see it work again.

Environment

Here's my code.

import { config } from 'dotenv';
import { neon } from '@neondatabase/serverless';

config();

const query = (() => {
    const url = process.env.DATABASE_URL;
    if (url == null) throw "need to set DATABASE_URL environment variable";
    return neon(url);
  })();

console.log(await query("select id, symbol, count from counters order by id"));
jawj commented 1 year ago

Thanks @skybrian. I think AbortController might be too new a feature (Node 15) to build in our own timeout mechanism using it, but I'll look at allowing one to be passed in via the options to fetch.

wscotten commented 1 year ago

+1. This happens every time my compute endpoint goes idle. I'm forced to wait ~2 minutes for a function to finally timeout. Would be nice if there is some retry logic or something to handle this issue.

Seems like the only way to work around for the time being is to keep your compute endpoint always warm...

skybrian commented 1 year ago

It's inconsistent for me. I've noticed that sometimes it eventually finishes, but it's many seconds after the compute node has started up and a process on a different server was able to connect.

jawj commented 1 year ago

OK — you certainly shouldn't be seeing a hang or long pause when the endpoint goes idle, so we'll investigate this on the server side.

In any case, as of driver version 0.5, there's now a fetchOptions option to neon(...) and to the query function it returns (when used as an ordinary, non-tagged-template function). Anything you provide here is merged with the options to fetch, so you can add a signal parameter.

For example, to retry on timeout you might do something like the example below. Note, of course, that a query may time out even if it reached the database, so be careful to use this only for idempotent queries.

function sqlWithRetries(sql: ReturnType<typeof neon>, timeoutMs: number, attempts = 3) {
  return async function (strings: TemplateStringsArray, ...params: any[]) {
    // reassemble template string
    let query = '';
    for (let i = 0; i < strings.length; i++) {
      query += strings[i];
      if (i < params.length) query += '$' + (i + 1);
    }
    // run query with timeout and retries
    for (let i = 1; ; i++) {
      const abortController = new AbortController();
      const timeout = setTimeout(() => abortController.abort('fetch timed out'), timeoutMs);

      try {
        const { signal } = abortController;
        const result = await sql(query, params, { fetchOptions: { signal } });
        return result;

      } catch (err: any) {
        const timedOut = err.sourceError && err.sourceError instanceof DOMException && err.sourceError.name === 'AbortError';
        if (!timedOut || i >= attempts) throw err;

      } finally {
        clearTimeout(timeout);
      }
    }
  }
}

const sql = neon(process.env.DATABASE_URL);
const sqlRetry = sqlWithRetries(sql, 10000);
await sqlRetry`SELECT ${'did this time out?'} AS str`;
skybrian commented 1 year ago

I sometimes get "connection refused" errors when connecting from Deno Deploy at startup. It tends to happen doing a cold boot. I added a retry and it didn't help. However, it works fine after reloading the page, which restarts the process.

1.1.1 wake: NeonDbError: error connecting to server: Connection refused (os error 111)

Caused by: Connection refused (os error 111) at https://esm.sh/v128/@neondatabase/serverless@0.5.0/denonext/serverless.mjs:3:54476 at eventLoopTick (ext:core/01_core.js:183:11) at async query (file:///src/lib/database.ts:28:10) at async wake (file:///src/lib/database.ts:43:20) at async file:///src/main.ts:16:3

Maybe there's a better place to report incompletely diagnosed observations like this, assuming it's useful? It might be something Deno Deploy is doing with their sandboxing or network, rather than anything to do with Neon.

jawj commented 11 months ago

@skybrian Do I take it that (like in #36) the network issues have resolved, or is this something you're still seeing?

skybrian commented 11 months ago

Yes, my connection issues went away. I didn’t test adding fetch options, but I have no reason to believe it wouldn’t work, so this seems fine to close.