solana-labs / solana-web3.js

Solana JavaScript SDK
https://solana-labs.github.io/solana-web3.js
MIT License
2.09k stars 841 forks source link

_handleServerNotification on Connection.ts has an unhandled error. #3053

Open jpablogn opened 1 month ago

jpablogn commented 1 month ago

My app uses connect.onAccountChange() and connect.removeAccountChangeListener() and is keeps open 100-200 subscritions.

Main libraries:

The app crashes when it tries to use subscribe/unsubscribe using onAccountChange()/removeAccountChangeListener() and the Websocket is not with readyState===1.

Fail when trying to subscribe (it goes in an infinite loop with all the accounts handled at that moment):

accountSubscribe error for argument [
  'F3TuARNBKV6U7uA4DvLZUUgmcAAEtpDeMyC36Zb1CPnR',
  { encoding: 'base64', commitment: 'confirmed' }
] Tried to call a JSON-RPC method `accountSubscribe` but the socket was not `CONNECTING` or `OPEN` (`readyState` was 3)

Fail when trying to unsubscribe (it goes in an infinite loop with all the accounts handled at that moment):

accountUnsubscribe error: Tried to call a JSON-RPC method `accountUnsubscribe` but the socket was not `CONNECTING` or `OPEN` (`readyState` was 2)

And finally crashes:

<--- Last few GCs --->

[1996:0000021E09A48EE0] 19434569 ms: Mark-Compact 4047.4 (4134.3) -> 4032.6 (4135.3) MB, 1414.30 / 0.00 ms  (average mu = 0.253, current mu = 0.245) allocation failure; scavenge might not succeed
[1996:0000021E09A48EE0] 19436455 ms: Mark-Compact 4048.4 (4135.3) -> 4033.8 (4136.6) MB, 1735.41 / 0.00 ms  (average mu = 0.169, current mu = 0.080) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 00007FF6430BC81B node::SetCppgcReference+17979
 2: 00007FF643026674 DSA_meth_get_flags+89316
 3: 00007FF643AA4871 v8::Isolate::ReportExternalAllocationLimitReached+65
 4: 00007FF643A8DFC8 v8::Function::Experimental_IsNopFunction+1336
 5: 00007FF6438EFA70 v8::Platform::SystemClockTimeMillis+659328
 6: 00007FF6438FBCF3 v8::Platform::SystemClockTimeMillis+709123
 7: 00007FF6438F9654 v8::Platform::SystemClockTimeMillis+699236
 8: 00007FF6438EC790 v8::Platform::SystemClockTimeMillis+646304
 9: 00007FF643901E0A v8::Platform::SystemClockTimeMillis+733978
10: 00007FF643902687 v8::Platform::SystemClockTimeMillis+736151
11: 00007FF643910F7F v8::Platform::SystemClockTimeMillis+795791
12: 00007FF6435D1CA5 v8::CodeEvent::GetFunctionName+116773

As far as WebSocket.readyState is not visible, I have tried to catch the error in different ways but I can't catch it.

Option 1: try-catch

const removeListener = async (id:number) =>{

    let ok:boolean = false;
    let n_times:number = 1;

    do{

        try{
            await connection.removeAccountChangeListener(id);
            ok = true;
        }catch(error){
            ok = false;
        }            

        if (!ok){
            await delay(1000); // custom function to delay 1s
            console.log(`Try: ${n_times}, Nop: ${id}`);
            n_times++;
        }else{
            console.log(`Try: ${n_times}, Disconnected: ${id}`);
        }

    }while(!ok);

}

Option 2: catch chained

const removeListener = async (id:number) =>{

    let ok:boolean = false;
    let n_times:number = 1;

    do{
        ok = true;
        await connection.removeAccountChangeListener(id).catch(error){
            ok = false;
        }            

        if (!ok){
            await delay(1000); // custom function to delay 1s
            console.log(`Try: ${n_times}, Nop: ${id}`);
            n_times++;
        }else{
            console.log(`Try: ${n_times}, Disconnected: ${id}`);
        }

    }while(!ok);

}

These errors are in file "rpc-websocket.ts" call and notify reject funcions:

      new Error(
        'Tried to call a JSON-RPC method `' +
          args[0] +
          '` but the socket was not `CONNECTING` or `OPEN` (`readyState` was ' +
          readyState +
          ')',
      ),

But I think the problem could came from 'connection.ts' which has this piece of code with a no controlled catch, because I can't catch the error:

  /**
   * @internal
   */
  private _handleServerNotification<
    TCallback extends SubscriptionConfig['callback'],
  >(
    serverSubscriptionId: ServerSubscriptionId,
    callbackArgs: Parameters<TCallback>,
  ): void {
    const callbacks =
      this._subscriptionCallbacksByServerSubscriptionId[serverSubscriptionId];
    if (callbacks === undefined) {
      return;
    }
    callbacks.forEach(cb => {
      try {
        cb(
          // I failed to find a way to convince TypeScript that `cb` is of type
          // `TCallback` which is certainly compatible with `Parameters<TCallback>`.
          // See https://github.com/microsoft/TypeScript/issues/47615
          // @ts-ignore
          ...callbackArgs,
        );
      } catch (e) {
        console.error(e);
      }
    });
  }

Few questions open en Solana Stack Exchange: https://solana.stackexchange.com/questions/15643/accountunsubscribe-and-accountsubscribe-error-tried-to-call-a-json-rpc-method-w https://solana.stackexchange.com/questions/15698/solana-web3-js-it-is-not-possible-to-detect-websocket-error

Could it be a bug? How can I handle or catch this error? How can I check websocket.readyStatus to avoid subscribe/unsubscribe when it is not in OPEN or CONNECTING?

steveluscher commented 1 month ago

Subscribe errors are caught here: https://github.com/solana-labs/solana-web3.js/blob/maintenance/v1.x/src/connection.ts#L6255-L6270

Unsubscribe errors are caught here: https://github.com/solana-labs/solana-web3.js/blob/maintenance/v1.x/src/connection.ts#L6310-L6322

In neither place are these errors surfaced to your application.

It's possible that a check for readyState could be added to isCurrentConnectionStillActive(). Could you try that, and send a PR if it works?

All things considered, this subscriptions API and implementation are terminally flawed. We've completely rewritten it in 2.0, and I encourage you to migrate your app now. https://x.com/anza_xyz/status/1819093864394764674

jpablogn commented 1 month ago

Yes, in fact that is the problem, but I don't know how to solve it.

If the program arrives to the line where is written "// TODO: Maybe add an 'errored' state or a retry limit?" (lines: 6.316 or 6.265) it goes to an infinite loop that crash the javascript stack.

I have modified rpc-websocket.ts to have visibility of readyState like this:

readyState(){
    // websocket CONNECTING (0) or OPEN (1)
    return ((this.underlyingSocket?.readyState === 1) || (this.underlyingSocket?.readyState === 0));
}

I have tried three different things without success: Case 1: Include this piece of code in line 6.193:

if(!this._rpcWebSocket.readyState()){
      this._rpcWebSocket.connect();
      return;
}

Case 2. Add it to isCurrentConnectionStillActive() like this:

const activeWebSocketGeneration = this._rpcWebSocketGeneration;
    const isCurrentConnectionStillActive = () => {
      return ((activeWebSocketGeneration === this._rpcWebSocketGeneration) && (this._rpcWebSocket.readyState()));
 };

Case 3. Add this piece of code in lines 6.316 and 6.265:

if (!this._rpcWebSocket.readyState()){
   return;
}

Running the test npm run test:test-with-live-validator gives following results:

  1. Source code (without changes):
    490 passing (1m)
    3 pending
    7 failing
  2. Case 1:
    310 passing (1m)
    3 pending
    62 failing
  3. Case 2:
    476 passing (1m)
    3 pending
    21 failing
  4. Case 3:
    474 passing (1m)
    3 pending
    23 failing

Running npm run test:unit:node gives following results:

  1. Source code (without changes):
    471 passing (1s)
  2. Case 1:
    Tests not finished
  3. Case 2:
    Test not finished
  4. Case 3:
    Test not finished

Adding a return just in line 6311 passes all test successfully but I don't know the real effect (as far as the original code also passes al of them but fails in produccion when doing a websocket call when readyState != 1):

if (e instanceof Error) {
   console.error(`${unsubscribeMethod} error:`, e.message);
   return;
}

I will test it over a real server because I don't have the knowledge to do the test.

By the way I think lines 6.297-6.299 are repeated and can be deleted: https://github.com/solana-labs/solana-web3.js/blob/maintenance/v1.x/src/connection.ts#L6297-L6304

Using a totaly refactored library means investing lots of hours in learning how it works (take into account that there are not many examples published to use as reference), for a big project may have sense, for an small one may be it is faster to do your own small library instead of trying to learn how such a complex one works (talking from my ignorant point of view).

Thank you.

jpablogn commented 1 month ago

This in not working fine in my code, it just gives back undefined.... :)

readyState(){
    // websocket CONNECTING (0) or OPEN (1)
    return this.underlyingSocket?.readyState;
}

But even in that case, this code:

} catch (e) {
      if (e instanceof Error) {
        console.error(`${unsubscribeMethod} error:`, e.message);
      }

      // TODO: Maybe add an 'errored' state or a retry limit?
      this._setSubscription(hash, {...subscription, state: 'subscribed'});    

      console.log(`${`-----> hash: ${aux}, ws.connected: ${this._rpcWebSocketConnected}, ` +
        `ws.readyState: ${this._rpcWebSocket.readyState()}, ` +
        `isActive: ${isCurrentConnectionStillActive()}`);    

      await this._updateSubscriptions();
}

prints: true, undefined, true