real-logic / aeron

Efficient reliable UDP unicast, UDP multicast, and IPC message transport
Apache License 2.0

NPE at `ClusterBackupAgent.reset` #1652

Open artem-v opened 1 month ago

Hi

Aeron 1.44.1

1 observations from 2024-08-22 15:05:16.554+0300 to 2024-08-22 15:05:16.554+0300 for:
io.aeron.cluster.client.ClusterException: WARN - failed to stop log replay
        at io.aeron.cluster.ClusterBackupAgent.reset(ClusterBackupAgent.java:368)
        at io.aeron.cluster.ClusterBackupAgent.resetBackup(ClusterBackupAgent.java:647)
        at io.aeron.cluster.ClusterBackupAgent.doWork(ClusterBackupAgent.java:287)
       ...
Caused by: java.lang.NullPointerException: Cannot invoke "io.aeron.archive.client.AeronArchive.stopReplay(long)" because "this.clusterArchive" is null
        at io.aeron.cluster.ClusterBackupAgent.reset(ClusterBackupAgent.java:364)
        ... 9 more

We see this error fairly regularly in the counted error log file. It looks like there are conditions under which `clusterArchive` is null when `reset()` is called in `ClusterBackupAgent`.

Unfortunately, I cannot post a clear test that reproduces this.
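
To illustrate the failure mode, here is a minimal, self-contained sketch of the kind of null guard that might avoid the NPE. It is not the actual Aeron code: `ResetGuardSketch`, `ArchiveClient`, `BackupAgent`, `NULL_VALUE`, `liveLogReplaySessionId`, and `errors` are hypothetical stand-ins. Only the `clusterArchive` field name and the `stopReplay(long)` call are taken from the stack trace above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is NOT the real io.aeron.cluster.ClusterBackupAgent.
class ResetGuardSketch
{
    // Hypothetical stand-in for the archive client. Only stopReplay(long) is known from the stack trace.
    interface ArchiveClient
    {
        void stopReplay(long replaySessionId);
    }

    // Hypothetical sentinel meaning "no replay in progress".
    static final long NULL_VALUE = -1L;

    static final class BackupAgent
    {
        ArchiveClient clusterArchive; // may still be null if the archive connection was never established
        long liveLogReplaySessionId = NULL_VALUE; // hypothetical field name
        final List<String> errors = new ArrayList<>(); // stand-in for the counted error log

        void reset()
        {
            if (NULL_VALUE != liveLogReplaySessionId)
            {
                final ArchiveClient archive = clusterArchive;
                if (null != archive) // guard: skip the stop when there is no archive client
                {
                    try
                    {
                        archive.stopReplay(liveLogReplaySessionId);
                    }
                    catch (final RuntimeException ex)
                    {
                        errors.add("WARN - failed to stop log replay: " + ex);
                    }
                }
                liveLogReplaySessionId = NULL_VALUE;
            }
        }
    }
}
```

With a guard like this, calling `reset()` while `clusterArchive` is null would clear the replay session id without recording an error. Whether silently skipping the stop is the right behaviour for the real agent is a question for the maintainers.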