real-logic / aeron

Efficient reliable UDP unicast, UDP multicast, and IPC message transport
https://aeron.io
Apache License 2.0
NPE at `ClusterBackupAgent.reset` #1652

Open artem-v opened 3 months ago

Hi

Aeron version: 1.44.1

1 observations from 2024-08-22 15:05:16.554+0300 to 2024-08-22 15:05:16.554+0300 for:
io.aeron.cluster.client.ClusterException: WARN - failed to stop log replay
        at io.aeron.cluster.ClusterBackupAgent.reset(ClusterBackupAgent.java:368)
        at io.aeron.cluster.ClusterBackupAgent.resetBackup(ClusterBackupAgent.java:647)
        at io.aeron.cluster.ClusterBackupAgent.doWork(ClusterBackupAgent.java:287)
       ...
Caused by: java.lang.NullPointerException: Cannot invoke "io.aeron.archive.client.AeronArchive.stopReplay(long)" because "this.clusterArchive" is null
        at io.aeron.cluster.ClusterBackupAgent.reset(ClusterBackupAgent.java:364)
        ... 9 more

We see this error fairly regularly in the counted error log file. It looks like there are conditions under which `clusterArchive` is null when `reset()` is called in `ClusterBackupAgent`.

Unfortunately, I cannot post a clear test that reproduces this.
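
This is not a reproduction against Aeron, but the suspected pattern can be sketched in isolation. Everything below is a hypothetical stand-in: `ArchiveStub`, `BackupAgentSketch`, `replaySessionId`, and both `reset*` methods are invented for illustration and are not taken from the Aeron source. The sketch assumes a reset path that tries to stop a replay through an archive reference that was never assigned, and shows how a null check on that reference would avoid the NPE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an archive client; NOT io.aeron.archive.client.AeronArchive.
interface ArchiveStub {
    void stopReplay(long replaySessionId);
}

// Hypothetical stand-in for the agent; NOT the real ClusterBackupAgent.
final class BackupAgentSketch {
    static final long NULL_VALUE = -1L;

    // Assumption: this can still be null if a reset happens before the archive is connected.
    ArchiveStub clusterArchive;
    long replaySessionId = NULL_VALUE;
    final List<String> errors = new ArrayList<>();

    // Mirrors the reported symptom: a replay id is set, but the archive reference is null.
    void resetUnguarded() {
        if (NULL_VALUE != replaySessionId) {
            try {
                clusterArchive.stopReplay(replaySessionId); // throws NPE when clusterArchive is null
            } catch (final Exception ex) {
                errors.add("failed to stop log replay: " + ex.getClass().getSimpleName());
            }
        }
        replaySessionId = NULL_VALUE;
    }

    // Possible guard: only attempt to stop the replay when an archive client exists.
    void resetGuarded() {
        final ArchiveStub archive = clusterArchive;
        if (NULL_VALUE != replaySessionId && null != archive) {
            try {
                archive.stopReplay(replaySessionId);
            } catch (final Exception ex) {
                errors.add("failed to stop log replay: " + ex.getClass().getSimpleName());
            }
        }
        replaySessionId = NULL_VALUE;
    }
}
```

With `clusterArchive` left null and `replaySessionId` set, `resetUnguarded()` records one "failed to stop log replay" entry, while `resetGuarded()` records none. Whether skipping the call is the right behavior in the real agent is for the maintainers to judge.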