opencadc / caom2db

Common Archive Observation Model - database implementation
GNU Affero General Public License v3.0

icewind: fails if remote service is non-responsive #283

Open pdowler opened 1 year ago

pdowler commented 1 year ago
2023-08-08 02:04:40.203[main] INFO  ObservationHarvester - harvest window: 2023-08-04 13:41:51.808 :: 2023-08-08 01:59:40.203 [100]
2023-08-08 02:04:44.215[main] FATAL Main - Unexpected failure
java.lang.RuntimeException: failed to get observation list
        at ca.nrc.cadc.caom2.repo.client.RepoClient.readObservationStateList(RepoClient.java:486) ~[caom2-repo-1.4.4.jar:?]
        at ca.nrc.cadc.caom2.repo.client.RepoClient.getObservationList(RepoClient.java:283) ~[caom2-repo-1.4.4.jar:?]
        at ca.nrc.cadc.caom2.repo.client.RepoClient.getList(RepoClient.java:293) ~[caom2-repo-1.4.4.jar:?]
        at org.opencadc.icewind.ObservationHarvester.doit(ObservationHarvester.java:288) ~[icewind.jar:?]
        at org.opencadc.icewind.ObservationHarvester.run(ObservationHarvester.java:193) ~[icewind.jar:?]
        at org.opencadc.icewind.CaomHarvester.run(CaomHarvester.java:195) ~[icewind.jar:?]
        at ca.nrc.cadc.auth.RunnableAction.run(RunnableAction.java:96) ~[cadc-util-1.9.6.jar:?]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
        at javax.security.auth.Subject.doAs(Subject.java:361) ~[?:?]
        at org.opencadc.icewind.Main.main(Main.java:224) ~[icewind.jar:?]
Caused by: java.net.SocketException: Connection reset

When the DeletionHarvester hits the same failure, it logs the error and carries on. That's probably OK.

Expected behaviour: icewind should sleep and retry instead of exiting. If failures repeat, the sleep interval should grow after each one, up to a maximum.
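Below is a minimal sketch of that behaviour in Java. It assumes, as the stack trace suggests, that RepoClient reports the network problem as a RuntimeException whose cause chain contains an IOException (here java.net.SocketException). The class name RetryWithBackoff, the helpers nextSleep/isTransient/callWithBackoff, and the initial and maximum sleep values are hypothetical. They are not part of icewind or caom2-repo.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of icewind): retry a harvest step with capped exponential backoff. */
public class RetryWithBackoff {

    /** Double the sleep interval, but never exceed maxMillis. */
    static long nextSleep(long currentMillis, long maxMillis) {
        return Math.min(currentMillis * 2, maxMillis);
    }

    /** True if any throwable in the cause chain is an IOException (e.g. SocketException: Connection reset). */
    static boolean isTransient(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Run task until it succeeds. Transient (network) failures cause a sleep and retry,
     * with the sleep doubling up to maxMillis; any other RuntimeException is rethrown.
     */
    static <T> T callWithBackoff(Callable<T> task, long initialMillis, long maxMillis) throws Exception {
        long sleep = initialMillis;
        while (true) {
            try {
                return task.call();
            } catch (RuntimeException ex) {
                if (!isTransient(ex)) {
                    throw ex; // not a network problem: stay fatal, as today
                }
                System.err.println("harvest failed (" + ex.getMessage() + "), retrying in " + sleep + " ms");
                Thread.sleep(sleep);
                sleep = nextSleep(sleep, maxMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated remote call: fails twice with a wrapped SocketException, then succeeds.
        String result = callWithBackoff(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new RuntimeException("failed to get observation list",
                        new SocketException("Connection reset"));
            }
            return "ok";
        }, 10, 40);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

The loop never gives up on transient errors, which matches "sleep and retry" for a long-running harvester, while non-network failures are still rethrown so genuine bugs remain fatal. Doubling is just one common growth rule; any increasing interval capped at the maximum would fit the expected behaviour.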

Expected fixes: