Closed aew9tie7Peegh6i closed 2 weeks ago
Type
Description
Failed to push a commit onto develop to move @unreleased changelogs
com.palantir.conjure.java.api.errors.UnknownRemoteException: Response status: 502
at com.palantir.conjure.java.dialogue.serde.ErrorDecoder.decodeInternal(ErrorDecoder.java:123)
at com.palantir.conjure.java.dialogue.serde.ErrorDecoder.decode(ErrorDecoder.java:68)
at com.palantir.conjure.java.dialogue.serde.ConjureBodySerDe$EmptyBodyDeserializer.deserialize(ConjureBodySerDe.java:351)
at com.palantir.conjure.java.dialogue.serde.ConjureBodySerDe$EmptyBodyDeserializer.deserialize(ConjureBodySerDe.java:338)
at com.palantir.conjure.java.client.jaxrs.DialogueFeignClient$RemoteExceptionDecoder.decode(DialogueFeignClient.java:336)
at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:134)
at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76)
at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:103)
at jdk.proxy2/jdk.proxy2.$Proxy83.createTree(Unknown Source)
at jdk.internal.reflect.GeneratedMethodAccessor244.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at com.palantir.github.clients.PerTraceRequestCountingClientFactory.invoke(PerTraceRequestCountingClientFactory.java:35)
at com.palantir.github.clients.PerTraceRequestCountingClientFactory.lambda$client$0(PerTraceRequestCountingClientFactory.java:29)
at jdk.proxy2/jdk.proxy2.$Proxy83.createTree(Unknown Source)
at jdk.internal.reflect.GeneratedMethodAccessor244.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at com.palantir.github.clients.ErrorTrackingExternalClientFactory.invoke(ErrorTrackingExternalClientFactory.java:36)
at com.palantir.github.clients.ErrorTrackingExternalClientFactory.lambda$client$0(ErrorTrackingExternalClientFactory.java:30)
at jdk.proxy2/jdk.proxy2.$Proxy83.createTree(Unknown Source)
at com.palantir.autorelease.CommitFactory.createCommit(CommitFactory.java:100)
at com.palantir.autorelease.CommitFactory.createCommit(CommitFactory.java:70)
at com.palantir.autorelease.ReleaserHelper.moveChangelogs(ReleaserHelper.java:64)
at com.palantir.autorelease.Releaser.labelRelease(Releaser.java:229)
at com.palantir.autorelease.DefaultRepositoryArchetype.automatedRelease(DefaultRepositoryArchetype.java:117)
at com.palantir.autorelease.Repository.automatedRelease(Repository.java:63)
at com.palantir.autorelease.label.DefaultWebhookHandler.handlePullRequestClosedEvent(DefaultWebhookHandler.java:224)
at com.palantir.autorelease.label.DefaultWebhookHandler.handleWebhook(DefaultWebhookHandler.java:128)
at com.palantir.github.GithubWebhookResource.lambda$receiveWebhook$0(GithubWebhookResource.java:58)
at com.palantir.tracing.Tracers$TracingAwareRunnable.run(Tracers.java:584)
at com.palantir.tritium.metrics.TaggedMetricsExecutorService$TaggedMetricsRunnable.run(TaggedMetricsExecutorService.java:142)
at com.palantir.nylon.threads.RenamingExecutorService$RenamingRunnable.run(RenamingExecutorService.java:92)
at org.jboss.threads.EnhancedViewExecutor$EnhancedViewExecutorRunnable.run(EnhancedViewExecutor.java:519)
at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
at com.palantir.tritium.metrics.TaggedMetricsThreadFactory$InstrumentedTask.run(TaggedMetricsThreadFactory.java:69)
at java.base/java.lang.Thread.run(Thread.java:840)
Suppressed: com.palantir.conjure.java.dialogue.serde.ErrorDecoder$ResponseDiagnostic: Response Diagnostic Information: {status=502, Server=GitHub.com, Content-Type=application/json, Content-Length=32, Date=Thu, 13 Jun 2024 15:50:01 GMT}
Suppressed: com.palantir.logsafe.exceptions.SafeRuntimeException: unknown error: {serviceName=github, errorBody={
"message": "Server Error"
}
}
at com.palantir.github.clients.ErrorLoggingErrorTracker.trackUnknownRemoteError(ErrorLoggingErrorTracker.java:23)
at com.palantir.github.clients.ErrorTracker.lambda$andAlso$0(ErrorTracker.java:14)
at com.palantir.github.clients.ErrorTrackingExternalClientFactory.invoke(ErrorTrackingExternalClientFactory.java:41)
... 20 more
General
Before this PR: Even with cached table lookups disabled, deleting and creating tables across KVS clients would break.
After this PR:
==COMMIT_MSG== There are some situations that can result in cached table lookups returning the wrong table for a table which has been dropped and re-created.
There are different scenarios where the caching breaks down, see the tests in TableRemappingKeyValueServiceTest for details.
This change is the first in what may become a series to resolve the concurrency issues within KvTableMappingService.
Unfortunately this change isn't actually sufficient to solve the table caching issues, however the feature to disable the table cache added in https://github.com/palantir/atlasdb/pull/6946 should allow for correctness where we expect namespaced tables to be created and deleted at a reasonably high scale.
The long-term fix for this problem is likely to either remove the cache from KvTableMappingService completely, or to re-write KvTableMappingService to be multi-node safe. As it stands, KvTableMappingService is NOT concurrency safe. ==COMMIT_MSG==
Priority: P1
Concerns / possible downsides (what feedback would you like?): This does not resolve everything related to caching in KvTableMappingService
Is documentation needed?: No
Testing and Correctness
What was existing testing like? What have you done to improve it?: 3 new tests, 2 of which are based on ErikH's initial investigations.
Execution
How would I tell this PR works in production? (Metrics, logs, etc.): Set the table caching predicate with this patch, re-run customer workflows which currently error.
If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?: Completely remove caching from KvTableMappingService and always read short -> long name lookups from the DB
Development Process
Please tag any other people who should be aware of this PR: @jeremyk-91 @sverma30 @raiju