vorburger / opendaylight-etcd

etcd as OpenDaylight (ODL) DataBroker database store
Eclipse Public License 1.0
6 stars 2 forks source link

Caused by: io.etcd.jetcd.shaded.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request #8

Open vorburger opened 5 years ago

vorburger commented 5 years ago

@bertrandlow FYI as I'm making progress re. the broken build (re. #5) I'm finding a new problem:

opendaylight-user@root>diag
MD SAL Restconf Connector (254)
-------------------------------
Status: Failure
Blueprint
6/2/19 7:30 PM
Exception: 
org.osgi.service.blueprint.container.ComponentDefinitionException: Unable to initialize bean schemaContextHandler
org.osgi.service.blueprint.container.ComponentDefinitionException: org.osgi.service.blueprint.container.ComponentDefinitionException: Unable to initialize bean schemaContextHandler
    at org.apache.aries.blueprint.container.ServiceRecipe.createService(ServiceRecipe.java:310)
    at org.apache.aries.blueprint.container.ServiceRecipe.internalGetService(ServiceRecipe.java:252)
    at org.apache.aries.blueprint.container.ServiceRecipe.internalCreate(ServiceRecipe.java:149)
    at org.apache.aries.blueprint.di.AbstractRecipe$1.call(AbstractRecipe.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:90)
    at org.apache.aries.blueprint.container.BlueprintRepository.createInstances(BlueprintRepository.java:360)
    at org.apache.aries.blueprint.container.BlueprintRepository.createAll(BlueprintRepository.java:190)
    at org.apache.aries.blueprint.container.BlueprintContainerImpl.instantiateEagerComponents(BlueprintContainerImpl.java:717)
    at org.apache.aries.blueprint.container.BlueprintContainerImpl.doRun(BlueprintContainerImpl.java:413)
    at org.apache.aries.blueprint.container.BlueprintContainerImpl.run(BlueprintContainerImpl.java:278)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.aries.blueprint.container.ExecutorServiceWrapper.run(ExecutorServiceWrapper.java:106)
    at org.apache.aries.blueprint.utils.threading.impl.DiscardableRunnable.run(DiscardableRunnable.java:45)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.osgi.service.blueprint.container.ComponentDefinitionException: Unable to initialize bean schemaContextHandler
    at org.apache.aries.blueprint.container.BeanRecipe.runBeanProcInit(BeanRecipe.java:593)
    at org.apache.aries.blueprint.container.BeanRecipe.internalCreate2(BeanRecipe.java:703)
    at org.apache.aries.blueprint.container.BeanRecipe.internalCreate(BeanRecipe.java:666)
    at org.apache.aries.blueprint.di.AbstractRecipe$1.call(AbstractRecipe.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:90)
    at org.apache.aries.blueprint.di.RefRecipe.internalCreate(RefRecipe.java:62)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:108)
    at org.apache.aries.blueprint.container.BeanRecipe.getInstance(BeanRecipe.java:268)
    at org.apache.aries.blueprint.container.BeanRecipe.internalCreate2(BeanRecipe.java:685)
    at org.apache.aries.blueprint.container.BeanRecipe.internalCreate(BeanRecipe.java:666)
    at org.apache.aries.blueprint.di.AbstractRecipe$1.call(AbstractRecipe.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:90)
    at org.apache.aries.blueprint.di.RefRecipe.internalCreate(RefRecipe.java:62)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:108)
    at org.apache.aries.blueprint.container.BeanRecipe.getInstance(BeanRecipe.java:268)
    at org.apache.aries.blueprint.container.BeanRecipe.internalCreate2(BeanRecipe.java:685)
    at org.apache.aries.blueprint.container.BeanRecipe.internalCreate(BeanRecipe.java:666)
    at org.apache.aries.blueprint.di.AbstractRecipe$1.call(AbstractRecipe.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:90)
    at org.apache.aries.blueprint.di.RefRecipe.internalCreate(RefRecipe.java:62)
    at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:108)
    at org.apache.aries.blueprint.container.ServiceRecipe.createService(ServiceRecipe.java:285)
    ... 21 more
Caused by: java.lang.ClassCastException: Cannot cast java.lang.RuntimeException to org.opendaylight.mdsal.common.api.TransactionCommitFailedException
    at java.lang.Class.cast(Class.java:3369)
    at com.google.common.base.Throwables.getCauseAs(Throwables.java:336)
    at org.opendaylight.restconf.nb.rfc8040.handlers.SchemaContextHandler.putData(SchemaContextHandler.java:119)
    at org.opendaylight.restconf.nb.rfc8040.handlers.SchemaContextHandler.onGlobalContextUpdated(SchemaContextHandler.java:92)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.opendaylight.mdsal.dom.broker.schema.ScanningSchemaServiceProvider.registerSchemaContextListener(ScanningSchemaServiceProvider.java:116)
    at Proxy27842ffe_ed50_4893_9697_6fc656d05d64.registerSchemaContextListener(Unknown Source)
    at Proxy97536d03_d772_4baf_a30d_fd54fdffd6a1.registerSchemaContextListener(Unknown Source)
    at org.opendaylight.restconf.nb.rfc8040.handlers.SchemaContextHandler.init(SchemaContextHandler.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.aries.blueprint.utils.ReflectionUtils.invoke(ReflectionUtils.java:337)
    at org.apache.aries.blueprint.container.BeanRecipe.invoke(BeanRecipe.java:835)
    at org.apache.aries.blueprint.container.BeanRecipe.runBeanProcInit(BeanRecipe.java:591)
    ... 45 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: io.etcd.jetcd.shaded.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request
    at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:526)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:507)
    at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83)
    at org.opendaylight.restconf.nb.rfc8040.handlers.SchemaContextHandler.putData(SchemaContextHandler.java:115)
    ... 58 more
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: io.etcd.jetcd.shaded.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request
    at org.opendaylight.etcd.ds.impl.EtcdDataStore.commit(EtcdDataStore.java:231)
    at org.opendaylight.etcd.ds.inmemory.copypaste.InMemoryDOMStoreThreePhaseCommitCohort.commit(InMemoryDOMStoreThreePhaseCommitCohort.java:122)
    at org.opendaylight.etcd.ds.inmemory.copypaste.ChainedTransactionCommitImpl.commit(ChainedTransactionCommitImpl.java:30)
    at org.opendaylight.mdsal.dom.broker.CommitCoordinationTask.commitAll(CommitCoordinationTask.java:197)
    at org.opendaylight.mdsal.dom.broker.CommitCoordinationTask.commitBlocking(CommitCoordinationTask.java:172)
    at org.opendaylight.mdsal.dom.broker.CommitCoordinationTask.call(CommitCoordinationTask.java:65)
    at org.opendaylight.mdsal.dom.broker.CommitCoordinationTask.call(CommitCoordinationTask.java:29)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    ... 3 more
Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: io.etcd.jetcd.shaded.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at org.opendaylight.etcd.ds.impl.EtcdDataStore.commit(EtcdDataStore.java:228)
    ... 12 more
Caused by: java.util.concurrent.ExecutionException: io.etcd.jetcd.shaded.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request
    at io.etcd.jetcd.shaded.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
    at io.etcd.jetcd.shaded.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
    at io.etcd.jetcd.Util.lambda$toCompletableFutureWithRetry$2(Util.java:140)
    ... 3 more
Caused by: io.etcd.jetcd.shaded.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request
    at io.etcd.jetcd.shaded.io.grpc.Status.asRuntimeException(Status.java:530)
    at io.etcd.jetcd.shaded.io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482)
    at io.etcd.jetcd.shaded.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.etcd.jetcd.shaded.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.etcd.jetcd.shaded.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.etcd.jetcd.ClientConnectionManager$AuthTokenInterceptor$1$1.onClose(ClientConnectionManager.java:302)
    at io.etcd.jetcd.shaded.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.etcd.jetcd.shaded.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.etcd.jetcd.shaded.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.etcd.jetcd.shaded.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:694)
    at io.etcd.jetcd.shaded.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.etcd.jetcd.shaded.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.etcd.jetcd.shaded.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.etcd.jetcd.shaded.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
    at io.etcd.jetcd.shaded.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
    at io.etcd.jetcd.shaded.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
    at io.etcd.jetcd.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
    at io.etcd.jetcd.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
    at io.etcd.jetcd.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
    at io.etcd.jetcd.shaded.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.etcd.jetcd.shaded.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    ... 3 more
vorburger commented 5 years ago

There are two problems here:

  1. Caused by: java.lang.ClassCastException: Cannot cast java.lang.RuntimeException to org.opendaylight.mdsal.common.api.TransactionCommitFailedException ... this is because of the big TODOs in org.opendaylight.etcd.ds.impl.EtcdDataStore.commit(DataTreeCandidate) re. error handling; it's done wrong, and should be fixed - but that's "small" problem just affecting reporting.

  2. Caused by: io.etcd.jetcd.shaded.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request this one is much more interesting ... probably related to https://github.com/etcd-io/etcd/issues/4109 - but I don't understand how we're putting too many operations into a TXN...

I'm also struggling to reproduce this in a simple standalone test case (above is in Karaf).

vorburger commented 5 years ago

I'm also struggling to reproduce this in a simple standalone test case (above is in Karaf)

Oh I think that this is "just" because IRL there are just a lot more YANG models loaded than in my isolated simple test case ... the karaf.log has a lot more of these TXN put than the standalone test:

2019-02-06T21:28:42,669 | INFO | EtcdDB-commitCoordinator-0 | EtcdYangKV | 203 - org.opendaylight.etcd.ds - 0.0.1.SNAPSHOT | -O TXN put: «O·000000010202000D·modules-state·0200·-urn:ietf:params:xml:ns:yang:ietf-yang-library·02000A·2016-06-21» ➠ «·040D»

It DOES reproduce in the opendaylight-etcd/features/odl-etcd-demo-restconf as well.

vorburger commented 5 years ago

According to the documentation, there is a --max-txn-ops CLI flag, or env variable ETCD_MAX_TXN_OPS, which default to 128 and could probably be increased...

But the etcd Version: 3.2.16 which I currently have on Fedora 29 doesn't have that flag yet, so I'll need to custom build a more recent one from source. And then the Maven plugin which is used in odl-etcd-demo-restconf would have to support this...

vorburger commented 5 years ago

the Maven plugin which is used in odl-etcd-demo-restconf would have to support this...

https://github.com/etcd-io/jetcd/pull/497

https://github.com/vorburger/opendaylight-etcd/compare/issue-8 has WIP re. this but it still fails :sob:

https://github.com/vorburger/opendaylight-etcd/commit/e8c67078368ea0cb2d3328cfd777c71a03bb3b2e has the details why ...

vorburger commented 5 years ago

https://github.com/vorburger/opendaylight-etcd/compare/issue-8 has WIP

that branch just reverted the "comment out odl-etcd-demo-restconf" commit (in features/features-etcd/pom.xml and features/pom.xml), and then changed the jetcd-launcher-maven-plugin in features/odl-etcd-demo-restconf/pom.xml to be:

    <artifactId>jetcd-launcher-maven-plugin</artifactId>
        <version>0.3.1-SNAPSHOT</version>
        <configuration>
          <additionalArguments>
            <additionalArgument>--max-txn-ops</additionalArgument>
            <additionalArgument>1024</additionalArgument>
          </additionalArguments>
        </configuration>

which requires https://github.com/etcd-io/jetcd/pull/498 (and, in an ideal world, a jetcd 0.3.1 release, so that we don't depend on SNAPSHOT) ... BUT it then causes the old problem mentioned in https://jira.opendaylight.org/browse/AAA-169 to surface again:

Caused by: java.lang.IllegalStateException: There is already a Shiro environment associated with the current ServletContext. Check if you have multiple EnvironmentLoader* definitions in your web.xml!

but unclear to me right now why - somehow AAA is getting init twice?! We solved this old problem, long ago, but it looks like somehow it's back.