orientechnologies / orientdb-gremlin

TinkerPop3 Graph Structure Implementation for OrientDB
Apache License 2.0
92 stars 32 forks source link

java.lang.IllegalArgumentException: rawElement must not be null! #171

Closed jbreiding closed 4 years ago

jbreiding commented 4 years ago

I need a little bit of help narrowing down how to prevent this error.

Let me first bring up this is non-deterministic and I am unable to reproduce locally, however I have a deployed environment we I know I will happen a few times a day. So what I am looking for is the right kind of knobs to turn to collect the right debugging information.

I am using kotlin as the dev language. This is running against 3.0.29 for orientdb-gremlin library and docker tag is 3.0.29-tp3. And tinkerpop gremlin-driver is 3.4.4.

Running on azulu-11 for runtime.

java.lang.IllegalArgumentException: rawElement must not be null! at org.apache.tinkerpop.gremlin.orientdb.OrientElement.<init>(OrientElement.java:24) at org.apache.tinkerpop.gremlin.orientdb.OrientVertex.<init>(OrientVertex.java:26) at org.apache.tinkerpop.gremlin.orientdb.OElementFactory.wrapVertex(OElementFactory.java:22) at org.apache.tinkerpop.gremlin.orientdb.OrientVertex.lambda$vertices$1(OrientVertex.java:46) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812) at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294) at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206) at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169) at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300) at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:44) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50) at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.hasNext(ExpandableStepIterator.java:42) at org.apache.tinkerpop.gremlin.process.traversal.step.util.CollectingBarrierStep.processNextStart(CollectingBarrierStep.java:107) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50) at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37) at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143) at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:197) at kotlin.sequences.TransformingSequence$iterator$1.hasNext(Sequences.kt:176) at kotlin.sequences.FilteringSequence$iterator$1.calcNext(Sequences.kt:131) at kotlin.sequences.FilteringSequence$iterator$1.hasNext(Sequences.kt:156) at kotlin.sequences.SequencesKt___SequencesKt.toCollection(_Sequences.kt:702) at kotlin.sequences.SequencesKt___SequencesKt.toMutableList(_Sequences.kt:732) at kotlin.sequences.SequencesKt___SequencesKt.toList(_Sequences.kt:723)

wolf4ood commented 4 years ago

Hi @jbreiding

do you have the traversal that throw this error?

Thanks

jbreiding commented 4 years ago

Sort of, the trouble is this doesn't hit consistently on the same point in the traversal every time.

What specifics are looking for? I can see if I can mock one out, as at time this could contain where steps and sometimes they do not.

Hi @jbreiding

do you have the traversal that throw this error?

Thanks

jbreiding commented 4 years ago

For the most recent error from telemetry the traversal looks similar to this:

g.V({rid}).in({edgelabel}).out({edgelabel}).where(__.has("id", P.neq("")).range(0, 50)

The results are processed in a kotlin sequence as follows: .asSequence().mapNotNull { it.toModel<{targetClass}>() }.toList()

Ultimately running this same query a second, which we have built in results in an error free execution.

wolf4ood commented 4 years ago

@jbreiding

just a question how did you configure the g?

jbreiding commented 4 years ago

I believe every configuration knob is a default. The graph for this traversal is acquired from a vertex object, vertexObject.graph();

That vertex object is from another another query, and graph that graph/g was from an OrientGraphFactory pool with a maxsize of 500.

Currently the workload is spread across two api servers communicating to a single orientdb instance deployed as a stateful set in a k8s cluster.

Is that what you were looking for? Connections are acquired they are from noTx()

wolf4ood commented 4 years ago

@jbreiding

reading the code one possible cause is that the edge contains in or from pointer to vertex that is null or point to a deleted vertex.

You can check this by querying directly the edge class looking for out/in field which are null.

jbreiding commented 4 years ago

Interesting, I could query for edges across my db instance for this condition to confirm if it could be the case correct?

jbreiding commented 4 years ago

select from E where in is null or out is null This returns an empty set, is that the direction you were suggesting?

I would like to clarify something, the same query gets retried when an error like this occurs from our user interface. That request succeeds, I have done some digging to see if it was data related, but given that behavior I would expect I could run the same query and reproduce the issue locally. This not something I have been able to do.

jbreiding commented 4 years ago

I also see a similar error being logged in my telemetry, I have confirmed this edge is populated, the in and out are not empty.

21:48:37.512 [reactor-http-epoll-1] WARN c.o.o.core.record.impl.OEdgeIterator - Record (#24:556) is null

at com.orientechnologies.common.log.OLogManager.log(OLogManager.java:188) at com.orientechnologies.common.log.OLogManager.warn(OLogManager.java:229) at com.orientechnologies.orient.core.record.impl.OEdgeIterator.createGraphElement(OEdgeIterator.java:73) at com.orientechnologies.orient.core.record.impl.OEdgeIterator.createGraphElement(OEdgeIterator.java:36) at com.orientechnologies.orient.core.iterator.OLazyWrapperIterator.hasNext(OLazyWrapperIterator.java:93) at com.orientechnologies.common.collection.OMultiCollectionIterator.hasNextInternal(OMultiCollectionIterator.java:97) at com.orientechnologies.common.collection.OMultiCollectionIterator.hasNext(OMultiCollectionIterator.java:78) at com.orientechnologies.orient.core.record.impl.OEdgeToVertexIterator.hasNext(OEdgeToVertexIterator.java:25)

wolf4ood commented 4 years ago

@jbreiding

try this, which will also detect if some link is invalid

select from E where out.@class is null or in.@class is null
jbreiding commented 4 years ago

Empty result set as well.

jbreiding commented 4 years ago

I spent a bit of time experimenting. There is very clear a bit of a data issue in our implementation. There is a bit of wonkiness in how I was sharing connections across http requests.

I have cleared that up in favor of sharing just the pool/factory. I am still seeing some wonkiness and I am digging further. But it is definitely looking exactly like an underlying data problem. Thanks for your help.