Closed MarcelPitch closed 7 years ago
@laa RIDBag with so many edges should be a Tree RIDBag where lookup/remove should be log(n), not constant, right?
Yes, it is one single item inside a tree, but a removal in a graph DB must update both ends of the relationship, which means the removal iterates through all 8M vertexes on the other side of the relation. So the complexity is linear, not logarithmic.
Anyway, it is out of scope for 2.0 for sure, so I am removing the milestone.
@laa can we add a return value to .remove()? In this way we'd have the edge to traverse and could update the other node. WDYT?
@lvca sorry, I did not understand. Do you propose to leave some vertexes on the opposite side of the relation unmodified? I mean our relation looks like v1
I would propose just ignoring tombstones during traversal and doing a background cleanup of dead rids. Thank God we do not reuse rids. But maybe I do not understand your idea.
Absolutely not. This is my idea in pseudo-code:
ORidBag collOut = v1.getEdgeCollection('Friend');
OIdentifiable v2 = collOut.remove( edge );
ORidBag collIn = getVertex( v2 ).getEdgeCollection('Friend');
collIn.remove( edge );
So the key is the line 2, where the remove returns the removed element. Can we support it?
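A minimal sketch of what a value-returning remove() would buy, written against plain Java collections rather than the real ORidBag API. The EdgeBag and TinyGraph classes and all their methods are hypothetical, and the bag is assumed to map an edge id to the vertex id on the other side:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a RidBag (NOT the OrientDB API):
// maps an edge id to the id of the vertex on the other side.
final class EdgeBag {
    private final Map<String, String> edgeToOpposite = new HashMap<>();

    void add(String edgeId, String oppositeVertexId) {
        edgeToOpposite.put(edgeId, oppositeVertexId);
    }

    // Removes the edge and returns the opposite vertex, or empty if the edge was absent.
    Optional<String> remove(String edgeId) {
        return Optional.ofNullable(edgeToOpposite.remove(edgeId));
    }

    int size() {
        return edgeToOpposite.size();
    }
}

// Hypothetical in-memory graph that keeps one OUT bag and one IN bag per vertex.
final class TinyGraph {
    private final Map<String, EdgeBag> out = new HashMap<>();
    private final Map<String, EdgeBag> in = new HashMap<>();

    void connect(String edgeId, String from, String to) {
        out.computeIfAbsent(from, k -> new EdgeBag()).add(edgeId, to);
        in.computeIfAbsent(to, k -> new EdgeBag()).add(edgeId, from);
    }

    // Follows the pseudo-code: the first remove() tells us which vertex to update next,
    // so no scan of the opposite bag is needed to find it.
    boolean removeEdge(String from, String edgeId) {
        EdgeBag collOut = out.get(from);
        if (collOut == null) return false;
        Optional<String> v2 = collOut.remove(edgeId);
        if (!v2.isPresent()) return false;
        EdgeBag collIn = in.get(v2.get());
        return collIn != null && collIn.remove(edgeId).isPresent();
    }

    int outCount(String vertexId) {
        EdgeBag bag = out.get(vertexId);
        return bag == null ? 0 : bag.size();
    }

    int inCount(String vertexId) {
        EdgeBag bag = in.get(vertexId);
        return bag == null ? 0 : bag.size();
    }
}
```

This only covers removal of a single edge. It says nothing about how the real bag stores lightweight edges.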
We do this, but we have millions of such vertexes on the other side: we are removing the current vertex, not an edge, so we must remove the links from all vertexes on the other side.
Your case is edge removal, not vertex removal.
But the issue is with the command DELETE VERTEX Label WHERE @rid = #XX:XXXX
WDYT ?
Here is the edge#remove method.
final String outFieldName = OrientVertex.getConnectionFieldName(Direction.OUT, edgeClassName, useVertexFieldsForEdgeLabels);
final boolean outVertexChanged = dropEdgeFromVertex(inVertexEdge, outVertex, outFieldName, outVertex.field(outFieldName));
// IN VERTEX
final OIdentifiable outVertexEdge = vOut != null ? vOut : rawElement;
final ODocument inVertex = getInVertex().getRecord();
final String inFieldName = OrientVertex.getConnectionFieldName(Direction.IN, edgeClassName, useVertexFieldsForEdgeLabels);
final boolean inVertexChanged = dropEdgeFromVertex(outVertexEdge, inVertex, inFieldName, inVertex.field(inFieldName));
so I do not expect problems here
@laa if ORIDBag doesn't provide a return value for the remove() method, this will always be slower, because a full scan of the RIDBag must be done to get the opposite vertex.
About the original use case the operation is quite expensive because it:
So this operation is expensive, but a fix in remove() could make it faster.
Furthermore we could support an asynchronous mode where deletion is split in 2 operations:
Then create an asynchronous task that does:
The good thing is that all the Graph functions ignore NULL references, so the graph stays coherent. The only problem would be that the count of connected edges/vertices might not be precise right after the synchronous deletion.
WDYT?
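A rough sketch of the asynchronous mode, in plain Java and not tied to any OrientDB class. Assumptions: the synchronous step drops only the vertex record, the asynchronous step later strips the dangling references from the former neighbours, and traversal skips references whose target is gone. The LazyDeleteGraph class and all its method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical undirected in-memory graph: vertex id -> ids it references.
final class LazyDeleteGraph {
    private final Map<String, Set<String>> neighbours = new ConcurrentHashMap<>();

    void link(String a, String b) {
        neighbours.computeIfAbsent(a, k -> ConcurrentHashMap.newKeySet()).add(b);
        neighbours.computeIfAbsent(b, k -> ConcurrentHashMap.newKeySet()).add(a);
    }

    // Synchronous part (assumed): drop only the vertex record and
    // return the ids that still hold a reference to it.
    Set<String> deleteVertex(String id) {
        Set<String> former = neighbours.remove(id);
        return former == null ? Collections.<String>emptySet() : former;
    }

    // Traversal ignores references whose target no longer exists.
    List<String> liveNeighbours(String id) {
        List<String> result = new ArrayList<>();
        for (String n : neighbours.getOrDefault(id, Collections.<String>emptySet())) {
            if (neighbours.containsKey(n)) {
                result.add(n);
            }
        }
        Collections.sort(result);
        return result;
    }

    // Raw reference count; includes dangling references until the cleanup has run.
    int rawDegree(String id) {
        return neighbours.getOrDefault(id, Collections.<String>emptySet()).size();
    }

    // Asynchronous part (assumed): strip the dangling references from former neighbours.
    CompletableFuture<Void> cleanupAsync(String deletedId, Set<String> former, Executor executor) {
        return CompletableFuture.runAsync(() -> {
            for (String n : former) {
                Set<String> bag = neighbours.get(n);
                if (bag != null) {
                    bag.remove(deletedId);
                }
            }
        }, executor);
    }
}
```

Between deleteVertex() and the completion of cleanupAsync(), liveNeighbours() is already correct while rawDegree() still counts the dead reference, which is the imprecise count mentioned above.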
@lvca We have already discussed this problem. Anyway, let's provide the full details here so that we do not forget it again.
Anyway I do not think that we should put this feature in 2.0 because it will make it unstable. WDYT ?
Hi Andrey, forget (2) now. About 1 this is the flow:
OrientVertex.remove(); -> OrientVertex.removeEdges()
Here it's a RIDBag and no target vertex is provided, so this piece of code is executed:
// DELETE ALL THE EDGES
for (Iterator<OIdentifiable> it = bag.rawIterator(); it.hasNext();) {
  final OIdentifiable edge = it.next();
  if (iAlsoInverse)
    removeInverseEdge(iVertex, iFieldName, null, edge, useVertexFieldsForEdgeLabels);
  deleteEdgeIfAny(edge);
}
Now removeInverseEdge() call receives the "edge" param, so it calls removeEdges() again (for the inverse relationship), but this time passing the iVertexToRemove, so the code called is:
if (iVertexToRemove != null) {
  // SEARCH SEQUENTIALLY (SLOWER)
  boolean found = false;
  for (Iterator<OIdentifiable> it = bag.rawIterator(); it.hasNext();) {
    final ODocument curr = it.next().getRecord();
    if (iVertexToRemove.equals(curr)) {
      // FOUND AS VERTEX
      it.remove();
      if (iAlsoInverse)
        removeInverseEdge(iVertex, iFieldName, iVertexToRemove, curr, useVertexFieldsForEdgeLabels);
      found = true;
      break;
    } else if (curr.getSchemaClass().isSubClassOf(OrientEdgeType.CLASS_NAME)) {
      final Direction direction = getConnectionDirection(iFieldName, useVertexFieldsForEdgeLabels);
      // EDGE, REMOVE THE EDGE
      if (iVertexToRemove.equals(OrientEdge.getConnection(curr, direction.opposite()))) {
        it.remove();
        if (iAlsoInverse)
          removeInverseEdge(iVertex, iFieldName, iVertexToRemove, curr, useVertexFieldsForEdgeLabels);
        found = true;
        break;
      }
    }
  }
  if (!found)
    OLogManager.instance()
        .warn(null, "[OrientVertex.removeEdges] edge %s not found in field %s", iVertexToRemove, iFieldName);
  deleteEdgeIfAny(iVertexToRemove);
}
As you can see here, it does a full scan to look for the original vertex to delete only those edges. Well, we could change this code by calling:
bag.remove( iVertexToRemove );
// FIND THE EDGE WITH IN/OUT EQUALS TO iVertexToRemove
edge = ???
bag.remove( edge );
if (iAlsoInverse)
removeInverseEdge(iVertex, iFieldName, iVertexToRemove, curr, useVertexFieldsForEdgeLabels);
So by calling remove() twice we are sure the edge is removed, whether it's a regular edge or a lightweight one.
WDYT?
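To make the cost difference concrete, here is a small sketch that contrasts the sequential search with a direct keyed removal. It uses java.util.TreeSet as a stand-in for a tree-based bag. That is an assumption for illustration only, not how RIDBag is implemented, and it does not answer the open "edge = ???" step above. The BagRemovalSketch class is hypothetical:

```java
import java.util.Iterator;
import java.util.TreeSet;

final class BagRemovalSketch {
    // Mirrors the "SEARCH SEQUENTIALLY" loop: walk the bag until the target is found.
    // Returns how many entries were visited before stopping.
    static int removeByScan(TreeSet<Long> bag, long target) {
        int visited = 0;
        for (Iterator<Long> it = bag.iterator(); it.hasNext();) {
            visited++;
            if (it.next() == target) {
                it.remove();
                break;
            }
        }
        return visited;
    }

    // Direct keyed removal: TreeSet.remove is O(log n) and
    // reports whether the key was actually present.
    static boolean removeByKey(TreeSet<Long> bag, long target) {
        return bag.remove(target);
    }
}
```

With 1,000 entries, removing the last one by scan visits all 1,000 entries, while the keyed removal only walks the tree depth.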
@lvca I will fix a few issues and will be back soon. I need to meditate on the code.
@MarcelPitch, I have the same problem when I delete a node with about 5 million edges. To bypass this problem, I have created a JavaScript server-side function that first deletes all the edges on the target vertex and then deletes the vertex.
Here the code (source is my rid param):
var g = orient.getGraph();
var query1 = g.command("sql", "DELETE EDGE FROM (SELECT expand(in()) FROM #" + source + ") TO #" + source);
var query2 = g.command("sql", "DELETE EDGE FROM #" + source + " TO (SELECT expand(out()) FROM #" + source + ")");
var query3 = g.command("sql", "DELETE VERTEX #" + source);
I hope it helps you ;)
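For anyone calling this from Java instead of a server-side function, the same three statements can be built like this. It is only a sketch: the DenseVertexDelete class is hypothetical, the rid is assumed to arrive without the leading '#' as in the function above, and the format check is an added precaution because the statements are built by string concatenation:

```java
import java.util.Arrays;
import java.util.List;

final class DenseVertexDelete {
    // Builds the three statements from the JavaScript function, in execution order:
    // incoming edges first, then outgoing edges, then the vertex itself.
    static List<String> statements(String rid) {
        // Assumed rid shape "cluster:position", e.g. "12:34"; anything else is rejected.
        if (rid == null || !rid.matches("\\d+:\\d+")) {
            throw new IllegalArgumentException("unexpected rid: " + rid);
        }
        String v = "#" + rid;
        return Arrays.asList(
            "DELETE EDGE FROM (SELECT expand(in()) FROM " + v + ") TO " + v,
            "DELETE EDGE FROM " + v + " TO (SELECT expand(out()) FROM " + v + ")",
            "DELETE VERTEX " + v);
    }
}
```

Each string would then be passed, in order, to whatever SQL command call your client uses.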
Thank you, @Ndrou !
Now, the operation takes less than a second.
Assigned to @tglman; we have already discussed this issue with him.
Hi All,
Here's the context. Our graph is composed of :
The schema looks like that :
v.Item -----e.HAS_LABEL-----> v.Label -----e.HAS_CULTURE-----> v.Culture
One Culture vertex has about 4M incoming HAS_CULTURE edges from Label vertices.
The problem is that a DELETE VERTEX Label WHERE @rid = #XX:XXXX request, for a Label connected to this dense vertex, takes about 30s to execute. But the same request takes about 1s on a Label vertex that is connected to another Culture vertex with only 2,000 incoming HAS_CULTURE edges.
Could this be the reason for the slowness?
If so, is there a way to ignore the number of edges on the dense Culture vertex?
Or do we have to create intermediate meta-nodes between Label and Culture vertices?
Thank you in advance and have a good evening.