mbroecheler opened this issue 12 years ago
This limitation also applies to dense index entries, i.e. if one is loading millions of properties with the same indexed value, then that creates a dense list of entries under that index entry. In these cases, failure in the storage backend may occur.
Hello Matthias,
I think I've observed this: I'm currently writing a log collector in order to inject events into a Titan graph backed by Cassandra (1.8). For one vertex (i.e. one server or one firewall appliance) we can add millions of events (especially for firewalls).
For initial tests, I've tried to parse and insert some log files (about 5 million events per file). Each log file concerns just one piece of equipment, so loading a file consists of creating a "Server" vertex (if it does not exist) and linking "Event" vertices (millions of them) to it in one transaction.
After the event-loading loop, I call: g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS); g.shutdown();
and get:
120883 [main] DEBUG com.thinkaurelius.titan.graphdb.database.StandardTitanGraph - Saving transaction. Added 6000035, removed 0
127036 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
127689 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
135757 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
....
144237 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Connection reset
...
145648 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
146081 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
146563 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
146998 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
147586 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
148008 [main] DEBUG com.netflix.astyanax.thrift.ThriftConverter - java.net.SocketException: Broken pipe
... etc
and g.stopTransaction() never seems to finish.
I have been searching, without success, for an elegant way to persist the transaction in blocks without stopping it (a kind of g.saveTransaction()).
Any ideas?
Hey,
I am not sure exactly what you mean by g.saveTransaction()? I think the best way is to divide your transaction into smaller chunks, i.e. load only a few thousand edges per transaction. That way, you are much less likely to encounter the kind of timeout and buffer exceptions that you are getting.
HTH, Matthias
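A minimal sketch of the chunked-commit approach suggested above (the configuration path, property names, chunk size, and the readLogLines() helper are illustrative placeholders, not from this thread; the graph calls follow the Blueprints-era API used elsewhere in the discussion):

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.TransactionalGraph;
import com.tinkerpop.blueprints.Vertex;

public class ChunkedLoader {

    private static final int CHUNK_SIZE = 10000; // a few thousand edges per transaction

    public static void main(String[] args) {
        TitanGraph g = TitanFactory.open("./cassandra.conf");

        Vertex server = g.addVertex(null);       // the "Server" vertex
        server.setProperty("name", "server-01"); // placeholder name
        Object serverId = server.getId();

        long count = 0;
        for (String line : readLogLines()) {     // hypothetical log parser
            Vertex event = g.addVertex(null);
            event.setProperty("message", line);
            Edge e = g.addEdge(null, server, event, "log");
            e.setProperty("time", System.currentTimeMillis());

            if (++count % CHUNK_SIZE == 0) {
                // Persist this chunk; the next mutation implicitly starts a new transaction.
                g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);
                // Re-read the supernode so we do not keep a reference from the old transaction.
                server = g.getVertex(serverId);
            }
        }

        g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS); // commit the final partial chunk
        g.shutdown();
    }

    private static Iterable<String> readLogLines() {
        // Placeholder for the real log-file parsing.
        return java.util.Collections.emptyList();
    }
}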
"I am not sure exactly what you mean by g.saveTransaction()? " My understanding of Cassandra backend implementation (only based on network traffic observation during transaction between client and cassandra servers) is :
This may be a misunderstanding of my part, but if not, it could be interesting to allow developer to decide when datas must be "flushed" to Cassandra Keyspace in order to avoid this kind of buffer timeout.
Feel free to correct me my vision of Cassandra's TitanGraph implementation is incorrect.
P.S. I use "storage.batch-loading" option
Yes, your understanding of how Titan operates is correct. However, only the modifications of the memory mapped graph get flushed back to Cassandra.
Wouldn't the developer always want to flush all data? Otherwise they would lose changes in their transaction.
Yes, they would be flushed. I've solved this problem by calling g.saveTransaction(), g.startTransaction() every 100,000 "Event" edge inserts during the loading process. So now my situation is:
- I have one vertex (my supernode) with about 1.2 million connected vertices,
- This is backed by Cassandra with 7 nodes (spread across two geographically separate sites and using RackInferringSnitch).
I try to query this graph through Gremlin:
g = TitanFactory.open("./cassandra.conf");
13/02/22 19:10:06 INFO impl.ConnectionPoolMBeanManager: Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=TitanConnectionPool,ServiceType=connectionpool
==>titangraph[cassandra:cnode03.prod.xxxxx.com]
gremlin> g.V('name','srx3600.interco.dc1.xxxxx.com').outE('srx_deny').interval('time',1360798508000,1360798509000).inV().message
and... I think I am hitting the RPC timeout mentioned above, but how can we verify that? Gremlin doesn't seem to log anything.
Meaning, you get no additional output? It just hangs forever?
Absolutely: no additional output on the Gremlin client and no relevant logs on the Cassandra servers.
I've taken a look at the network traffic between the Gremlin client and the Cassandra servers. After a while, it seems to loop on the "describe_ring" command:
[tcpdump excerpt: repeated describe_ring responses from cnode03 listing the cluster's token ranges and the replica IPs 10.10.117.x, 10.10.107.x and 10.20.101.x]
and on the "get_slice" command, but without anything being returned:
[tcpdump excerpt: a 30-byte get_slice message between cnode02 and the client, with no further data]
Can you run simple queries or do all queries time out? Try changing to cassandrathrift as the adapter, see if that helps.
Hello,
Yes, a single simple query works fine:
gremlin> g = TitanFactory.open("./cassandra.distant");
==>titangraph[cassandrathrift:cnode03.xxxxx.com]
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').map
==>{timestamp=1361551746015, IP=/0.0.0.10, name=srx3600.interco.dc1.xxxxxx.com, date=Fri Feb 22 17:49:06 CET 2013}
After changing to cassandrathrift:
gremlin> g.V('name','srx3600.interco.dc1. xxxxxx.com').outE('srx_deny').count()
Could not read from storage
Display stack trace? [yN] y
com.thinkaurelius.titan.core.TitanException: Could not read from storage
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.readException(StandardTitanGraph.java:160)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.readException(StandardTitanGraph.java:155)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.containsVertexID(StandardTitanGraph.java:189)
at com.thinkaurelius.titan.graphdb.transaction.StandardPersistTitanTx.containsVertex(StandardPersistTitanTx.java:78)
at com.thinkaurelius.titan.graphdb.types.manager.SimpleTypeManager.getType(SimpleTypeManager.java:137)
at com.thinkaurelius.titan.graphdb.transaction.AbstractTitanTx.getExisting(AbstractTitanTx.java:166)
at com.thinkaurelius.titan.graphdb.transaction.AbstractTitanTx.getExistingVertex(AbstractTitanTx.java:156)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.loadRelations(StandardTitanGraph.java:401)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.loadRelations(StandardTitanGraph.java:324)
at com.thinkaurelius.titan.graphdb.transaction.StandardPersistTitanTx.loadRelations(StandardPersistTitanTx.java:155)
at com.thinkaurelius.titan.graphdb.vertices.AbstractTitanVertex.ensureLoadedEdges(AbstractTitanVertex.java:88)
at com.thinkaurelius.titan.graphdb.vertices.StandardTitanVertex.getRelations(StandardTitanVertex.java:68)
at com.thinkaurelius.titan.graphdb.query.SimpleAtomicQuery.edges(SimpleAtomicQuery.java:473)
at com.thinkaurelius.titan.graphdb.vertices.AbstractTitanVertex.getEdges(AbstractTitanVertex.java:177)
at com.tinkerpop.gremlin.pipes.transform.VerticesEdgesPipe.processNextStart(VerticesEdgesPipe.java:46)
at com.tinkerpop.gremlin.pipes.transform.VerticesEdgesPipe.processNextStart(VerticesEdgesPipe.java:16)
at com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:84)
at com.tinkerpop.pipes.util.Pipeline.next(Pipeline.java:115)
at com.tinkerpop.pipes.util.PipeHelper.counter(PipeHelper.java:108)
at com.tinkerpop.gremlin.java.GremlinPipeline.count(GremlinPipeline.java:1080)
at com.tinkerpop.pipes.util.PipesFluentPipeline$count.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)
at groovysh_evaluate.run(groovysh_evaluate:46)
at groovysh_evaluate$run.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at groovysh_evaluate$run.call(Unknown Source)
at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:67)
at org.codehaus.groovy.tools.shell.Interpreter$evaluate.call(Unknown Source)
at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:152)
at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:114)
at org.codehaus.groovy.tools.shell.Shell$leftShift$0.call(Unknown Source)
at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:88)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1071)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:128)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:148)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:272)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:137)
at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:57)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1071)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:128)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:148)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:66)
at com.thinkaurelius.titan.tinkerpop.gremlin.Console.<init>(Console.java:60)
at com.thinkaurelius.titan.tinkerpop.gremlin.Console.<init>(Console.java:67)
at com.thinkaurelius.titan.tinkerpop.gremlin.Console.main(Console.java:72)
Caused by: com.thinkaurelius.titan.diskstorage.PermanentStorageException: Permanent failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.convertException(CassandraThriftKeyColumnValueStore.java:255)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.containsKey(CassandraThriftKeyColumnValueStore.java:169)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferedKeyColumnValueStore.containsKey(BufferedKeyColumnValueStore.java:31)
at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ConsistentKeyLockStore.containsKey(ConsistentKeyLockStore.java:77)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.containsVertexID(StandardTitanGraph.java:184)
... 64 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:552)
at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:536)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.containsKey(CassandraThriftKeyColumnValueStore.java:166)
... 67 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 78 more
I think we can exclude a Cassandra failure; this Cassandra cluster works fine with other applications (key/value access through Hector).
Hey, this is strange. It seems to time out on a very simple Cassandra operation.
Yes, absolutely: every request like "g.V('name','srx3600.interco.dc1.xxxxxx.com').map" works fine. But when I try to access the vertices connected to this one (there are more than 1 million vertices connected to this "super-node"), it fails systematically.
The same request, "g.V('name','srx3600.interco.dc1. xxxxxx.com').outE('srx_deny').count()", with a vertex degree of 10,000 on srx3600.interco.dc1. xxxxxx.com works fine (quite slow, but it returns a response).
Do you have vertex centric indices on these super node vertices? That would allow you to pull out the edges you want quickly. Otherwise, your queries might attempt to read all edges, which, at 1 million edges, is likely to time out because it's too much data. Also, try to limit the size of the result set using [0..1000] after the outE. Does that work?
What exactly do you mean by vertex centric indices?
I have created some indexes on vertices and edges in order to avoid full scans:
if (g.getType("time") == null || (g.getType("time") != null && !g.getType("time").isPropertyKey())) {
TitanKey time = g.makeType().name("time").dataType(Long.class).functional().indexed().makePropertyKey();
TitanLabel log = g.makeType().name("log").primaryKey(time).makeEdgeLabel();
TitanLabel connected = g.makeType().name("connected").primaryKey(time).makeEdgeLabel();
TitanLabel srx_create = g.makeType().name(SRX_CREATE).primaryKey(time).makeEdgeLabel();
TitanLabel srx_close = g.makeType().name(SRX_CLOSE).primaryKey(time).makeEdgeLabel();
TitanLabel srx_deny = g.makeType().name(SRX_DENY).primaryKey(time).makeEdgeLabel();
}
// Vertex index creation
if (g.getType("name") == null || ((g.getType("name") != null) && !g.getType("name").isPropertyKey())) {
TitanKey name = g.makeType().name("name").dataType(String.class).unique().functional().indexed().makePropertyKey();
}
if (g.getType("IP") == null || (g.getType("IP") != null && !g.getType("IP").isPropertyKey())) {
TitanKey name = g.makeType().name("IP").dataType(String.class).functional().makePropertyKey();
}
if (g.getType("action") == null || (g.getType("action") != null && !g.getType("action").isPropertyKey())) {
TitanKey name = g.makeType().name("action").dataType(String.class).functional().makePropertyKey();
}
After that, I create the initial vertex (the super node) by calling:
public Vertex CreateSrx(String FQDN, InetAddress IP, String OS){
Vertex v = g.addVertex(null);
v.setProperty("name", FQDN);
v.setProperty("date", new Date().toString());
v.setProperty("timestamp", System.currentTimeMillis());
v.setProperty("IP",IP.toString());
return v;
}
And I link events to this node through:
public Vertex CreateEvent(Vertex Server, String GivenLevel, Long TimeStamp, String Message, String Action ) throws IllegalArgumentException{
if(!Level.contains(GivenLevel)) {
throw new IllegalArgumentException("Given Level is not correct");
}
Vertex v = g.addVertex(null);
v.setProperty("level", GivenLevel);
v.setProperty("timestamp",TimeStamp);
v.setProperty("message", Message);
v.setProperty("action", Action);
Edge edge = null;
switch (Action) {
case SRX_CREATE:
edge = g.addEdge(null, Server, v, "srx_create");
edge.setProperty("time", System.currentTimeMillis());
break;
case SRX_CLOSE :
edge = g.addEdge(null, Server, v, "srx_close");
edge.setProperty("time", System.currentTimeMillis());
break;
case SRX_DENY :
edge = g.addEdge(null, Server, v, "srx_deny");
edge.setProperty("time", System.currentTimeMillis());
break;
default:
edge = g.addEdge(null, Server, v, "srx_create");
edge.setProperty("time", System.currentTimeMillis());
}
return v;
}
For the test with the range limit [0..10000], THIS IS VERY STRANGE:
First call to the initial vertex, everything goes well:
==>titangraph[cassandrathrift:cnode03.prod.dc1.xxxxxx.com]
gremlin> g.V('name','srx3600.interco.dc1.xxxxx.com').map
==>{timestamp=1361551746015, IP=/0.0.0.10, name=srx3600.interco.dc1.xxxxxxx.com, date=Fri Feb 22 17:49:06 CET 2013}
First call to the edges:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..1000]
Could not read from storage
Display stack trace? [yN] y
com.thinkaurelius.titan.core.TitanException: Could not read from storage
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.readException(StandardTitanGraph.java:160)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.readException(StandardTitanGraph.java:155)
… (same exception as previously shown)
After a while (about 1 minute), I call again with [0..1]:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..1]
==>e[479:4:36028797018964102][4-srx_deny->128]
==>e[1307:4:36028797018964102][4-srx_deny->404]
OK… so trying to count:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..100].count()
==>101
Well, let's try with a larger range:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..1000].count()
==>1001
Larger again (the same range previously threw an exception):
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..10000].count()
==>10001
:-/ …
Larger again:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxx.com').outE('srx_deny')[0..100000].count()
==>62624
!!
Can you make any sense of this?
Hmmm, my mistake (even if it doesn't explain the previously observed behavior): I replaced:
switch (Action) {
case SRX_CREATE:
edge = g.addEdge(null, Server, v, "srx_create");
edge.setProperty("time", System.currentTimeMillis());
break;
case SRX_CLOSE :
edge = g.addEdge(null, Server, v, "srx_close");
edge.setProperty("time", System.currentTimeMillis());
break;
case SRX_DENY :
edge = g.addEdge(null, Server, v, "srx_deny");
edge.setProperty("time", System.currentTimeMillis());
break;
default:
edge = g.addEdge(null, Server, v, "srx_create");
edge.setProperty("time", System.currentTimeMillis());
}
with:
switch (Action) {
case SRX_CREATE:
edge = g.addEdge(null, Server, v, SRX_CREATE);
edge.setProperty("time", System.currentTimeMillis());
break;
case SRX_CLOSE :
edge = g.addEdge(null, Server, v, SRX_CLOSE);
edge.setProperty("time", System.currentTimeMillis());
break;
case SRX_DENY :
edge = g.addEdge(null, Server, v, SRX_DENY);
edge.setProperty("time", System.currentTimeMillis());
break;
default:
edge = g.addEdge(null, Server, v, "connected");
edge.setProperty("time", System.currentTimeMillis());
}
so the index is now really being used.
BUT I've found another problem:
gremlin> g.V('name','srx3600.interco.dc1.xxxxxxx.com').outE('RT_FLOW_SESSION_CREATE')[0..1]
Could not read from storage
Display stack trace? [yN] y
com.thinkaurelius.titan.core.TitanException: Could not read from storage
....
Caused by: org.apache.thrift.transport.TTransportException: Frame size (22818648) larger than max length (16384000)!
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
This is caused by the default "thrift_max_message_length_in_mb: 16" option in cassandra.yaml.
The question is: why does it try to fetch more than 16 MB of data to retrieve just two lines of 30 bytes?
@protheusfr Was there a resolution to this?
I'm running into this issue in Titan 0.4.1 with a single Cassandra instance. Here is a simple example that reproduces the issue on an empty server:
> v1 = g.addVertex(["key": "google.com", "keytype": "domain"])
> v2 = g.addVertex(["key": ".com", "keytype": "tld"])
> for(i = 0; i < 800000; i++) {
g.addEdge(null, v1, v2, 'ig_label', ["sourceDocument": "testdoc-123123123", "sourceDocumentType": "context", "timestamp": i, "scope": "public", "inKeyType": "domain", "outKeyType": "tld"]);
g.addEdge(null, v2, v1, 'ig_label', ["sourceDocument": "testdoc-123123123", "sourceDocumentType": "context", "timestamp": i, "scope": "public", "inKeyType": "tld", "outKeyType": "domain"]);
g.commit(); //Takes longer but ensures we don't run into any problems.
}
> v1.key
==>google.com
> v2.bothE[0] //This works
> v2.outE[0] //This works
> v2.outE[500000] //This works but is incredibly slow
> v2.outE[550000] //This fails
> v2.inE[0] //This fails with no edge being returned.
> v2.outE.count() //This crashes the cassandra server
I didn't have a vertex centric index on this one, but on the production server that is having the same problem there is an index on timestamp. This really stinks because I can't access the incoming edges at all.
Yes, at over a million edges per vertex you are getting into "the danger zone", in particular without a vertex centric index. There isn't much we can do about it due to the limitations of Cassandra that cause this.
However, we will introduce partitioned vertices in Titan 0.5 to address the issue of extreme super nodes like the one you are encountering.
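For reference, a sketch of how such a vertex centric index could be declared up front for the example above, assuming the Titan 0.4-series type API and that the "timestamp" key and "ig_label" label have not been created yet (type definitions cannot be changed once they exist):

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.TitanKey;
import com.thinkaurelius.titan.core.TitanLabel;

TitanGraph g = TitanFactory.open("./cassandra.conf"); // placeholder configuration path

// Edges with the "ig_label" label are stored sorted by "timestamp", so a query such as
// v2.query().labels("ig_label").interval("timestamp", 0, 1000).edges()
// can read only the matching slice instead of the whole million-edge row.
TitanKey timestamp = g.makeKey("timestamp").dataType(Long.class).make();
TitanLabel igLabel = g.makeLabel("ig_label").sortKey(timestamp).make();
g.commit();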
Good news for Titan 0.5!
+1 for partitioned vertices
Good news, I will try it.
When loading a million edges that are all incident on the same vertex into a TitanGraph backed by a Cassandra cluster with more than one node, an internal Cassandra timeout occurs and aborts the loading process. This behavior is specific to having such a "supernode" with a lot of incident edges and loading all of these edges at once. Whether or not this behavior is observed also depends on the hardware: on some systems, increasing the RPC timeout parameter solves the issue; on others it does not.