ChaohsinChan opened this issue 7 years ago
Hey,
You can run the following code to retrieve the vertices. For example, let's count how many vertices you have in your graph:
```java
import mizo.rdd.MizoBuilder;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class MizoVerticesCounter {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("Mizo Vertices Counter")
                .setMaster("local[1]")
                .set("spark.executor.memory", "4g")
                .set("spark.executor.cores", "1")
                .set("spark.rpc.askTimeout", "1000000")
                .set("spark.rpc.frameSize", "1000000")
                .set("spark.network.timeout", "1000000")
                .set("spark.rdd.compress", "true")
                .set("spark.core.connection.ack.wait.timeout", "6000")
                .set("spark.driver.maxResultSize", "100m")
                .set("spark.task.maxFailures", "20")
                .set("spark.shuffle.io.maxRetries", "20");

        SparkContext sc = new SparkContext(conf);

        long count = new MizoBuilder()
                .titanConfigPath("titan-graph.properties")
                .regionDirectoriesPath("hdfs://my-graph/*/e") // HDFS path to your HBase table
                .parseInEdges(v -> false)
                .verticesRDD(sc)
                .toJavaRDD()
                .count(); // total number of vertices in your graph

        System.out.println("Vertices count is: " + count);
    }
}
```
Change 'hdfs://my-graph/*/e' to the HDFS path of your HBase Table.
Let me know if you have any further questions.
Thank you for your reply. I have two suggestions. First, could we get the HDFS path through the HBase interface? That would be more convenient, since usually we only know the HBase table name and its configuration. Second, could the project be converted to Maven management, so it can also be developed inside Eclipse? For those who are not familiar with IntelliJ IDEA, it would take a long time to set up the development environment.
Thanks for your suggestions -
Regarding the Table name, I generally prefer not to rely on Hadoop config files, but rather specify paths directly.
Regarding Maven - good advice, I will switch to Maven and reupload soon.
Did you manage to run the code eventually?
I am not very familiar with IDEA, so I have not yet set up a good development environment. Can you give me some advice?
You only have to open the root directory in IntelliJ, then go to MizoEdgesCounter, right-click and debug.
When I import the project into IDEA, choosing to create a project from existing sources prompts me that the project file already exists, and other errors occur when I choose to overwrite it. I do not know why. But if I choose to import a project from an existing model, only Eclipse, Gradle, and Maven are available. So I still did not succeed.
Use Open rather than Import Project; it should work.
Try using File > Open and choose the project's .iml file.
Thank you for your suggestion; I am left with one last problem:

Module mizo-core: invalid item 'com.google.guava:guava:19.0' in the dependencies list
Module mizo-core: invalid item 'com.thinkaurelius.titan:titan-core:1.0.0' in the dependencies list

How do I introduce these dependencies? HBase and Spark do not have these dependency problems.
These dependencies should come from Maven. I see that the POMs are not included in the repo, I will add them in 12 hours.
OK, thanks. I find that the files titan-graph.properties and log4j.properties are also missing; you can add them as well.
You can omit the log4j properties file, and graph.properties is your Titan properties file.
I found an error:

```
Exception in thread "main" java.lang.IllegalArgumentException: Could not find implementation class: com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager
```

I suspect that this problem is about the titan-graph.properties config; can you show me your config?
Send me your properties file.
```properties
storage.backend=hbase
storage.hostname=hlg-3p163-wangyongzhi,hlg-3p190-wangyongzhi,hlg-3p166-wangyongzhi
storage.hbase.table=titandb
storage.hbase.ext.zookeeper.znode.parent=/hbase-unsecure
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.elasticsearch.client-only=true
```
I wonder whether this configuration is right. I just copied it from the Titan configuration.
Add: `storage.hbase.compat-class = com.thinkaurelius.titan.diskstorage.hbase.HBaseCompat1_0`
It does not work. Do I need other dependencies?
Let me build it myself and I will upload it as a complete Maven project. Will update you soon.
OK, thanks.
I have solved all the problems and am now at the last step, but there was a mistake:

```
Exception in thread "main" java.lang.ClassCastException: com.thinkaurelius.titan.graphdb.types.VertexLabelVertex cannot be cast to com.thinkaurelius.titan.graphdb.internal.InternalRelationType
    at mizo.rdd.MizoRDD.lambda$loadRelationTypes$3(MizoRDD.java:146)
    at java.lang.Iterable.forEach(Iterable.java:75)
```
Would you give me some advice?
Please send me your code.
```java
public class MizoEdgesCounter {
    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "C:\\F盘\\hadoop-2.6.0.tar\\hadoop-2.6.0\\hadoop-2.6.0");

        SparkConf conf = new SparkConf()
                .setAppName("Mizo Edges Counter")
                .setMaster("local[1]")
                .set("spark.executor.memory", "4g")
                .set("spark.executor.cores", "1")
                .set("spark.rpc.askTimeout", "1000000")
                .set("spark.rpc.frameSize", "1000000")
                .set("spark.network.timeout", "1000000")
                .set("spark.rdd.compress", "true")
                .set("spark.core.connection.ack.wait.timeout", "6000")
                .set("spark.driver.maxResultSize", "100m")
                .set("spark.task.maxFailures", "20")
                .set("spark.shuffle.io.maxRetries", "20");

        SparkContext sc = new SparkContext(conf);

        long count = new MizoBuilder()
                .logConfigPath("C:\\ideapluin\\mizo-master\\mizo-master\\target\\test\\mizo-rdd\\log4j.properties")
                .titanConfigPath("C:\\ideapluin\\mizo-master\\mizo-master\\target\\test\\mizo-rdd\\titan-graph.properties")
                .regionDirectoriesPath("hdfs://hlg-3p163-wangyongzhi:8020/apps/hbase/data/data/default/titandb6/8f68e1d6f9d35a4683e1a4c264cd669f/e")
                .parseInEdges(v -> false)
                .edgesRDD(sc)
                .toJavaRDD()
                .count();

        System.out.println("Edges count is: " + count);
    }
}
```
I did not modify your code. The error occurred here:

```java
protected static HashMap<Long, InternalRelationType> loadRelationTypes(String titanConfigPath) {
    TitanGraph g = TitanFactory.open(titanConfigPath);
    StandardTitanTx tx = (StandardTitanTx) g.newTransaction();

    HashMap<Long, InternalRelationType> relations = Maps.newHashMap();

    tx.query()
            .has(BaseKey.SchemaCategory, Contain.IN, Lists.newArrayList(TitanSchemaCategory.values()))
            .vertices()
            .forEach(v -> relations.put(v.longId(), new MizoTitanRelationType((InternalRelationType) v)));

    g.close();

    return relations;
}
```
In MizoRDD's loadRelationTypes, change the forEach to:

```java
.forEach(v -> {
    if (v instanceof InternalRelationType) {
        relations.put(v.longId(), new MizoTitanRelationType((InternalRelationType) v));
    }
});
```
Modify the code as I mentioned; it should solve this problem.
The problem above was solved, but there was also a mistake:

```
java.lang.IllegalArgumentException: Invalid ASCII encoding offset: 625
    at com.thinkaurelius.titan.graphdb.database.serialize.attribute.StringSerializer.read(StringSerializer.java:105)
    at mizo.hbase.MizoTitanHBaseRelationParser.readPropertyValue(MizoTitanHBaseRelationParser.java:179)
    at mizo.iterators.MizoBaseRelationsIterator.handleProperty(MizoBaseRelationsIterator.java:87)
    at mizo.iterators.MizoBaseRelationsIterator.getEdgeOrNull(MizoBaseRelationsIterator.java:46)
```
Ok I will check it later today.
Shortly: Mizo was never tested on a graph with vertex labels, so that's probably the issue.
Can you describe your Titan schema? Which edges do you have, their types, etc.?
I use the Titan example, the Graph of the Gods; you can see it here: http://s3.thinkaurelius.com/docs/titan/1.0.0/getting-started.html
Ok, I will check it soon.
Fixed the bug - checked using the Graph of the Gods, works :) Also updated the project to use Maven
Let me know if it works for you.
There was also a mistake; how can I resolve it? It seems to be a Guava version conflict:
```
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedTime(Ljava/util/concurrent/TimeUnit;)J
    at com.google.common.cache.LocalCache$LoadingValueReference.elapsedNanos(LocalCache.java:3600)
    at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2412)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2373)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2335)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2250)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3985)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4788)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$6$6.call(StandardTitanTx.java:1244)
    at com.thinkaurelius.titan.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:268)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$6.execute(StandardTitanTx.java:1258)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx$6.execute(StandardTitanTx.java:1126)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:198)
    at com.thinkaurelius.titan.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:54)
    at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:40)
    at com.thinkaurelius.titan.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:30)
    at com.thinkaurelius.titan.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:57)
    at com.google.common.collect.Iterables$7.iterator(Iterables.java:613)
    at java.lang.Iterable.forEach(Iterable.java:74)
    at mizo.rdd.MizoRDD.loadRelationTypes(MizoRDD.java:149)
    at mizo.rdd.MizoRDD.<init>(MizoRDD.java:71)
    at mizo.rdd.MizoBuilder$1.<init>(MizoBuilder.java:53)
    at mizo.rdd.MizoBuilder.edgesRDD(MizoBuilder.java:53)
    at MizoEdgesCounter.main(MizoEdgesCounter.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
```
This error is caused by a Guava version mismatch between Titan and the other components.
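If it helps, one common way to handle such a conflict in a Maven build is to force a single Guava version for every module via dependencyManagement. This is only an illustrative sketch; the version number below is an assumption, not something confirmed from this project's POMs:

```xml
<!-- Illustrative pom.xml fragment: pin one Guava version for the whole build
     so Titan, HBase and Spark all resolve the same jar.
     The version shown is an assumption; use the one your Titan build expects. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>18.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

If pinning is not enough, relocating Guava with the Maven shade plugin is the usual heavier-handed alternative.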
I succeeded in running the code with HBase 1.0.3 -- try checking out the code into a new directory and running it from there, without any modifications. It should work.
When I run it without any modifications, there is an error here:

```
Exception in thread "main" java.lang.IllegalArgumentException: Could not find implementation class: com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager
    at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:47)
    at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:473)
    at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:407)
    at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.
```
Pushed an update for fixing this, try now - working for me
I get the result, but there was an error when the job completed:
```
27490 [main] INFO org.apache.spark.scheduler.DAGScheduler - Job 0 finished: count at MizoEdgesCounter.java:34, took 2.037018 s
Edges count is: 34
27871 [DestroyJavaVM] WARN com.thinkaurelius.titan.graphdb.database.StandardTitanGraph - Unable to remove graph instance uniqueid c0a8adc387204-DE0018-PC1
com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:144)
    at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration.set(KCVSConfiguration.java:141)
    at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration.set(KCVSConfiguration.java:118)
    at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration.remove(KCVSConfiguration.java:159)
    at com.thinkaurelius.titan.diskstorage.configuration.ModifiableConfiguration.remove(ModifiableConfiguration.java:42)
    at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.closeInternal(StandardTitanGraph.java:191)
    at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.access$600(StandardTitanGraph.java:78)
    at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph$ShutdownThread.start(StandardTitanGraph.java:803)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:102)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
    at java.lang.Shutdown.runHooks(Shutdown.java:123)
    at java.lang.Shutdown.sequence(Shutdown.java:167)
    at java.lang.Shutdown.shutdown(Shutdown.java:234)
Caused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Permanent exception while executing backend operation setConfiguration
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
    ... 13 more
Caused by: java.lang.IllegalArgumentException: Connection is null or closed.
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:310)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:712)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:694)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:532)
    at com.thinkaurelius.titan.diskstorage.hbase.HConnection1_0.getTable(HConnection1_0.java:22)
    at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:424)
    at com.thinkaurelius.titan.diskstorage.hbase.HBaseKeyColumnValueStore.mutateMany(HBaseKeyColumnValueStore.java:189)
    at com.thinkaurelius.titan.diskstorage.hbase.HBaseKeyColumnValueStore.mutate(HBaseKeyColumnValueStore.java:88)
    at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ExpectedValueCheckingStore.mutate(ExpectedValueCheckingStore.java:65)
    at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration$2.call(KCVSConfiguration.java:146)
    at com.thinkaurelius.titan.diskstorage.configuration.backend.KCVSConfiguration$2.call(KCVSConfiguration.java:141)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:133)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation$1.call(BackendOperation.java:147)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
    ... 14 more
```
I will fix it soon. Did you succeed?
Yes! Apart from the error above, I did get the results. It was not easy!
I will traverse all the vertex information soon and check whether it is correct.
Ok keep me updated :)
How can I bulk import data into Titan? Can you give me some advice? I have 100 GB of data. Thanks.
Hey, create a new transaction that uses batches (`TitanGraph.buildTransaction().enableBatchLoading().checkExternalVertexExistence(false)`), then `commit()` the transaction every X insertions, for example every 50k.
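For illustration, the commit-every-X-insertions pattern above can be sketched as follows. Note that `GraphTx` below is a stand-in interface invented for the sketch, not Titan's API; with real Titan, `tx` would be the transaction returned by the `buildTransaction()...start()` chain above.

```java
import java.util.List;

// Sketch of batched bulk loading: insert everything in one transaction
// object, committing every batchSize insertions so writes are flushed
// to the backend in chunks instead of one huge commit at the end.
public class BatchLoadSketch {

    /** Minimal stand-in for a Titan batch-loading transaction. */
    interface GraphTx {
        void addVertex(String name);
        void commit();
    }

    /** Inserts all names, committing every batchSize insertions and once
     *  more for the final partial batch; returns the number of commits. */
    public static int bulkLoad(GraphTx tx, List<String> names, int batchSize) {
        int commits = 0;
        int sinceCommit = 0;
        for (String name : names) {
            tx.addVertex(name);
            if (++sinceCommit == batchSize) {
                tx.commit();          // flush a full batch
                commits++;
                sinceCommit = 0;
            }
        }
        if (sinceCommit > 0) {        // flush the remainder
            tx.commit();
            commits++;
        }
        return commits;
    }
}
```

For 100 GB of data you would pick a batch size (e.g. 50k) large enough to amortize commit overhead but small enough to keep transaction memory bounded.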
Hello imri,
Thank you for the great work on mizo.
I met the same problems described in these Stack Overflow questions:
Q1: http://stackoverflow.com/questions/41121262/reading-a-large-graph-from-titan-on-hbase-into-spark?rq=1
Q2: http://stackoverflow.com/questions/35464538/how-to-process-large-titan-graph-using-spark
Until now, I can't find a good practice for doing OLAP on Titan with Spark.
Have you tried using SparkGraphComputer directly for OLAP? Do you have any example code?
In the TitanBlueprintsGraph.java file, the compute method is overridden:

```java
@Override
public <C extends GraphComputer> C compute(Class<C> graphComputerClass) throws IllegalArgumentException {
    if (!graphComputerClass.equals(FulgoraGraphComputer.class)) {
        throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass);
    } else {
        return (C) compute();
    }
}
```
So I think that when I create a TitanGraph, it doesn't support SparkGraphComputer. I can only create a HadoopGraph via graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties'); how can that traverse the Titan graph DB? I can't find how it scans the HBase tables.
Do you have any example code for SparkGraphComputer working with Titan?
Thank you very much.
Hey,
This answer might be helpful.
I have used SparkGraphComputer with Titan, but it malfunctions and is really buggy. In order for this to work, you have to use HadoopGraph (as specified in the answer above), which internally uses an InputFormat to read the graph. Titan's implementation of InputFormat was buggy: first, it skips vertices (if you count the number of vertices using the InputFormat, you get a wrong answer). Second, it crashes in some circumstances (for example, an edge that connects a vertex to itself). Third, SparkGraphComputer is really, really slow; I haven't researched why. To sum up, as far as I'm concerned, SparkGraphComputer is bad.
What are you trying to achieve? Tell me more, maybe we can figure it out using Mizo.
Best regards
Thank you very much! I'm so excited that you answered me. (Please ignore my English grammatical errors.)
Now I am trying to use Titan to store some relational data about users: user follow relations, users' goods for sale (second hand). Then I want to do some OLAP analysis for relation recommendation, goods recommendation, user clustering, and so on.
For example:
Case 1: A follows B, B follows C, and maybe A will be interested in C.
Case 2: I want to find why and how users follow one another, and whether there are any common features.
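Case 1 is a classic two-hop ("friend of friend") expansion. As a toy illustration, here is how it could be computed over a plain in-memory edge list; the user names are hypothetical, and with Mizo/GraphX the same idea would run over the edges RDD instead:

```java
import java.util.*;

// Toy friend-of-friend recommender: recommend C to A when
// A follows B and B follows C, and A does not already follow C.
public class FollowRecommender {

    /** follows: directed edges as {follower, followee} pairs.
     *  Returns, per user, the second-hop users they might want to follow. */
    public static Map<String, Set<String>> recommend(List<String[]> follows) {
        Map<String, Set<String>> out = new HashMap<>();
        for (String[] e : follows) {
            out.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
        }

        Map<String, Set<String>> recs = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : out.entrySet()) {
            String a = entry.getKey();
            Set<String> hops = new HashSet<>();
            for (String b : entry.getValue()) {
                for (String c : out.getOrDefault(b, Collections.emptySet())) {
                    // skip A itself and users A already follows
                    if (!c.equals(a) && !entry.getValue().contains(c)) {
                        hops.add(c);
                    }
                }
            }
            if (!hops.isEmpty()) recs.put(a, hops);
        }
        return recs;
    }
}
```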
Now I have already built my Titan cluster using HBase + Elasticsearch as the backend for OLTP service, and I am trying to build my OLAP environment based on Titan and Spark, but I found there is no good documentation. And Titan doesn't even support Spark well.
When I found the Mizo project, I thought maybe I could do OLAP on Spark GraphX. I mean, I would just scan my Titan HBase table for all vertices and edges into Spark, and use Spark GraphX to do the analysis. Is this possible?
Thank you again !
So if I understand you correctly, you want to expand from a given vertex through multiple hops. Mizo only allows you to expand from a given vertex to its direct edges.
I haven't used GraphX, but as far as I'm concerned, it should be really easy to integrate Mizo with it, since it only expects an RDD of edges; you can convert Mizo's EdgesRDD to an RDD of GraphX edges. I'm not sure what you'll be able to achieve using GraphX, but give it a try.
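As a rough illustration of that conversion: GraphX models an edge as Edge[ED](srcId, dstId, attr). The MizoEdge record below is a hypothetical stand-in for Mizo's parsed edge type (its actual field names are not confirmed here), and GraphXEdge mirrors GraphX's Edge triple so the mapping logic can be shown without a Spark dependency:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of mapping Mizo-style edges onto GraphX's Edge shape.
// In Spark this map would run on the edges RDD, producing an
// RDD of Edge[String] suitable for Graph.fromEdges.
public class MizoToGraphX {

    /** Hypothetical stand-in for a parsed Mizo edge. */
    record MizoEdge(long outVertexId, long inVertexId, String label) {}

    /** Stand-in for GraphX's Edge[String]: (srcId, dstId, attr). */
    record GraphXEdge(long srcId, long dstId, String attr) {}

    public static List<GraphXEdge> convert(List<MizoEdge> edges) {
        return edges.stream()
                .map(e -> new GraphXEdge(e.outVertexId(), e.inVertexId(), e.label()))
                .collect(Collectors.toList());
    }
}
```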
If you need any help, let me know.
Thank you, I will give it a try.
Hello imri,
I have started a Spark OLAP task based on Titan & HBase & Gremlin SparkGraphComputer, but as in your experiments, it works very slowly: with 150 vertices in the graph it costs 4 minutes, and with 10 million vertices it takes far too long.
It seems to get stuck reading the RDD from Titan.
My HBase version is 0.94, but I found that Mizo depends on the 1.0.2 HBase client, and my HBase production environment doesn't allow me to read HFiles directly...
I am trying to solve these problems.
PS: I have a question about using Titan: is there any way to create the property key first, commit, and then do the indexing later? Because when I write properties without creating an index (using Elasticsearch), I get errors.
Hello, I have successfully run the edges and vertices count test cases using Mizo! Thank you. I am using HBase 0.98, Spark 1.5.1, and Titan's Graph of the Gods.

I still have some questions. The counts don't look right: there are 17 edges, but the Mizo edge count result is 32, which is not 17*2. Then I built a very simple graph with only 3 vertices, and after my test with Mizo the vertices count was 10, i.e. 7 unrelated vertices. I think these may be index vertices or some internal-use vertices in Titan. This may be related to the 'Multiple Item Data Model' (ref: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.TitanDB.BestPractices.html), because when I scan the table in the HBase shell, the same rowkey has multiple values.
```java
protected static HashMap<Long, MizoTitanRelationType> loadRelationTypes(String titanConfigPath)
{
    ...
    .forEach(v -> {
        if (v instanceof InternalRelationType)
            relations.put(v.longId(), new MizoTitanRelationType((InternalRelationType) v));
    });
}
```
```java
private Comparator<Cell> ASC_CELL_COMPARATOR = (left, right) -> {
    int c = CellComparator.compareStatic(left, right);

    if (c != 0) {
        return c;
    } else {
        if (left.getFamilyLength() + left.getQualifierLength() == 0 &&
                left.getTypeByte() == KeyValue.Type.Minimum.getCode()) {
            return 1;
        } else if (right.getFamilyLength() + right.getQualifierLength() == 0 &&
                right.getTypeByte() == KeyValue.Type.Minimum.getCode()) {
            return -1;
        } else {
            boolean sameFamilySize = left.getFamilyLength() == right.getFamilyLength();

            if (!sameFamilySize) {
                return Bytes.compareTo(left.getFamilyArray(), left.getFamilyOffset(), left.getFamilyLength(),
                        right.getFamilyArray(), right.getFamilyOffset(), right.getFamilyLength());
            } else {
                int diff = CellComparator.compareStatic(left, right);

                if (diff != 0) {
                    return diff;
                } else {
                    c = Longs.compare(right.getTimestamp(), left.getTimestamp());
                    if (c != 0) diff = c;
                    //diff = CellComparator.compareTimestamps(right, left); // Different from CellComparator.compare()
                    return diff != 0 ? diff : (255 & right.getTypeByte()) - (255 & left.getTypeByte());
                }
            }
        }
    }
};
```
I don't quite understand this part. Why does it need to create an ascending-sorted cells iterator? What does Cell mean: is it a property or an edge within one row? Are there any suggested documents for me to understand HTable, region, column family, cell, etc.? And any suggested documents for understanding Titan's data model?
I am running Titan 1.0 with an HBase 1.0.3 backend. I want to get the Titan vertices from HBase directly using Apache Spark 1.6.1; can you give me some advice? Thanks.