Open BillBaird opened 10 years ago
Since Titan is already using ZooKeeper, wouldn't it be simpler to separate the ElasticSearch code, and instead provide an output interface to Kafka?
See the ElasticSearch Kafka-River Plugin: https://github.com/endgameinc/elasticsearch-river-kafka
Adding a Kafka output would make it easy for people to configure ElasticSearch as needed, and it would allow Titan to feed into multiple backends such as Solr, ElasticSearch, or any other backend system, without having to write custom connectors for each.
Kafka is fast, durable, and ordered, and as of 0.8 it is replicated. It is commonly used to feed Storm and Spark, so integrating Kafka with Titan would provide a generic way to do pre-processing and post-processing from/to other systems.
Example Kafka dataflow... Source: http://blog.infochimps.com/2012/10/30/next-gen-real-time-streaming-storm-kafka-integration/
List of Kafka clients... https://cwiki.apache.org/confluence/display/KAFKA/Clients
Overview of Kafka's binary protocol... https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
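To make the idea above a little more concrete, here is a minimal sketch of what such an output interface could look like. Everything in it is an assumption for illustration: IndexEvent, IndexEventSink and InMemoryLog are hypothetical names, not Titan or Kafka classes, and the in-memory list only stands in for an ordered, append-only log such as a Kafka topic. No real Kafka client calls are used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event describing one indexed property change; not a Titan type.
final class IndexEvent {
    final String key;
    final String value;

    IndexEvent(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical output interface the graph database would write index changes to.
interface IndexEventSink {
    void publish(IndexEvent event);
}

// In-memory stand-in for an ordered, append-only log (the role Kafka would play).
// Readers pass their own offset, so several backends can consume the same events.
final class InMemoryLog implements IndexEventSink {
    private final List<IndexEvent> events = new ArrayList<>();

    @Override
    public synchronized void publish(IndexEvent event) {
        events.add(event);
    }

    // Returns a copy of all events at or after the given offset, in publish order.
    synchronized List<IndexEvent> readFrom(int offset) {
        return new ArrayList<>(events.subList(offset, events.size()));
    }
}

class FanOutDemo {
    public static void main(String[] args) {
        InMemoryLog log = new InMemoryLog();
        log.publish(new IndexEvent("name", "hercules"));
        log.publish(new IndexEvent("name", "jupiter"));

        // Two independent consumers (say, one per search backend) read the same log.
        for (IndexEvent e : log.readFrom(0)) {
            System.out.println("consumer A: " + e.key + "=" + e.value);
        }
        for (IndexEvent e : log.readFrom(1)) {
            System.out.println("consumer B: " + e.key + "=" + e.value);
        }
    }
}
```

The point of the sketch is the decoupling: the writer only knows about the sink, and each backend reads at its own pace, so adding another consumer would not require a new connector on the writing side.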
As discussed in https://groups.google.com/forum/#!topic/aureliusgraphs/VGv-RJwt8zI we have added the ability to pass arbitrary Parameters through to the indexing backend.
Colleagues of mine tackled this solely for ElasticSearch by modifying ElasticSearchIndex.java to support an additional parameter on index creation. This additional parameter gets passed directly to ES. We are able to use it like so:
g.makeKey("name")
.dataType(String.class)
.indexed("search", Vertex.class, com.thinkaurelius.titan.core.Parameter.of("tokenizer", "(\\.|\\s|\\@|\\/)+"))
.make()
This makes ES tokenize on period, space, @ and / instead of its normal tokenizer.
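To see which characters that pattern splits on, here is a small sketch using plain java.util.regex. It does not involve Titan or ES at all; the class name TokenizerPatternDemo, the helper tokenize and the sample input are made up for illustration, and dropping empty tokens is a choice of this sketch rather than a statement about how ES behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TokenizerPatternDemo {
    // Same regex as the "tokenizer" parameter above: runs of '.', whitespace, '@' or '/'.
    static final Pattern SEPARATORS = Pattern.compile("(\\.|\\s|\\@|\\/)+");

    // Splits the input on runs of separator characters and drops empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : SEPARATORS.split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [jane, doe, example, com, home, jane]
        System.out.println(tokenize("jane.doe@example.com /home/jane"));
    }
}
```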
@mbroecheler @dalaro is this something you would consider for a pull request? We sort of hacked it together but can put in a little effort to improve it with your guidance.
This is a good use of the new Parameter arguments for index registration. If you can put this into a generally usable pull request with test coverage, we would greatly appreciate it.
(In reply to kevinschumacher's comment of Fri, Feb 14, 2014, 8:20 AM.)
Matthias Broecheler http://www.matthiasb.com
g.makeKey("my_cjk_search_field").dataType(String.class).indexed("search", Vertex.class, com.thinkaurelius.titan.core.Parameter.of("analyzer", "cjk")).make();
==> This does not work for me. I want to set a different (custom) analyzer for each property. Any update? This is really important for me and my customer :D
Unfortunately I never got around to cleaning up the code and submitting the pull request. As far as I know that functionality doesn't exist in Titan yet (at least not in the 0.4.x series)
(In reply to bezalel's comment of Jul 22, 2014, 10:42 PM.)
@kevinschumacher Could you please share the Titan jar file for Titan 0.4.4? :D My email is bezalel.dev@gmail.com
Titan exposes only a subset of ElasticSearch features. ElasticSearch allows customized tokenizers and filters, and using them when indexing Titan property keys would let Titan take advantage of these powerful behaviors. Examples would be a native ES tokenizer like the pathhierarchy-tokenizer (http://www.elasticsearch.org/guide/reference/index-modules/analysis/pathhierarchy-tokenizer/) or a pluggable tokenizer like a phonetic tokenizer to enable phonetic searches (http://blog.jessitron.com/2012/04/configuring-soundex-in-elasticsearch.html and https://github.com/elasticsearch/elasticsearch-analysis-phonetic).
Phonetic search lets requests return results despite misspellings and enables "did-you-mean" types of searches. Soon, ES will have completion suggesters.
http://www.elasticsearch.org/blog/you-complete-me/
With the current ES integration, a Titan graph cannot natively take advantage of these powerful capabilities.
As a suggested approach, Blueprints allows for passing additional parameters to createKeyIndex: https://github.com/tinkerpop/blueprints/blob/master/blueprints-core/src/main/java/com/tinkerpop/blueprints/KeyIndexableGraph.java
A similar approach would be to extend TypeMaker's .indexed to be .indexed(String indexName, Class type, Parameter... parameters), where the ES tokenizers and filters could be configured.
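As a rough sketch of that shape, the varargs parameters could simply be collected into settings that the index backend reads later. The classes below (IndexParameter, KeyMakerSketch) are hypothetical stand-ins written for this example, not Titan's TypeMaker or Parameter, and the setting names and values are passed through as opaque key/value pairs; how ES would interpret them is outside the sketch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical key/value parameter; a stand-in, not Titan's Parameter class.
final class IndexParameter {
    final String key;
    final Object value;

    private IndexParameter(String key, Object value) {
        this.key = key;
        this.value = value;
    }

    static IndexParameter of(String key, Object value) {
        return new IndexParameter(key, value);
    }
}

// Hypothetical builder showing the proposed .indexed(name, type, parameters...) shape.
class KeyMakerSketch {
    private final String name;
    private String indexName;
    private Class<?> elementType;
    private final Map<String, Object> indexSettings = new LinkedHashMap<>();

    KeyMakerSketch(String name) {
        this.name = name;
    }

    // Index name, element type, then any number of backend-specific parameters.
    KeyMakerSketch indexed(String indexName, Class<?> elementType, IndexParameter... parameters) {
        this.indexName = indexName;
        this.elementType = elementType;
        for (IndexParameter p : parameters) {
            indexSettings.put(p.key, p.value);
        }
        return this;
    }

    // Read-only view of the settings an index backend would receive.
    Map<String, Object> indexSettings() {
        return Collections.unmodifiableMap(indexSettings);
    }

    public static void main(String[] args) {
        // Object.class stands in for the element type (Vertex.class in the thread).
        KeyMakerSketch key = new KeyMakerSketch("path")
                .indexed("search", Object.class,
                        IndexParameter.of("tokenizer", "path_hierarchy"),
                        IndexParameter.of("filter", "lowercase"));
        // Prints {tokenizer=path_hierarchy, filter=lowercase}
        System.out.println(key.indexSettings());
    }
}
```

Because the parameters are varargs, existing calls of the form .indexed("search", Object.class) would still compile unchanged under this shape.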
Pluggable tokenizers would have to be installed first. It would be nice if there were a way of accomplishing this through ES configuration, perhaps through a storage.index.search.plugin property. This would be best accomplished in conjunction with issue #343.