tjake / Solandra

Solandra = Solr + Cassandra
Apache License 2.0

Solandra Future #168

Open jasonmk opened 12 years ago

jasonmk commented 12 years ago

Jake,

Given DataStax' inclusion of Enterprise Search in DSE 2, what is the future of this separate Solandra project? Do you still intend to maintain it? Is this basically just a separate distribution of the base code that will go into DSE? We are seriously looking at DSE, but I think you might be solving a slightly different problem than the one we are currently looking at Solandra for. Before we commit to using Solandra, it would make me feel much better to know it isn't going away.

Thanks, Jason

benmccann commented 12 years ago

I'm a bit curious about this as well. From what I gather, DSE uses native Solr indexes. I'd be interested to hear what the pros and cons of the two approaches are. I'm a bit worried about Solandra's future given that DataStax has said "At this time, we now consider the current versions of Brisk and Solandra to be the final releases from us in open source form."

jasonmk commented 12 years ago

Out of curiosity, where did you see that statement? I've looked for anything official from Datastax and haven't found it.

benmccann commented 12 years ago

That statement was on the following page: http://www.datastax.com/2011/09/committing-hive-driver-into-apache-cassandra

tjake commented 12 years ago

I've just published a blog post about this: http://www.datastax.com/dev/blog/cassandra-with-solr-integration-details

I plan to keep Solandra working with Cassandra but I don't have resources to work on it heavily.

benmccann commented 12 years ago

Thanks for the update!

jasonmk commented 12 years ago

Killer blog post. Thanks for that!

tjake commented 12 years ago

@jkusar does DSE fit your use case?

jasonmk commented 12 years ago

I need to investigate further. It's a departure from where we are, and the need for separate search and Cassandra nodes is a bit of a turn-off. We don't have a big data problem; we have a replication/availability problem. More nodes just mean more expense and more things to go wrong.

That said, I'm definitely looking at it, but we have a pretty strong bias towards open-source software and try to contribute back where possible so Solandra is of high interest to me.

tjake commented 12 years ago

Cool. Why do you think there are many nodes? You can have all nodes be Solr nodes.

jasonmk commented 12 years ago

Did I miss something? I thought you had to have a set of Solr nodes and a set of Cassandra nodes: one set for your index and one for the data storage. With RF3, that meant we would need 6 nodes per datacenter (3 Solr plus 3 Cassandra). That times 4 datacenters is 24 nodes, which adds up fast.

If we only have Solr nodes, does that still let us retrieve the original data via CQL queries?

benmccann commented 12 years ago

I'm not Jason, but I'll throw out where I am in my search. I think I'm narrowing down to DSE or elasticsearch. The big plus for DSE is its Cassandra integration. However, right now I'm probably leaning towards elasticsearch because it seems easier to deal with nested JSON docs using it. Still want to do some more prototyping and investigation before I make any decisions though. I'm less comfortable with elasticsearch's durability and backup story right now just because I know less about it, so I have some more reading to do.

tjake commented 12 years ago

No. As the diagram in the blog post shows, Solr and Cassandra run in the same JVM. With DSE you can run Cassandra, Cassandra+Hadoop, or Cassandra+Solr; it's up to you how you mix and match. Yes, you can execute Solr queries from CQL with all Solr nodes.

jasonmk commented 12 years ago

Ok, that's very interesting. Looks like I'm going to be grabbing a copy and prototyping it properly. There are a few things that we need to use normal Cassandra secondary indexes against to guarantee consistency. That was why I was interested in still having Cassandra access, but now that you point it out, it seems obvious. Of course a copy of the data would still need to reside in Cassandra, or else you would never be able to rebuild the index. I'll pull down a copy and see whether it meets our needs. I'd be lying if I didn't admit that the Admin console was a very sexy temptation, and I'd love to have Solr 4.0.