twitter / elephant-bird

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
Apache License 2.0

lucene-queryparser compatibility? #467

Open immo-huneke-zuhlke opened 8 years ago

immo-huneke-zuhlke commented 8 years ago

In com.twitter.elephantbird.mapreduce.output.LuceneIndexOutputFormat.createIndexWriter, there is a call to org.apache.lucene.index.LogByteSizeMergePolicy.setUseCompoundFile, a method that was removed from Lucene after version 4.0.0. The version of org.apache.lucene:lucene-queryparser is expected to match the Lucene core it runs against, so this prevents me from using any features of later versions of the query parser in my program (specifically, 4.7.2) if I want to continue to use elephant-bird-pig-lucene.
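The failure comes from a single setter that exists in some Lucene versions and not in others. A pull request could therefore keep `createIndexWriter` working across versions by looking the setter up at runtime rather than calling it directly. The following is a minimal, hypothetical shim. `CompoundFileCompat` is not part of elephant-bird or Lucene. It calls `setUseCompoundFile(boolean)` when the merge policy still has that method and otherwise falls back to `setNoCFSRatio(double)`, on the assumption that this is the replacement on the newer `MergePolicy` API. Only `java.lang.reflect` is used, so the shim compiles without Lucene on the classpath. In newer releases, newly flushed segments are assumed to be controlled separately through `IndexWriterConfig.setUseCompoundFile(boolean)`, which this sketch does not touch.

```java
import java.lang.reflect.Method;

/**
 * Hypothetical compatibility shim (not part of elephant-bird or Lucene):
 * sets compound-file usage on a merge policy via reflection so the same
 * code can run against Lucene versions with and without
 * setUseCompoundFile(boolean).
 */
final class CompoundFileCompat {

    private CompoundFileCompat() {}

    /** Applies the setting and returns the name of the setter that was invoked. */
    public static String setUseCompoundFile(Object mergePolicy, boolean useCompoundFile) {
        Class<?> cls = mergePolicy.getClass();
        try {
            // Older API, e.g. LogMergePolicy.setUseCompoundFile(boolean).
            Method legacy = cls.getMethod("setUseCompoundFile", boolean.class);
            legacy.invoke(mergePolicy, useCompoundFile);
            return "setUseCompoundFile";
        } catch (NoSuchMethodException e) {
            // Method not present in this version: try the newer API below.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        try {
            // Assumed newer API: a ratio of 1.0 lets merged segments use the
            // compound format (subject to other limits); 0.0 disables it.
            Method ratio = cls.getMethod("setNoCFSRatio", double.class);
            ratio.invoke(mergePolicy, useCompoundFile ? 1.0 : 0.0);
            return "setNoCFSRatio";
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "No known compound-file setter on " + cls.getName(), e);
        }
    }
}
```

Under the same assumptions, a call such as `CompoundFileCompat.setUseCompoundFile(new LogByteSizeMergePolicy(), false)` would use whichever setter the Lucene version on the classpath provides. The trade-off is that reflection gives up compile-time checking in exchange for version tolerance: if neither setter exists, the problem only surfaces at runtime as an `IllegalStateException`.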

What are your plans for upgrading this dependency? The org.apache.lucene components are currently at version 6.0.1, which is 35 releases beyond 4.0.0 (released in October 2012).

dvryaboy commented 8 years ago

Hi Immo, development on this subpackage is mostly suspended. That has nothing to do with the code; the internal Twitter need it was built for is now served by a completely different system. We can look at pull requests.

immo-huneke-zuhlke commented 8 years ago

Thank you - I'll consider creating a fork if time permits.

isnotinvain commented 8 years ago

Yeah, sorry, this is pretty old and we haven't made changes to it in a long time. We may want to consider removing it if it has become so stale as to be obsolete. IIRC there are other Hadoop-Lucene integrations out there, probably with better support. I remember this one: http://www.cloudera.com/documentation/archive/search/1-3-0/Cloudera-Search-User-Guide/csug_introducing.html, but I don't know its current state either.

immo-huneke-zuhlke commented 8 years ago

Many thanks. I suggest closing this issue and leaving things exactly as they are. It is never a great idea to just withdraw a library that other people have built into their applications. I had enough trouble with the Gephi library, whose original repository was closed down and whose replacement only contained later versions that were not backwards compatible.

isnotinvain commented 8 years ago

We won't ever un-publish the existing Maven artifacts. But if we really do feel that this is abandoned, I'd be in favor of removing it from future versions of elephant-bird; that way nobody will waste their time trying to use it. On the other hand, if people are using it and want to improve it, PRs are always welcome.