lbehnke / h2database

Automatically exported from code.google.com/p/h2database
0 stars 0 forks source link

FullTextLucene: Use Lucene 3.x #147

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
FullTextLucene creates a new IndexReader instance every time an FTL search is 
performed.

Reusing the IndexReader is a nontrivial feature, and very easy to get wrong. In 
Lucene 3.0.0, 
however, the IndexWriter now offers an IndexReader that stays in sync with the 
index as it is edited.

This new IndexReader can share the same lifecycle of it's source IndexWriter 
and thus be reused as 
desired for concurrent index writes and searches.

Original issue reported on code.google.com by mbis...@gmail.com on 8 Dec 2009 at 2:57

GoogleCodeExporter commented 9 years ago
Attached is a patch with the following changes:

1. Upgrades H2 to use Lucene 3.0.0.
2. Caches IndexSearchers for each path so that it can be reused by index 
searches.

Original comment by mbis...@gmail.com on 8 Dec 2009 at 3:12

Attachments:

GoogleCodeExporter commented 9 years ago
Sorry for the delay.
Does your patch require Lucene 3.0? If yes,
I would like to wait a bit longer (Lucene 3 is relatively new).
Not much, but maybe a week or so.

Original comment by thomas.t...@gmail.com on 15 Dec 2009 at 4:52

GoogleCodeExporter commented 9 years ago
I think this should wait a bit longer than that.  The new single Reader 
interface in
3.0 that this patch uses is pretty new, and I am not completely sure I am using 
it
correctly. I have been getting results that I don't expect sometimes in my
application when compared to Lucene 2.2.

Original comment by mbis...@gmail.com on 16 Dec 2009 at 7:38

GoogleCodeExporter commented 9 years ago
This patch includes the correct usage of IndexWriter.getReader().  My testing 
process was as follows:

1. Run an app that inserts rows continually; these rows contain indexable 
fields.
2. While that app is running, use the h2 console to run 'SELECT * FROM 
FTL_SEARCH(...)' with a query that many 
of the rows would have a hit generated.
3. Keep running the query in the h2 console while adds progress. The result 
list grew as the number of rows 
grew.

Original comment by mbis...@gmail.com on 20 Dec 2009 at 4:35

Attachments:

GoogleCodeExporter commented 9 years ago
> My testing process was as follows

If it's not too much to ask, could you create a test case? Just a simple 
standalone
Java class, the assertions don't have to be very complicated... Something like 
http://h2database.googlecode.com/svn/trunk/h2/src/test/org/h2/test/db/TestFullTe
xt.java

Original comment by thomas.t...@gmail.com on 20 Dec 2009 at 12:35

GoogleCodeExporter commented 9 years ago
I am working on a Lucene test; in the process I will remove the lucene flags in
TestFullText so that it doesn't use Lucene but rather tests the H2 fulltext 
feature.

Original comment by mbis...@gmail.com on 23 Dec 2009 at 7:41

GoogleCodeExporter commented 9 years ago
I have a question about transactions and indexing. When does the trigger fire? 
Should
the FullText class be aware that it is in a transaction? If so, does it need to 
roll
back changes if the transaction fails?

Original comment by mbis...@gmail.com on 23 Dec 2009 at 7:42

GoogleCodeExporter commented 9 years ago
Hi,

Sorry for the delay. 

> I will remove the lucene flags in TestFullText

That's OK, however a new test case may be simpler.

> When does the trigger fire?

When the row is inserted/updated/deleted (before the commit). A transaction 
rollback
will also rollback the operations that were done within the trigger. This is not
documented yet, I will do that. 

> Should the FullText class be aware that it is in a transaction?
> does it need to roll back changes if the transaction fails

Usually no and no. Currently triggers don't provide a way to detect rollback.

Actually this is a problem for the Lucene fulltext search... I will try to find 
a
solution. It is not a problem for the 'native' fulltext search however.

Original comment by thomas.t...@gmail.com on 30 Dec 2009 at 11:46

GoogleCodeExporter commented 9 years ago
The rollback problem with the Lucene fulltext search will be fixed in the next
release. The trigger will also be called on rollback (this is a new feature in 
the
CREATE TRIGGER syntax). 

Original comment by thomas.t...@gmail.com on 30 Dec 2009 at 1:11

GoogleCodeExporter commented 9 years ago
I have attached a new patch that contains an updated FullTextTest. The big 
changes in
the test are to add a search while inserting in the multithreaded test.

Interestingly, the Lucene FT passes this updated test, while the native FT does 
not.
One of the threads gets blocked in the query and never releases, while the other
thread completes the test.  The test hangs indefinitely.

Original comment by mbis...@gmail.com on 7 Jan 2010 at 4:25

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks! I will have a look at the patch next week.

Original comment by thomas.t...@gmail.com on 8 Jan 2010 at 9:23

GoogleCodeExporter commented 9 years ago
I'm sorry for the delay. I could not apply the patch, could you create a new 
one that
is based on the trunk?

It would be nice to have a solution that works for both the 'old' and the 'new'
Lucene. I'm not sure if that's possible. An alternative is to wait a bit longer,
until Lucene 2.x is deprecated.

Original comment by thomas.t...@gmail.com on 21 Mar 2010 at 11:40

GoogleCodeExporter commented 9 years ago
I'm changing the summary from 'FullTextLucene should reuse IndexReader as much 
as
possible' to 'FullTextLucene: Use Lucene 3.x'. Please tell me if that's not 
accurate.

Original comment by thomas.t...@gmail.com on 21 Mar 2010 at 11:42

GoogleCodeExporter commented 9 years ago
Attached is a new patch aligned with tonight's trunk. I will delete the old one.

Personally I would like to see H2 1.2 released as soon as it's ready, so 
introducing a change like this will pull 
time away from other issues that need to be addressed before release.  If 
someone really wants these changes 
they can patch it themselves and give it a go.

The only thing I would like to bring up, though, is that the TestFullText.java 
changes I made have revealed a bug 
in the native FullText. Please see Comment #10 above.

Original comment by mbis...@gmail.com on 22 Mar 2010 at 5:46

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks! The patch looks very good. Currently I'm not sure if it's time yet
to switch to Lucene 3. What do you think?

Original comment by thomas.t...@gmail.com on 26 Mar 2010 at 4:06

GoogleCodeExporter commented 9 years ago
I agree that 3.0 has some issues, probably more than comfortable at the moment. 
Here's the list of issues slated for 3.1:

http://issues.apache.org/jira/browse/LUCENE/fixforversion/12314025

Several leaks are in there, which is always a bad thing for a long-running 
system.

Original comment by mbis...@gmail.com on 26 Mar 2010 at 11:14

GoogleCodeExporter commented 9 years ago
Updated patch to use latest lucene 3.0.2.

Original comment by mbis...@gmail.com on 15 Jul 2010 at 1:07

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for the patch! Please tell me if you think I should commit it now.
Otherwise I will wait a bit longer (probably for the next release).

Original comment by thomas.t...@gmail.com on 26 Jul 2010 at 6:18

GoogleCodeExporter commented 9 years ago
There are two issues to consider:

1. Lucene 2.9 supports JDK 1.4+, while Lucene 3 requires 1.5+. If H2 supports 
1.4 (I think it does) then moving to Lucene 3 would require omitting FTL 
support in 1.4 builds.

2. Lucene 2.9 indexes are not compatible with 3.0 indexes, so they would have 
to be rebuilt.  One can connect to 2.9 indexes, but that would be only 
advisable to support backwards compatibility. It would be transitional, driven 
by a flag indicated what version of lucene created the index.

So, if you want to add it in now, you will need to strip it out of 1.4 builds 
and provide a "version" test or flag to indicate what version of index to use 
for Lucene.

If you want to wait for the next release, you may have the exact same problem 
set.  I don't think I can decide for you. In my personal case, I would rather 
see it applied now as H2 1.2 is the current development track.  I would like to 
get it in now to flush out issues with the community at large, as well as 
provide the improved scalability that the patch offers.

Maybe put it to a vote? It would help get people using the google/p/ pages more 
for bugs and comments...

Original comment by mbis...@gmail.com on 27 Jul 2010 at 9:14

GoogleCodeExporter commented 9 years ago
My vote is for backward compatibility with Lucene 2.9!

Original comment by victor.p...@gmail.com on 29 Jul 2010 at 7:44

GoogleCodeExporter commented 9 years ago
It sounds like the best we could do is support both. What about
class FullTextLucene
class FullTextLucene3 extends FullTextLucene

Original comment by thomas.t...@gmail.com on 31 Jul 2010 at 9:16

GoogleCodeExporter commented 9 years ago
It looks good.

Original comment by victor.p...@gmail.com on 2 Aug 2010 at 4:53

GoogleCodeExporter commented 9 years ago
The only use case I can think of to require 2.9 support is if an application is 
reading the lucene index files on disk directly. I have an application that 
does this myself, but I don't think it makes sense for H2 to support it with 
different versions of FTL long-term.

I'd wait for H2 1.3 to get started and put the Lucene3 patch in then so it will 
have time to ride the development effort as an early-adopter upgrade.  It will 
give people who use the lucene index files time to update their apps.

Original comment by mbis...@gmail.com on 4 Aug 2010 at 4:09

GoogleCodeExporter commented 9 years ago
Hi,

I'm currently trying to integrate the patch. What I found so far is:

In the test case, you use System.out.println("sleeping"); Thread.sleep(10000); 
- is it really required to sleep for 10 seconds? I would like to avoid that 
(the automated tests already take a long time...). What about only run for 10 
seconds in the 'big' mode (which is only ran once).

I will try to support both Lucene 2.x and 3.x is possible; if this results in 
build problems I will try to find another solution (maybe a new build target 
that switches the code to Lucene 3.0; not sure yet).

Original comment by thomas.t...@gmail.com on 17 Aug 2010 at 7:21

GoogleCodeExporter commented 9 years ago
Hi,

I think it will be hard to support both at the same time: IndexModifier and 
Hits seems to be no longer supported. I will probably have to use a compile 
time switch (similar to supporting multiple Java versions). Within build.xml I 
will currently only support Lucene 2.2. Switching the source (and compiling 
for) Lucene 3.x will only work using build.sh / build.bat (unless you have a 
simple way to support it within build.xml).

Original comment by thomas.t...@gmail.com on 17 Aug 2010 at 7:31

GoogleCodeExporter commented 9 years ago
I guess I would come up with a good reason to support both at this time. As I 
said in comment 23, the only reason for explicitly knowing which version of 
lucene was being used was to access the index files directly, which isn't 
something that should be encouraged.  I strongly suggest moving to Lucene 3 and 
dropping Lucene 2 rather than trying to support both.

Original comment by mbis...@gmail.com on 17 Aug 2010 at 9:50

GoogleCodeExporter commented 9 years ago
I have now applied the patch (thanks a lot!), changed a few thing, and 
committed it the trunk.

From the change log:

Lucene 3.x support was added in the source code, however it is not yet enabled 
by default and is not yet supported when using the default h2 jar file. To 
enable Lucene 3.x support, the source code of H2 needs to be switched using 
./build.sh -Dlucene=3 switchSource, and then re-compile. To switch the source 
code back use ./build.sh -Dlucene=2 switchSource (replace ./build.sh with 
build.bat on Windows). The plan is to use Lucene 3 by default in H2 version 
1.3.x.

Original comment by thomas.t...@gmail.com on 18 Aug 2010 at 5:29

GoogleCodeExporter commented 9 years ago
Support for Lucene 2.x will still be available in the in H2 1.3.x source code, 
just that it will not be enabled by default (you need to switch the source 
code).

> If H2 supports 1.4 (I think it does)

Not directly, but using Retroweaver is supported.

> the Lucene FT passes this updated test, while the native FT does not.

I'm sorry I didn't see this before. The problem is fixed now.

Original comment by thomas.t...@gmail.com on 18 Aug 2010 at 5:34

GoogleCodeExporter commented 9 years ago
Implemented in version 1.2.141 (but not enabled by default)

Original comment by thomas.t...@gmail.com on 23 Aug 2010 at 12:46

GoogleCodeExporter commented 9 years ago
Enabled in version 1.3.x

Original comment by thomas.t...@gmail.com on 21 Nov 2010 at 5:52

GoogleCodeExporter commented 9 years ago

Original comment by thomas.t...@gmail.com on 12 Dec 2010 at 12:02