guangyu-ryan / py-leveldb

Other
0 stars 0 forks source link

Allow comparator name to be specified #21

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I had emailed about something like this earlier - where you have an existing 
ldb and you want to write the comparator in Python. I actually have a case 
where some code I have uses the bytewise comparator but changes the comparator 
name, presumably to detect changes in the comparator.

I'm attaching a patch which allows you to pass comparator_name to the LevelDB 
constructor - it does it by writing a hacky wrapper comparator, but I'm hoping 
this can help get things started. 

Original issue reported on code.google.com by alecflett@google.com on 9 Jul 2012 at 7:10

GoogleCodeExporter commented 9 years ago
attached.

Original comment by alecflett@google.com on 9 Jul 2012 at 7:11

Attachments:

GoogleCodeExporter commented 9 years ago
FWIW, I don't know much about the Python C API, so I'm guessing I'm messing up 
the object ownership here by holding onto the name in the comparator. I'm 
providing this patch merely as a starting point.

Original comment by alecflett@google.com on 9 Jul 2012 at 7:12

GoogleCodeExporter commented 9 years ago
This is something along the lines I was hoping for.

I'll just have the comparator default to "bytewise", perhaps add a custom 
example, and include instructions on how to create more.

Original comment by arnim...@gmail.com on 9 Jul 2012 at 8:06

GoogleCodeExporter commented 9 years ago
Implemented comparator name for opening databases. Will port this to RepairDB() 
as well.

Original comment by arnim...@gmail.com on 10 Jul 2012 at 12:50

GoogleCodeExporter commented 9 years ago
Thanks for starting this - I'm still hoping to be able to specify a comparator 
name, even if I get the bytewise comparator as a result - the problem is 
leveldb will refuse to open the database if the names don't match exactly - I'm 
trying to use pyleveldb to examine a leveldb created by a c++ program, and 
until I fully port the comparator over (it has some gnarly dependencies) I'd 
like to at least let pyleveldb open it (that's what I did with the patch I have 
here)

Original comment by alecflett@chromium.org on 10 Jul 2012 at 8:25

GoogleCodeExporter commented 9 years ago
I may have misunderstood you. You just wanted a way to override the Name() 
function for a given comparator (bytewise in your patch), right?

If you´re into that territory, why not just use the bytewise comparator, and 
disable the name check? Hopefully you´re not interested in mutating the 
database, or getting keys in any sensible order.

And I looked the the comparator from chrome IndexedDB. Good luck with that :)

Original comment by arnim...@gmail.com on 11 Jul 2012 at 12:04

GoogleCodeExporter commented 9 years ago
there's no way to disable the name check in leveldb (as best as I can tell 
anyway) - at least not without patching it.. I'm not mutating the database, 
just doing one big scan.

I could actually write comparator fairly trivially in Python if it were 
possible to actually implement a comparator in pyleveldb .. I know that's a lot 
more work though.

Original comment by alecflett@chromium.org on 11 Jul 2012 at 12:09

GoogleCodeExporter commented 9 years ago
that´s not a bed test case.

there are a couple of implementation details that need to be worked out: 
comparator exception propogation and conditional GIL releasing. nothing too 
serious. you´d just pass a callable object, and a name.

GIL can´t be released for mutation or reading, that´s about the only 
difference.

Original comment by arnim...@gmail.com on 11 Jul 2012 at 12:26

GoogleCodeExporter commented 9 years ago
I´m close to a prototype of this, but have hit a snag.

The Compare() function can be called from pretty much any thread, including the 
compaction thread. Calling the Python callback from there is not a problem.

The problem comes when the Python code raises an exception, and we´re left 
with no comparison result. The obvious solution would be to raise a C++ 
exception, but I'm worried that it would break some invariants, e.g. not 
releasing mutexes.

Returning an arbitrary comparison value, and just logging the error, might 
leave the database in an inconsistent state, so that's not an option.

I'm gonna try the C++ exception solution, but I'm not optimistic that it will 
work.

A solution of last resort would simply be to call abort(), after printing out 
the Python stacktrace.

If anybody has other ideas, I'd very much like to hear them.

Original comment by arnim...@gmail.com on 23 Jul 2012 at 11:30

GoogleCodeExporter commented 9 years ago
I just committed a preliminary version supporting Python based comparators.

Any feedback appreciated.

I haven´t done extensive testing on this, but I´ll close the issue once I´m 
confident the code works as advertised.

Original comment by arnim...@gmail.com on 27 Jul 2012 at 12:03

GoogleCodeExporter commented 9 years ago
great news! I'll try to get to this today. I've rewritten a lot of the 
WebKit/IndexedDB comparator code in Python for other purposes, so it shouldn't 
be too hard to hook up.

Original comment by alecflett@chromium.org on 27 Jul 2012 at 4:56

GoogleCodeExporter commented 9 years ago
ok, finally got to try this out. A few issues:
1) just passing the comparator name, means the code complains that you should 
just pass a comparator name or a tuple. But a tuple works
2) I eventually get a segfault, after lots of successful comparisons - no stack 
trace, just a hard crash. I verified that it wasn't my comparator throwing an 
exception, either.

Original comment by alecflett@chromium.org on 31 Jul 2012 at 5:22

GoogleCodeExporter commented 9 years ago
Interesting. in my particular dataset, it very consistently crashes after 253 
comparisons. (using db.RangeIter())

I sprinkled some printfs around, and the crash seems to be in the releasing of 
the GIL inside PythonComparatorWrapper::Compare()

Original comment by alecflett@chromium.org on 31 Jul 2012 at 5:43

GoogleCodeExporter commented 9 years ago
Alright. I wrote a stress test to verify this, and succeeded in crashing it, 
but it happens more randomly than for you test case.

Original comment by arnim...@gmail.com on 31 Jul 2012 at 11:20

GoogleCodeExporter commented 9 years ago
Think I figured this one out. I'm running a test-suite on a dual core machine, 
but will get the chance to run it on a bigger machine tomrrow. Can you please 
see if can break this?

Original comment by arnim...@gmail.com on 1 Aug 2012 at 12:34

GoogleCodeExporter commented 9 years ago
I just ran my code against this, wasn't able to make it crash at all, hitting a 
ldb with 2800 rows... I'm guessing that was the fix.

Original comment by alecflett@chromium.org on 1 Aug 2012 at 1:29

GoogleCodeExporter commented 9 years ago
FWIW I'm on a 24 core machine... my script just goes through the ldb once 
though, so it's just running for a few seconds.

Original comment by alecflett@chromium.org on 1 Aug 2012 at 1:30

GoogleCodeExporter commented 9 years ago

Original comment by arnim...@gmail.com on 6 Aug 2012 at 11:24