Closed robshep closed 11 years ago
Thanks Rob, I'll have a think and get back, feel free to report any feedback btw :)
I will fork and hack on the various options this evening.
I will also put my example producer/consumer code (negligible workload) in a Gist to empirically evaluate the various implementations.
Rob
That's great, Rob.
I'll be on vacation most of the week, but I'll answer emails and issues.
Happy hacking :)
I've been inspecting the code in an attempt to find the relevant code segments where any of the 3 proposed methods can introduce their respective functionality.
I'm not sure I understand the workings entirely - and have become somewhat confused about various aspects of the current implementation.
I believe (whilst being fully prepared to have my beliefs destroyed) that the code already flushes the "Async-Write-Queue" before marking existing locations as deleted (see comment (1) in the code below).
I think the sync boolean on Journal.updateLocation(Location location, byte type, boolean sync) only enables the physical sync (see comment (2) below).
```java
void updateLocation(Location location, byte type, boolean sync) throws CompactedDataFileException, IOException {
    Lock threadLock = getOrCreateLock(Thread.currentThread(), location.getDataFileId());
    accessorLock.lock();
    threadLock.lock();
    try {
        journal.sync(); // <-- 1. This waits for the async queue to empty?
        RandomAccessFile raf = getOrCreateRaf(Thread.currentThread(), location.getDataFileId());
        if (seekToLocation(raf, location, false)) {
            int pointer = raf.readInt();
            int size = raf.readInt();
            raf.write(type);
            IOHelper.skipBytes(raf, size - Journal.HEADER_SIZE);
            location.setType(type);
            if (sync) {
                IOHelper.sync(raf.getFD()); // <-- 2. This is a physical sync, determined by boolean sync
            }
        } else {
            throw new IOException("Cannot find location: " + location);
        }
    } finally {
        threadLock.unlock();
        accessorLock.unlock();
    }
}
```
Now, in order to test some theories, I took a look at where the physical sync (IOHelper.sync()) is used, and it appears that it is not enabled by default.
So it seems that delete() is not "slower" than write(), as my first encounters suggested, just safer.
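For a feel of what that per-record physical sync costs, here is a self-contained sketch using plain java.io (this is not Journal.IO code; the record count and sizes are made up) that writes the same records first buffered and then with an fsync after each one, mirroring what IOHelper.sync(raf.getFD()) does:

```java
import java.io.File;
import java.io.RandomAccessFile;

public class SyncCostSketch {
    // Writes n records twice: once buffered, once with a physical sync
    // (fsync) after every record. Returns the resulting file length.
    static long writeRecords(File f, int n, int recordSize) throws Exception {
        byte[] record = new byte[recordSize];
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                raf.write(record); // buffered: the OS may delay the disk write
            }
            System.out.println("buffered: " + (System.nanoTime() - start) / 1_000_000 + " ms");

            start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                raf.write(record);
                raf.getFD().sync(); // physical sync per record, like IOHelper.sync(raf.getFD())
            }
            System.out.println("synced:   " + (System.nanoTime() - start) / 1_000_000 + " ms");
            return raf.length();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("journal", ".dat");
        f.deleteOnExit();
        System.out.println("file length: " + writeRecords(f, 200, 54));
    }
}
```

On most filesystems the fsync-per-record loop is dramatically slower, which is consistent with the lag differences below.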
Here are some numbers for the delete lag that I _first_ saw.
( Harness code is at: https://gist.github.com/4669110 )
Harness:
The harness measures the lag between completing the N rounds of steps 1 and 2 and subsequently finishing off all of the enqueued N rounds of step 3.
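Roughly, the lag figure is computed like this (a stripped-down sketch with an in-memory queue standing in for the journal; I am assuming steps 1, 2 and 3 are write, read and delete; the real harness in the gist above runs against the journal itself):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class LagSketch {
    // Returns {timeToProduce, timeToConsume} in nanoseconds for n records
    // pushed through an in-memory queue (standing in for the journal).
    static long[] measure(int n, int recordSize) throws InterruptedException {
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
        long start = System.nanoTime();

        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                queue.add(new byte[recordSize]); // step 1 (and 2): produce a record
            }
        });
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                try {
                    queue.take(); // step 3: consume (read + delete) a record
                } catch (InterruptedException e) {
                    return;
                }
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        long produced = System.nanoTime() - start;
        consumer.join();
        long consumed = System.nanoTime() - start;
        return new long[] { produced, consumed };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] t = measure(1000, 54);
        System.out.println("production: " + t[0] / 1_000_000 + " ms, consumption: "
                + t[1] / 1_000_000 + " ms, lag: " + (t[1] - t[0]) / 1_000_000 + " ms");
    }
}
```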
A) journal.setPhysicalSync(false)
(deletes are still physically synced by default, though)
N records: 1000 writers: 1 readers: 1 records size 54 bytes
Time to complete production: 7s Time to complete consumption: 52s Consumption/Production Lag: 45s
then
B) journal.setPhysicalSync(true)
N records: 1000 writers: 1 readers: 1 records size 54 bytes
Time to complete production: 52s Time to complete consumption: 69s Consumption/Production Lag: 16s
Furthermore (for raw webscale :) speed), I patched DataFileAccessor.java:70 as follows:

```diff
- if (sync) {
-     IOHelper.sync(raf.getFD());
- }
+ if (journal.isPhysicalSync()) {
+     IOHelper.sync(raf.getFD());
+ }
```
C) journal.setPhysicalSync(false)
no physical sync waiting at all
N records: 1000 writers: 1 readers: 1 records size 54 bytes
Time to complete production: 0s Time to complete consumption: 0s Consumption/Production Lag: 0s
That circumstance reminds me of this... http://www.xtranormal.com/watch/6995033
Please let me know what your thoughts are on what I have managed to understand.
Yes, you are correct about your two points above, and yes, the Journal.IO codebase is not an easy one to grok :)
By the way, it is not a matter of safety: it is a matter of correctness, because unsynced (I mean Journal.sync) locations will not be found and hence not deleted. Seen in this light, my three suggestions probably make more sense :)
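In other words (a toy model with invented names, just to illustrate the point; this is not Journal.IO code): a delete can only find a location once the async write queue has been flushed to disk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class AsyncQueueSketch {
    private final Queue<String> asyncWriteQueue = new ArrayDeque<>();
    private final List<String> disk = new ArrayList<>();

    // Writes go to an in-memory queue first (asynchronous write).
    void write(String record) { asyncWriteQueue.add(record); }

    // sync() drains the queue to "disk", like Journal.sync().
    void sync() { while (!asyncWriteQueue.isEmpty()) disk.add(asyncWriteQueue.poll()); }

    // delete() can only find records that have been flushed to "disk".
    boolean delete(String record) { return disk.remove(record); }

    public static void main(String[] args) {
        AsyncQueueSketch j = new AsyncQueueSketch();
        j.write("record-1");
        System.out.println("delete before sync: " + j.delete("record-1")); // false: location not found
        j.sync();
        j.write("record-2");
        j.sync();
        System.out.println("delete after sync:  " + j.delete("record-2")); // true
    }
}
```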
Apologies Sergio, I think I am missing some level of understanding.
I had gathered that the journal.sync() statement did flush any in-flight pending writes before doing any deletions.
Option 1 is already implemented isn't it?
Sorry for any crossed wires.
Thanks though.
Rob
On Sat, Feb 2, 2013 at 6:28 PM, robshep notifications@github.com wrote:

> Apologies Sergio, I think I am missing some level of understanding.

No worries, happy to answer :)

> I had gathered that the journal.sync() statement did flush any inflight pending writes before doing any deletions.

And that's correct... :)

> Option 1 is already implemented isn't it?

Nope, it isn't yet, but that's the one I'm going to implement soon.
Stay tuned ;)
Guys, I've made some delete performance improvements without impacting APIs, basically starting from option 1 above, here's the relevant commit: https://github.com/sbtourist/Journal.IO/commit/ead66d0c9cbc3060f3f320af1846bcf63501ec61
Here are some performance numbers:
1) Deletes per second with journal physical sync disabled (default): 20k
2) Deletes per second with journal physical sync enabled: 5k
Feedback wanted :)
Sorry for the late response. I was busy on other projects. Thank you for this enhancement.
Running this on my small evaluation harness provides the values:
before
rev: b73ccf8 N records: 50000 writers: 1 readers: 1 records size 104 bytes physical sync: true
Time to complete production: 3076s Time to complete consumption: 3652s Consumption/Production Lag: 576s
after
rev: ead66d0 N records: 50000 writers: 1 readers: 1 records size 104 bytes physical sync: true
Time to complete production: 2970s Time to complete consumption: 2970s Consumption/Production Lag: 0
So, the main benefit for my example here, which mirrors my requirements, is that this commit has reduced the delete overhead so much that there is now no lag at all.
Many thanks for this effort Sergio. We are very grateful.
Hi Rob, great to hear! Could you write about that on the mailing list too?
Sergio Bossa Sent by iPhone
As per discussion here:
https://groups.google.com/forum/?fromgroups=#!topic/journalio/f7Touwno_PE
There are a few options available for adding asynchronicity to the journal delete operations.