rebus-org / Rebus.Oracle

:bus: Oracle transport for Rebus
https://mookid.dk/category/rebus

Question: why that code? #23

Closed · jods4 closed this 4 years ago

jods4 commented 4 years ago

@mookid8000 any idea what this loop is for? I'm tempted to remove it. https://github.com/rebus-org/Rebus.Oracle/blob/cfec8d6da1db22bd41e9db8bc96f356e1e0ae0a5/Rebus.Oracle/Oracle/Transport/OracleTransport.cs#L196

A SQL DELETE statement will delete all expired rows... why do a second pass immediately after??

If there's a good reason, I'll keep it and add a comment.

jods4 commented 4 years ago

@mookid8000 Another question: When sending a message, headers are cloned before serialization. Apparently that's because the rbs2-deferred-until header is removed when present (it's persisted in a distinct visible column so that it can be included in queries).

Would there be any adverse effect in leaving that header in?
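For context, the behavior being described is roughly this shape (a minimal sketch only; method and variable names are made up for illustration, this is not the actual OracleTransport code):

```csharp
using System;
using System.Collections.Generic;

// Sketch only: "clone the headers, pull rbs2-deferred-until out into its own
// value so SQL can filter on it, and strip the header before serialization".
// Names are illustrative, not the real OracleTransport code.
static class SendPreparationSketch
{
    public static (Dictionary<string, string> Headers, DateTimeOffset Visible) PrepareForSend(
        IReadOnlyDictionary<string, string> incomingHeaders)
    {
        // clone so the caller's dictionary is never mutated
        var headers = new Dictionary<string, string>(incomingHeaders.Count);
        foreach (var pair in incomingHeaders) headers[pair.Key] = pair.Value;

        var visible = DateTimeOffset.Now;

        if (headers.TryGetValue("rbs2-deferred-until", out var deferredUntil))
        {
            visible = DateTimeOffset.Parse(deferredUntil);   // goes into a dedicated, queryable column
            headers.Remove("rbs2-deferred-until");           // the header itself is dropped
        }

        return (headers, visible);
    }
}
```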

mookid8000 commented 4 years ago

Hmm, I think this particular bit must have changed a little when it was ported from the MSSQL implementation.

To avoid locking more than one row at a time, the MSSQL cleanup implementation deletes a single row per statement, and it just keeps going until it finds that there are no more rows to delete.
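In ADO.NET terms, that pattern looks something like the sketch below (a sketch of the pattern only; table and column names are placeholders, not the actual Rebus.SqlServer schema or code):

```csharp
using System.Data.SqlClient;

// Sketch of the "delete one expired row at a time until nothing is left" pattern.
// Table and column names are placeholders, not the actual Rebus.SqlServer schema.
static class ExpiredMessageCleanup
{
    public static void Run(string connectionString, string tableName)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            while (true)
            {
                using (var command = connection.CreateCommand())
                {
                    // TOP (1) keeps the lock footprint to a single row per statement
                    command.CommandText =
                        $"DELETE TOP (1) FROM [{tableName}] WHERE [expiration] < SYSDATETIMEOFFSET()";

                    if (command.ExecuteNonQuery() == 0) break;   // no more expired rows
                }
            }
        }
    }
}
```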

I don't know how locks work with Oracle. Wouldn't a delete statement that could potentially delete all rows of a table need to grab a table lock to do its thing?

mookid8000 commented 4 years ago

Regarding the headers: In some cases, you might get away with passing the dictionary by reference, but somehow I often feel it's more appropriate to clone it.

There are also cases where a clone is necessary, e.g. when compression is enabled – in that case, a transport message with a GZIPped payload will carry the rbs2-content-encoding header with the value gzip, but once it's decompressed the header is removed.
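A minimal sketch of that incoming step (the names are illustrative, not the actual Rebus compression code): the headers are cloned before rbs2-content-encoding is dropped, so the original message is left untouched.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Sketch only: clone the headers, then drop rbs2-content-encoding once the body
// has been unzipped. Names are illustrative, not the actual Rebus compression code.
static class DecompressionStepSketch
{
    public static (Dictionary<string, string> Headers, byte[] Body) Decompress(
        Dictionary<string, string> headers, byte[] body)
    {
        if (!headers.TryGetValue("rbs2-content-encoding", out var encoding) || encoding != "gzip")
            return (headers, body);   // not compressed: nothing to change, nothing to clone

        var clone = new Dictionary<string, string>(headers);   // don't mutate the incoming message
        clone.Remove("rbs2-content-encoding");                  // no longer true once unzipped

        using (var source = new MemoryStream(body))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (var target = new MemoryStream())
        {
            gzip.CopyTo(target);
            return (clone, target.ToArray());
        }
    }
}
```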

jods4 commented 4 years ago

RE: expired delete loop

That makes sense with the MSSQL implementation.

I don't want to change the behavior of the current version (full delete). With this implementation the loop doesn't make sense; it's just one useless extra round-trip to the DB.
👉 I'm removing the loop.

To answer your question: Oracle never performs lock escalation. It takes row locks when deleting, and it sticks to row locks even if we end up deleting 90% of the table. It's all about trade-offs, of course: it's probably less efficient, but it's less likely to randomly deadlock.

IMHO the worst offender in the Oracle query is the lack of a READPAST equivalent.

BTW, isn't deleting 1 row at a time in the SQL Server version excessive? When a lot of messages have expired, that's a lot of (very slow) round-trips to the DB. Shouldn't you bump it up a bit? Maybe 20?

RE: dictionary cloning

I agree it's important not to mutate the original dictionary. But cloning dictionaries is not cheap if you want to scale, so I wouldn't do it on every message going in or out if it's avoidable.

Keeping rbs2-deferred-until is not harmful (in fact it's probably better: e.g. if the message is then forwarded by the receiver, the information should not be lost). So I'm not mutating the headers anymore. 👉 I'm removing the header cloning.

I see compression changes the headers (obviously) and so it clones them, again. Actually it does it in both directions, even though I'm not sure why it's needed when deserializing: until the headers have been published, you should be able to mutate them?

BTW I wonder if the design couldn't be more lightweight. Using ImmutableDictionary as storage might be cheaper -- but that would be a breaking change, since the current contract exposes the implementation. It would also need a benchmark to assess efficiency.

Another idea could be to have a smaller "modifications" dictionary internally during the send pipeline (for a few values, something like SortedList<K,V> is a lot more efficient than Dictionary<K,V>). When looking up headers or enumerating them for serialization, you would first look in the "modifications" collection and then fall back to the real headers. That structure would keep the public contract and greatly reduce allocations and copying: you only allocate for the modifications, not for the total number of headers. No change -> 0 allocations.
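A rough sketch of that overlay idea (HeaderOverlay is a made-up name, not an existing Rebus type; it only illustrates the lookup-then-fallback structure described above):

```csharp
using System.Collections.Generic;

// Sketch of a small "modifications" collection sitting in front of the real headers.
// HeaderOverlay is a made-up name, not an existing Rebus type.
class HeaderOverlay
{
    readonly IReadOnlyDictionary<string, string> _original;
    SortedList<string, string> _modifications;   // allocated lazily, only when something changes
    HashSet<string> _removals;                   // allocated lazily, only when something is removed

    public HeaderOverlay(IReadOnlyDictionary<string, string> original) => _original = original;

    public void Set(string key, string value)
    {
        _removals?.Remove(key);
        (_modifications ??= new SortedList<string, string>())[key] = value;
    }

    public void Remove(string key)
    {
        _modifications?.Remove(key);
        (_removals ??= new HashSet<string>()).Add(key);
    }

    public bool TryGetValue(string key, out string value)
    {
        if (_modifications != null && _modifications.TryGetValue(key, out value)) return true;
        if (_removals != null && _removals.Contains(key)) { value = null; return false; }
        return _original.TryGetValue(key, out value);
    }

    // Effective headers for serialization: modifications win, removed keys are
    // skipped, everything else falls through to the original dictionary.
    public IEnumerable<KeyValuePair<string, string>> Enumerate()
    {
        if (_modifications != null)
            foreach (var pair in _modifications) yield return pair;

        foreach (var pair in _original)
        {
            if (_modifications != null && _modifications.ContainsKey(pair.Key)) continue;
            if (_removals != null && _removals.Contains(pair.Key)) continue;
            yield return pair;
        }
    }
}
```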

jods4 commented 4 years ago

Since you answered my questions and these changes have been made in my next PR, I'm closing this.

jods4 commented 4 years ago

@mookid8000 Just sayin', I noticed today that this headers cloning thing is way out of hand.

I was looking at the JSON serializer today for an unrelated reason... and I noticed it clones headers, both when serializing and deserializing.

So... if you send a message with the JSON serializer, compressed, into an Oracle DB, the headers Dictionary is cloned at least 3 times in the process.

Here's a simple idea that is compatible with the current API:

Not perfect, because Dictionary is not the most lightweight solution here, but it would already cut down allocations and copies by a fair bit.
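One possible shape, sketched under the assumption that the idea is a lazy clone that only happens when a pipeline step actually mutates the headers (LazyHeaders is a made-up helper name, not an existing Rebus type):

```csharp
using System.Collections.Generic;

// Sketch: keep handing out a plain Dictionary<string, string> (so the current API
// stays as-is), but only pay for a copy when something actually writes to the headers.
// LazyHeaders is a made-up helper name, not an existing Rebus type.
class LazyHeaders
{
    readonly Dictionary<string, string> _original;
    Dictionary<string, string> _copy;

    public LazyHeaders(Dictionary<string, string> original) => _original = original;

    // Read-only consumers (serializer, transport) see the current state with no allocation.
    public IReadOnlyDictionary<string, string> ForReading => _copy ?? _original;

    // The first writer triggers exactly one clone; later writers reuse the same copy.
    public Dictionary<string, string> ForWriting() => _copy ??= new Dictionary<string, string>(_original);
}
```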

mookid8000 commented 4 years ago

Yeah well.... there's probably lots of places where the code could be optimized further in Rebus 😁

If you're interested in taking a stab at it and taking a few measurements along the way, Rebus has the TestDispatchPerformance test, which basically queues up lots of messages and then measures the time from when the bus is started until they have all been received.

It would be interesting to see some before/after measurements.