richardtallent / RT.Comb

Creating sequential GUIDs in C# for MSSQL or PostgreSql
203 stars 35 forks source link

Querying records via the COMB timestamp #27

Open GFoley83 opened 1 year ago

GFoley83 commented 1 year ago

Hi there. I've been using your library for years, so a very big thank you.

I was reading a comment you made here, regarding querying via the timestamp contained in the COMB:

https://github.com/richardtallent/RT.Comb/issues/10#issuecomment-453724954

Even though I used COMBs on pretty much all tables these days, I also have a date/time of insertion field, because the COMB's timestamp value is unwieldy to query directly.

Which made me question a couple of things:

  1. I understand that there is a chance of collision with regards to the timestamp portion of a COMB e.g. if many are generated and inserted at around the same time; is this the only factor that make querying via the timestamp "unwieldy" or are there other factors?
  2. Can you recommend a optimum/performant way of querying DB records via a COMB timestamp, or range? e.g. all records between timestamp X and Y, from a COMB?
  3. If you recommend using a separate date/time insertion field, as well as a COMB, is there any real value to being able to extract the timestamp from the generated COMB? Other than for testing purposes. In my mind, half the point of using a COMB was so that I could avoid having to store a separate date time value in the table (rightly or wrongly).

Thank you!

richardtallent commented 1 year ago
  1. No, because a normal timestamp field can also easily have duplicates when inserting multiple records. The reason querying using the embedded timestamp is that it requires some gnarly SQL to create a surrogate GUID for the start and end date in your query.
  2. Yes, as above, you can do so efficiently (but again, with some weird code) by constructing a COMB (using the .NET code here or T-SQL) for your start date that is all 0's and the starting timestamp, and constucting an end date using a COMB with all F's and the timestamp. Then you just use tbl.comb BETWEEN @startComb AND @endComb or tbl.comb > @startComb or whatever. If you're sorting on timestamp rather than filtering, you can sort by the COMB field instead to improve performance, since the COMB is likely your clustered key and timestamp isn't.
  3. Not really. It's a tiny side benefit, but really, it's purpose is to provide a high water mark value that is the same (ish) on any database server so you can safely create always-increasing GUIDs. This is because a normal (random) GUID, when inserted, would insert somewhere in the middle of your table, which is more likely to cause page splits. I don't recommend treating the COMB as a timestamp. I have, however, used it as a source for populating a new timestamp field in a few situations where I dumbly didn't bother to create an insertion timestamp field ("I'll never need to know when records are added") and later wanted to add one.