phaistos-networks / TANK

A very high performance distributed log service
Apache License 2.0

Consume based on timestamp #4

Closed markpapadakis closed 6 years ago

markpapadakis commented 8 years ago

Our ops folk suggested it'd be nice to be able to consume starting from a specific timestamp (each message header contains the creation timestamp in milliseconds), in addition to consuming by absolute sequence number.

It should be easy to implement, and would perhaps help with all kinds of problems. We just need to maintain another index, and this should be enabled on a per topic or on a per topic/partition basis.

Kafka doesn't support this feature.

markpapadakis commented 8 years ago

We could start by treating the timestamp to begin streaming from as a hint. All log files (the current, mutable log file and any immutable ilog files) encode their creation timestamp in the filename, so Tank always knows when each of the logs it keeps track of for every given partition was created.

If a client requests to begin streaming from timestamp T, all we need to do is use binary search (e.g. std::lower_bound()) to determine which log to begin streaming from. That is to say, the offset will be interpreted as a timestamp instead. The client is expected to make its first request by that timestamp, and subsequent requests should use next.seqNum until they reach the desired timestamp.
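A minimal sketch of that lookup, not Tank's actual code. `LogFile`, its fields, and `log_for_timestamp()` are hypothetical names, and the logs are assumed to be kept sorted by creation timestamp. Because messages stamped T may sit in the log created just before T, the sketch steps back one log when std::lower_bound() lands past T:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one log file; Tank's real structures differ.
struct LogFile {
    std::uint64_t created_ms; // creation timestamp (ms), as encoded in the filename
    std::uint64_t base_seq;   // sequence number of the first message in the file
};

// Returns the index of the log to begin streaming from for timestamp `t`.
// Precondition: `logs` is non-empty and sorted by created_ms, ascending.
std::size_t log_for_timestamp(const std::vector<LogFile> &logs, std::uint64_t t) {
    // First log created at or after t.
    auto it = std::lower_bound(
        logs.begin(), logs.end(), t,
        [](const LogFile &l, std::uint64_t ts) { return l.created_ms < ts; });

    // A message stamped t may live in the log created just before t, so step
    // back unless a log was created exactly at t or no earlier log exists.
    if (it == logs.end() || it->created_ms > t) {
        if (it != logs.begin()) {
            --it;
        }
    }
    return static_cast<std::size_t>(it - logs.begin());
}
```

With logs created at 100, 200 and 300 ms, a request for T = 250 would start at the log created at 200, and the client would then skip forward until it reaches messages at or after T.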

Depending on the rolling strategy (how often Tank switches to a different log), this may be quite useful. E.g. if we roll once a day, then we should be able to stream starting from a day's first message, and so on.

markpapadakis commented 8 years ago

This has been merged into Kafka. See KIP-79 for rationale and how this works.

This shouldn't be hard to implement either.

markpapadakis commented 8 years ago

See also: http://www.confluent.io/blog/announcing-apache-kafka-0-10-1-0/

markpapadakis commented 6 years ago

This is now implemented for tank-cli get -T. We didn't need to maintain separate indices for checkpointing timestamps to sequence numbers. Instead, we determine the base sequence number using binary search, which takes a few milliseconds. That means we don't need to track more indices or otherwise incur extra overhead for fast access by timestamp, an operation that is likely needed infrequently anyway.
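As a rough illustration of the idea (the comment above does not spell out how tank-cli does it), the sketch below binary-searches a range of sequence numbers for the first message whose timestamp is at or after T. `base_seq_for_timestamp()` and the `timestamp_of` callback are hypothetical, and the sketch assumes message timestamps never decrease as sequence numbers grow:

```cpp
#include <cstdint>
#include <functional>

// Returns the first sequence number in [first_seq, end_seq) whose message
// timestamp is >= t, or end_seq if every message is older than t.
// `timestamp_of(seq)` is a hypothetical callback that fetches the creation
// timestamp (ms) of the message with that sequence number.
// Assumption: timestamps are non-decreasing in sequence-number order.
std::uint64_t base_seq_for_timestamp(
    std::uint64_t first_seq, std::uint64_t end_seq, std::uint64_t t,
    const std::function<std::uint64_t(std::uint64_t)> &timestamp_of) {
    std::uint64_t lo = first_seq;
    std::uint64_t hi = end_seq; // half-open range [lo, hi)
    while (lo < hi) {
        const std::uint64_t mid = lo + (hi - lo) / 2;
        if (timestamp_of(mid) < t) {
            lo = mid + 1; // mid is too old; the answer lies to its right
        } else {
            hi = mid;     // mid qualifies; keep looking for an earlier match
        }
    }
    return lo;
}
```

Each probe reads one message timestamp, so a range of N messages needs about log2(N) lookups, which is why no extra timestamp index has to be maintained.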

Idea by @rkrambovitis and @gabrieltz