twitter / cache-trace

A collection of Twitter's anonymized production cache traces.
Creative Commons Attribution 4.0 International

Recommended tools for replay #7

Open ZiyangJiao opened 1 year ago

ZiyangJiao commented 1 year ago

Hi,

Thanks for the exciting work and traces.

I wonder if you can recommend a tool to replay these traces.

Thank you so much for your insights and time!

Jeongseob commented 1 year ago

Hi,

AFAIK, Segcache can replay the traces to evaluate caching strategies. However, the tool does not seem to take the inter-arrival times between requests into account.

I am wondering whether there is a client-side load generator that replays the traces, so that we can mimic real-world scenarios.

Thanks!
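For readers wondering what such a client-side replayer involves, here is a minimal sketch in Python. It is not Segcache or any existing tool. It assumes the trace has been decompressed to plain text and follows a seven-field CSV layout (timestamp in seconds, anonymized key, key size, value size, client id, operation, TTL). `parse_line`, `replay` and the `send` callback are hypothetical names, and `send` stands in for whatever cache client you use. Each request is scheduled at its trace offset relative to the first request, optionally compressed by a `speedup` factor.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Request:
    timestamp: float  # request time in seconds (assumed first CSV field)
    key: str
    key_size: int
    value_size: int
    client_id: str
    op: str
    ttl: int


def parse_line(line: str) -> Request:
    """Parse one line of the assumed layout:
    timestamp,key,key_size,value_size,client_id,op,ttl
    The key is rebuilt from the middle fields, so a key containing commas still parses."""
    ts, *key_parts, ksz, vsz, cid, op, ttl = line.rstrip("\n").split(",")
    return Request(float(ts), ",".join(key_parts), int(ksz), int(vsz), cid, op, int(ttl))


def replay(
    requests: Iterable[Request],
    send: Callable[[Request], None],
    speedup: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Issue each request at (trace offset / speedup) after the first one.

    Deadlines are absolute (computed from the first request), so a slow
    `send` makes later requests late but does not accumulate drift.
    Returns the number of requests sent."""
    wall_start: Optional[float] = None
    trace_start = 0.0
    sent = 0
    for req in requests:
        if wall_start is None:
            wall_start, trace_start = clock(), req.timestamp
        due = wall_start + (req.timestamp - trace_start) / speedup
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        send(req)  # hypothetical client call, e.g. a memcached get/set
        sent += 1
    return sent
```

Usage would look like `replay(map(parse_line, f), send=my_send, speedup=10.0)` for an open file `f` and a client function `my_send` of your own. Because `send` is called synchronously, a slow server delays subsequent requests. A strictly open-loop generator would dispatch requests asynchronously (threads or asyncio) so that issue times stay independent of response latency. If the timestamps have one-second resolution, all requests within the same second are sent back-to-back here. Spreading them evenly within the second is a common refinement.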

cristina-abad commented 1 year ago

We needed this functionality in the past (replay of traces with inter-arrival information) and implemented a tool based on YCSB for it. The fork is a bit old, but a student was able to use it with a more recent version of YCSB after just some minor changes, so it may work for you too. Our tool is called KV-replay, and you can find the code here: https://github.com/disel-espol/KV-replay. The paper is here: https://ieeexplore.ieee.org/document/7923801/

ZiyangJiao commented 1 year ago

Thanks a lot. Is there a sample trace file for KV-replay so that users can see the format? The one used in the example (workload-replay_example-tracefile.dat) does not seem to be included in the repository.

Thanks!

1a1a11a commented 1 year ago

The right replay tool depends on the use case, as @Jeongseob has pointed out. If you plan to evaluate the efficiency of an eviction algorithm, try libCacheSim (https://github.com/1a1a11a/libCacheSim). If you need to evaluate both efficiency and throughput, Segcache is the option. If you need to evaluate tail latency or replay the trace in wall-clock time (which takes weeks), you can try the tool recommended by @cristina-abad (I haven't used it, so I cannot comment on it) or write your own.
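To illustrate what the first case (evaluating an eviction algorithm's efficiency offline) computes, here is a minimal LRU miss-ratio simulation in Python. It is not libCacheSim's API, only a small sketch of the idea. Timestamps are ignored entirely, so the trace is processed as fast as it can be read. The function name `lru_miss_ratio` is hypothetical, and several simplifications are assumptions:

- capacity is counted in objects rather than bytes;
- only `get`/`gets` count toward the miss ratio;
- a read miss inserts the key (demand fill);
- `delete` removes the key;
- every other operation is treated as a write that inserts or refreshes it.

```python
from collections import OrderedDict
from typing import Iterable, Tuple

READS = {"get", "gets"}  # operations counted as lookups (assumption)


def lru_miss_ratio(requests: Iterable[Tuple[str, str]], capacity: int) -> float:
    """Simulate an LRU cache holding at most `capacity` objects.

    `requests` yields (key, op) pairs. Returns misses / reads,
    or 0.0 if there were no reads."""
    cache: "OrderedDict[str, None]" = OrderedDict()  # ordered oldest -> newest
    reads = misses = 0

    def insert(key: str) -> None:
        cache[key] = None
        cache.move_to_end(key)  # mark as most recently used
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used key

    for key, op in requests:
        if op in READS:
            reads += 1
            if key in cache:
                cache.move_to_end(key)  # hit: refresh recency
            else:
                misses += 1
                insert(key)  # demand fill after a miss
        elif op == "delete":
            cache.pop(key, None)
        else:
            insert(key)  # set/add/replace/etc. treated as writes
    return misses / reads if reads else 0.0
```

Feeding it (key, operation) pairs taken from the trace and sweeping `capacity` gives a miss-ratio curve. Dedicated simulators such as libCacheSim add byte-sized capacities, many eviction policies, and far better performance.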

cristina-abad commented 1 year ago

@ZiyangJiao I believe the paper has all the details you need, but we just uploaded a sample trace to our repo (https://github.com/disel-espol/KV-replay/blob/master/workloads/workload-rpYoutube-012908-withTimestamp-withSizes-1000.dat). As @1a1a11a said, our tool may or may not work for you, since each use case is specific. KV-replay is completely based on YCSB, so I suggest reading about YCSB before trying to use KV-replay. Further questions about our tool are best asked in our own repo rather than here (or by email to cabadr at espol.edu.ec). We have not used KV-replay with the Twitter traces yet, but it is something we'd like to do in the future, which is why I am watching this repo.