sylab / cacheus

The design and algorithms used in Cacheus are described in this USENIX FAST'21 paper and talk video: https://www.usenix.org/conference/fast21/presentation/rodriguez

Questions on trace data #11

Closed brucechin closed 1 year ago

brucechin commented 2 years ago

Hello, I read your FAST'21 paper and found this repo. I am trying to reproduce the experiments. Could you share the download script for the datasets? I could not find datasets named "CloudPhysics", "CloudVPS", or "CloudCache", which prevents me from running your codebase. Thanks for your help!

I found the MSR Cambridge traces at http://iotta.snia.org/historical_section?tracetype=block-io, but I am not sure whether this is the MSR dataset.

I found this link, http://iotta.snia.org/traces/block-io/414, in a closed issue, but I still do not know where to find the "CloudPhysics", "CloudVPS", and "CloudCache" datasets.

I can run the Cacheus algorithm on the FIU/home1 dataset, and I am trying to test it on more FIU sub-datasets. A good paper!

lia54 commented 2 years ago

Hi @brucechin, Thank you for your interest in our paper!

Please find the datasets and the corresponding links/locations (when possible):

  1. FIU SRC_Map (each file covers one day).
  2. MSR Cambridge (each file covers one week; for the paper, we extracted only the first day, based on the timestamp).
  3. CloudVPS (each file covers one day).
  4. CloudCache is a collection of the webserver and moodle traces that were used in the CloudCache paper from FAST'16 (each file covers one day).
  5. CloudPhysics are non-public traces used in the SHARDS paper from FAST'15; they were shared directly by the authors.
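
For item 2, extracting the first day by timestamp could look like the sketch below. It assumes the CSV layout of the public MSR Cambridge traces (Timestamp,Hostname,DiskNumber,Type,Offset,Size,ResponseTime) with timestamps in Windows filetime ticks of 100 ns. Neither detail is stated in this thread, so check both against your files. The names `first_day` and `TICKS_PER_DAY` and the sample rows are made up for illustration, and this is not the authors' script.

```python
import csv
import io

# Assumption: timestamps are Windows filetime ticks (100 ns each).
TICKS_PER_SECOND = 10_000_000
TICKS_PER_DAY = 24 * 60 * 60 * TICKS_PER_SECOND


def first_day(rows, ticks_per_day=TICKS_PER_DAY):
    """Yield rows whose timestamp (column 0) is within one day of the first row."""
    start = None
    for row in rows:
        ts = int(row[0])
        if start is None:
            start = ts  # assumes the first row carries the earliest timestamp
        if ts - start < ticks_per_day:
            yield row


# Made-up sample: the third record is more than one day after the first.
sample = io.StringIO(
    "128166372003061629,hm,0,Read,383496192,32768,113736\n"
    "128166372016382155,hm,0,Write,3154132992,4096,2434\n"
    "128167300000000000,hm,0,Read,383496192,32768,1000\n"
)
kept = list(first_day(csv.reader(sample)))
print(len(kept))
```

If the timestamps in your copy use a different unit, pass a matching `ticks_per_day` instead of relying on the default.
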

Wkkkkk commented 2 years ago

Hi, I'm running the experiment as well but find it hard to reproduce the access pattern in this figure. [image: Screenshot from 2022-05-24 17-59-39]

As far as I can see, the downloaded trace files are in this format:

[ts in ns] [pid] [process] [lba] [size in 512-byte blocks] [Write or Read] [major device number] [minor device number] [MD5 per 4096 bytes]

Some sample data from webmail day 16 (webmail.cs.fiu.edu-110108-113008.16.blkparse) looks like this:

1299602115063356 318 kjournald 220040 8 W 2 0 d186315aa2d4c75b753fe9ca98187cdd

To draw the same figure, we have the timestamp (first column) and the logical block address (fourth column). However, if we draw a scatter plot of block address over time, the resulting figure looks like this: [image: access pattern]

Very likely I misunderstood the dataset and drew it the wrong way. It would be very kind of you to explain how to generate that figure. Thank you very much!
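
A minimal sketch of parsing one record in the format described above. Only the column order comes from this comment; the field names and the `parse_record` helper are made up for illustration, and the sketch assumes that process names contain no spaces.

```python
from typing import NamedTuple


class Record(NamedTuple):
    ts_ns: int        # timestamp in nanoseconds
    pid: int
    process: str
    lba: int          # logical block address
    size_blocks: int  # request size in 512-byte blocks
    op: str           # "W" or "R"
    major: int        # major device number
    minor: int        # minor device number
    md5: str          # MD5 per 4096 bytes


def parse_record(line: str) -> Record:
    """Split one whitespace-separated trace line into typed fields."""
    ts, pid, process, lba, size, op, major, minor, md5 = line.split()
    return Record(int(ts), int(pid), process, int(lba), int(size), op,
                  int(major), int(minor), md5)


rec = parse_record(
    "1299602115063356 318 kjournald 220040 8 W 2 0 "
    "d186315aa2d4c75b753fe9ca98187cdd"
)
print(rec.lba, rec.size_blocks * 512, rec.op)
```

For the scatter plot, only `ts_ns` and `lba` are needed.
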

zy1024cs commented 2 years ago

You should narrow the range of the coordinate axes; otherwise you will often just see a line. That kind of access pattern is usually only visible within a small range.

lia54 commented 2 years ago

Hi @Wkkkkk thanks for the interest. For the purpose of building the access pattern, we mapped each lba value of the trace to a unique number to reduce the range on the y-axis of the plot. That's probably why you're seeing a different graph, since you're directly plotting the lba from the trace. The x-axis is the timestamp converted from nanoseconds to hours.
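
One way to read this reply is sketched below: each distinct lba gets the next unused integer the first time it appears, and timestamps are converted to hours relative to the first request. Whether the authors assigned numbers in first-appearance order, and whether they used relative time, is not stated here. Both are assumptions, and `to_plot_points` is a hypothetical helper.

```python
NS_PER_HOUR = 3_600 * 1_000_000_000


def to_plot_points(requests):
    """Map (ts_ns, lba) pairs to (hours since first request, dense block id)."""
    ids = {}      # lba -> dense id, assigned in order of first appearance
    start = None  # timestamp of the first request
    points = []
    for ts_ns, lba in requests:
        if start is None:
            start = ts_ns
        block_id = ids.setdefault(lba, len(ids))
        points.append(((ts_ns - start) / NS_PER_HOUR, block_id))
    return points


# Made-up requests: three distinct addresses spread over a huge lba range.
requests = [
    (0, 220040),
    (1_800_000_000_000, 9_000_000_000),
    (3_600_000_000_000, 220040),
    (7_200_000_000_000, 512),
]
points = to_plot_points(requests)
print(points)
```

The resulting pairs can be passed to any scatter-plot routine, with hours on the x-axis and the dense id on the y-axis.
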

Wkkkkk commented 2 years ago

Hi, thank you both for the quick reply! Sure, I'd like to try the mapping to narrow down the range on the y-axis. Would it be possible for you to share a bit about how you did the mapping? Your paper is quite interesting and your insights about the access pattern mean a lot!

Wkkkkk commented 2 years ago

Here is the latest plot if we draw only the ones below 140K. [image: Screenshot from 2022-05-25 09-25-58]

zy1024cs commented 2 years ago

The CloudVPS files, and the CloudCache ones too, seem to be of type ".blktrace.1". May I ask how I should convert them? The contents seem impossible to handle as they are. [image: QQ screenshot 20220526145142] [image: QQ screenshot 20220526151541]

jzx-bitdb commented 2 years ago

I have the same question about how to narrow the range on the y-axis. Have you solved this problem?

jzx-bitdb commented 2 years ago

Could you tell me how to map each lba value of the trace? I have two ideas about it. The first is to scale each lba value by the same factor. The second is to iterate over the request sequence and map each lba to an incrementing value. I got a picture similar to yours with the second method. Am I right? And what is the difference between the two methods?

[image: test]
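
To make the difference between the two ideas concrete, here is a small sketch on made-up addresses. Scaling keeps the relative gaps between addresses, so a sparse address space stays sparse and nearby blocks can collapse onto the same value. Assigning increasing numbers in order of first appearance removes the gaps and gives each distinct lba its own value. Which of the two the authors used is not confirmed in this thread, and the function names are hypothetical.

```python
def scale_lbas(lbas, factor):
    """Method 1: divide every lba by the same factor (gaps are preserved)."""
    return [lba // factor for lba in lbas]


def sequential_ids(lbas):
    """Method 2: give each distinct lba the next integer on first appearance."""
    ids = {}
    return [ids.setdefault(lba, len(ids)) for lba in lbas]


# Made-up request sequence with two far-apart clusters of addresses.
lbas = [100, 108, 5_000_000, 100, 5_000_008]
scaled = scale_lbas(lbas, 1000)
dense = sequential_ids(lbas)
print(scaled)
print(dense)
```

In this toy case the scaled values still span 0 to 5000 and merge neighbouring blocks, while the sequential ids span only 0 to 3 and keep every block distinct.
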

Wkkkkk commented 2 years ago

Hi @xuangestallone, Your results seem to be a great example. Would you mind sharing the second method in more detail or the code so I could repeat and try it myself as well? I want to keep experimenting and discussing it with you if possible.

My email is kunwu@kth.se.

zwh272581638 commented 1 year ago

Regarding the ".blktrace.1" files from CloudVPS and CloudCache: perhaps you should read the user manuals on the official fio and blktrace websites.
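
If these are standard blktrace output files, they are per-CPU binary files named `<device>.blktrace.<cpu>`, and the blkparse tool from the same package turns them into text. A hedged sketch of calling it from Python follows. The base name `trace` and the output path are placeholders, `blkparse_command` and `convert` are hypothetical helpers, and `convert` only works where blkparse is installed.

```python
import subprocess


def blkparse_command(basename, out_path):
    """Build the argv for converting <basename>.blktrace.* files to text."""
    # -i: input base name (blkparse looks for <basename>.blktrace.<cpu>)
    # -o: file that receives the parsed text output
    return ["blkparse", "-i", basename, "-o", out_path]


def convert(basename, out_path):
    """Run blkparse; raises CalledProcessError if the tool reports failure."""
    subprocess.run(blkparse_command(basename, out_path), check=True)


cmd = blkparse_command("trace", "trace.txt")  # placeholder names
print(" ".join(cmd))
```

The text output can then be parsed line by line, although its columns will not necessarily match the FIU format discussed earlier in this thread.
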

sylab commented 1 year ago

Closing due to inactivity.