About the L2 filter - GitHub Issues

jpjy commented 7 months ago

Dear Zirui,

I am a Ph.D. student at the University of South Florida, and your work has greatly impressed me.

I have a specific question about using the L2 evset to populate the LLC evset candidates, which can significantly reduce the candidate set size and is key to implementing Prime+Probe on serverless platforms. I am trying to reproduce your work on a Google Cloud environment (Compute Engine virtual machine with Skylake-SP architecture).

Building the L2 evset works well. However, I run into issues when using the L2 evset to populate the LLC candidates. Sometimes the filtered candidate set is too small, leading to the error "Not enough candidates due to filtering."

At other times, the filtered candidate set has a normal size, but the candidates cannot be used to successfully build the LLC/SF evset.

I have tried increasing the cands_scaling and max_tries settings, but this has not resolved the issues.

Without the L2 evset filter, I can build the LLC evset directly based on the initial candidates, but this process is too time-consuming; it takes about 45 minutes to build 896 evsets for a single page offset. Therefore, I believe the L2 evset filter is necessary for the cloud environment.

Could you possibly share some specific details of your implementation or provide guidance on how to resolve these issues? I would genuinely appreciate your time and insights. Thank you in advance.

Best, Changjia
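
For a sense of scale, the figures in the comment above can be checked with a little arithmetic. This is only a sketch: the 4 KiB page size, 64-byte cache line, and 2048 LLC sets per slice are assumptions about Skylake-SP that the thread does not state, and the slice count of 28 is the value mentioned later in the thread. Under those assumptions, a fixed page offset pins 6 of the set-index bits, which leaves 32 sets per slice and 896 sets in total, matching the 896 evsets reported.

```python
# Assumed parameters (not stated in the thread): 4 KiB pages, 64-byte lines,
# 2048 LLC sets per slice on Skylake-SP. The slice count comes from the thread.
PAGE_SIZE = 4096
LINE_SIZE = 64
SETS_PER_SLICE = 2048
NUM_SLICES = 28


def sets_per_page_offset(sets_per_slice, num_slices,
                         page_size=PAGE_SIZE, line_size=LINE_SIZE):
    """Number of LLC sets that addresses sharing one page offset can map to."""
    # A page holds page_size / line_size lines, so the page offset fixes
    # that many distinct values of the set index.
    lines_per_page = page_size // line_size
    return (sets_per_slice // lines_per_page) * num_slices


total_evsets = sets_per_page_offset(SETS_PER_SLICE, NUM_SLICES)

# 45 minutes for all evsets of one page offset, as reported in the comment.
seconds_per_evset = 45 * 60 / total_evsets

print(total_evsets, round(seconds_per_evset, 2))
```

At roughly three seconds per evset without filtering, shrinking the candidate set is where most of the time would be saved.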

zzrcxb commented 7 months ago

Hi Changjia,

Thank you for reporting this issue.

Indeed, I noticed some instability when running L2 filtering in a virtualized environment, such as Compute Engine. (The paper's experiments were conducted in a Google Cloud Run Gen1 environment, which uses Docker without hardware virtualization.) The symptom I observed was that the L2 hit latency fluctuates significantly, which can confuse the code regarding what constitutes an L2 hit.

A comprehensive exploration of the root cause and improvements to the PoC for VMs are on my radar. However, I currently have limited time and resources to address this. I plan to investigate further in June. Meanwhile, you might try running the code on a local machine or in a Google Cloud Run Gen1 environment.
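
To illustrate why a fluctuating L2 hit latency can confuse hit classification, here is a minimal sketch of a threshold calibration that refuses to produce a threshold when the hit and miss latencies are not clearly separated. It is not taken from the PoC: the function name `calibrate_threshold`, the `min_gap` parameter, and the sample values are all hypothetical, and the samples are made-up numbers rather than measurements.

```python
import statistics


def calibrate_threshold(hit_samples, miss_samples, min_gap=10):
    """Pick a latency threshold between L2 hits and L2 misses.

    Returns (threshold, error_rate), or None when the two medians are
    closer than min_gap cycles, i.e. when the calibration is unreliable.
    """
    hit_median = statistics.median(hit_samples)
    miss_median = statistics.median(miss_samples)
    if miss_median - hit_median < min_gap:
        return None

    # Midpoint between the medians; medians are robust to a few outliers.
    threshold = (hit_median + miss_median) / 2

    # Fraction of calibration samples that the threshold misclassifies.
    wrong = sum(s > threshold for s in hit_samples)
    wrong += sum(s <= threshold for s in miss_samples)
    error_rate = wrong / (len(hit_samples) + len(miss_samples))
    return threshold, error_rate


# Made-up cycle counts: one hit sample is an outlier (e.g. an interrupt).
stable = calibrate_threshold([14, 15, 14, 16, 15, 90, 14],
                             [48, 50, 52, 49, 51, 47, 50])

# Made-up cycle counts where hit and miss latencies overlap.
noisy = calibrate_threshold([30, 40, 35], [38, 36, 42])

print(stable, noisy)
```

Returning `None` instead of a best-effort threshold lets the caller re-run calibration rather than filter candidates with a boundary that cannot tell hits from misses.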

manitofigh commented 1 month ago

Hello @zzrcxb,

Are there any updates on this issue? I am facing the same problem. I'm running the test on a VM, on top of an Intel SKX server.

Thank you!

Mani

zzrcxb commented 1 month ago

My apologies, I currently don't have enough cycles to work on this.

manitofigh commented 1 month ago

All good! Thank you for the heads up.

manitofigh commented 1 month ago

> I am trying to reproduce your work on a Google Cloud environment (Compute Engine virtual machine with Skylake-SP architecture).

Hello @jpjy,

I see in your initial screenshot that the tool detected 28 slices for your instance (which is correct based on the host's CPU), but I was wondering whether you hardcoded that value by setting the NUM_L3_SLICES environment variable, e.g., export NUM_L3_SLICES=28?

Thank you for your time!
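
One quick way to answer the question above is to check whether the variable is set in the environment the tool is launched from. The sketch below only inspects the environment; it makes no claim about how the tool itself reads `NUM_L3_SLICES`, and the helper name `slice_override` is hypothetical.

```python
import os


def slice_override(environ=os.environ, name="NUM_L3_SLICES"):
    """Return the slice count set via the environment, or None if unset."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return None  # nothing hardcoded; any slice count was detected
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


override = slice_override()
if override is None:
    print("NUM_L3_SLICES is not set")
else:
    print(f"NUM_L3_SLICES is hardcoded to {override}")
```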

zzrcxb / LLCFeasible

About the L2 filter #2