rajveerb / lotus

Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling

Symlink files cached over remote filesystem #4

Closed rajveerb closed 1 year ago

rajveerb commented 1 year ago

Given a file on a remote filesystem, check whether its contents are cached in memory after it has been accessed once.

This needs to be checked on a C4130 node in CloudLab using a long-term dataset.

rajveerb commented 1 year ago

The symlinked files get cached in memory, which leads to inaccurate end-to-end (E2E) VTune profiling, because the synthetic dataset used in the paper consists entirely of symlinks.
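One way to see why a symlink-only dataset ends up in memory is to count how many distinct underlying files the dataset entries resolve to. The sketch below is only an illustration, not part of Lotus: the function name `unique_backing_files` and the directory layout are hypothetical. It assumes the symlinks point at a small number of real files, so the page cache only has to hold those few files.

```python
import os


def unique_backing_files(dataset_dir):
    """Return (entries, distinct underlying files) for a dataset directory.

    Hypothetical helper: entries that are symlinks are resolved with
    os.stat(), which follows links, and identified by (device, inode).
    """
    seen = set()
    total = 0
    for root, _dirs, files in os.walk(dataset_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)  # follows symlinks to the real file
            except OSError:
                continue  # skip broken symlinks
            total += 1
            seen.add((st.st_dev, st.st_ino))
    return total, len(seen)
```

If the second number is much smaller than the first, most reads after the first pass are likely to be served from memory rather than from storage.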

If the goal is to profile only preprocessing, then the symlink option is great, because the I/O-related CPU time spent fetching data from storage into main memory will not be accounted for in the profile.

I used vmtouch to check whether a file is cached in memory.
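For cases where vmtouch is not installed, a similar residency check can be sketched in Python by mapping the file and asking the kernel which pages are resident via mincore(2). This is a minimal sketch under stated assumptions, not the procedure used in the issue: it assumes a 64-bit Linux node, and the function name `cached_fraction` is hypothetical.

```python
import ctypes
import mmap
import os


def cached_fraction(path):
    """Return the fraction of a file's pages resident in the page cache.

    Hypothetical helper, assuming 64-bit Linux. Symlinks are followed,
    so the result describes the file the link points to.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                             ctypes.POINTER(ctypes.c_ubyte)]

    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return 0.0
        # Map without touching the pages, so the check itself does not
        # pull the file into memory.
        addr = libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        if addr is None or addr == ctypes.c_void_p(-1).value:
            raise OSError(ctypes.get_errno(), "mmap failed")
        try:
            n_pages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
            vec = (ctypes.c_ubyte * n_pages)()
            if libc.mincore(addr, size, vec) != 0:
                raise OSError(ctypes.get_errno(), "mincore failed")
            # The lowest bit of each byte is set when the page is resident.
            resident = sum(b & 1 for b in vec)
            return resident / n_pages
        finally:
            libc.munmap(addr, size)
    finally:
        os.close(fd)
```

Calling `cached_fraction` on a dataset file before and after a single read gives a rough answer to the original question: a value close to 1.0 after the first access suggests that later reads will be served from memory.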