gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0

Scikit-learn benchmark takes a very long time to finish (~18 hrs) when EDMM is enabled #1612

Closed: vasanth-intel closed this issue 1 week ago

vasanth-intel commented 11 months ago

Description of the problem

When EDMM is enabled, the scikit-learn benchmark takes about 18 hours to finish a single iteration under Gramine-SGX. When EDMM is disabled, the benchmark completes 10 iterations successfully in ~20 hours with the Linux native, Gramine-Direct, and Gramine-SGX execution modes.

Steps to reproduce

1) Clone the benchmark repository from https://github.com/IntelPython/scikit-learn_bench.

2) Install the benchmark's dependencies with pip install, using requirements-common.txt and sklearn_bench/requirements.txt from the repository above.

OR

Save the following lines in requirements.txt and install them with the command pip install -r requirements.txt:

tqdm
numpy==1.24.1
scipy==1.10.0
daal==2023.0.1
daal4py==2023.0.1
pandas==1.5.2
scikit-learn==1.2.0
dpcpp-cpp-rt==2023.0.0
scikit-learn-intelex==2023.0.1

3) Edit configs/skl_config.json so that it includes only the kmeans and knn_clsf algorithms.

4) Update the manifest to enable EDMM (set sgx.edmm_enable to true) and generate the corresponding SGX manifest.

5) Create a new results directory for the benchmark output.

6) Run the following benchmark command:

gramine-sgx sklearnex runner.py --configs configs/skl_config.json --output-file results/sgx_output.json
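For step 3, the config edit can be scripted. The sketch below is a minimal Python illustration, not part of scikit-learn_bench. It assumes the skl_config.json layout the benchmark used at the time: a top-level "cases" list whose entries name their algorithm under an "algorithm" key. Check your copy of the file before relying on it. The names filter_cases and rewrite_config are hypothetical.

```python
import json

# Algorithms to keep, per step 3 of this issue.
KEEP_ALGORITHMS = {"kmeans", "knn_clsf"}


def filter_cases(config, keep=KEEP_ALGORITHMS):
    """Return a copy of config whose "cases" list holds only the kept algorithms.

    Assumes (hypothetically) a top-level "cases" list in which every entry
    names its algorithm under the "algorithm" key.
    """
    filtered = dict(config)  # shallow copy; other top-level keys are unchanged
    filtered["cases"] = [
        case for case in config.get("cases", [])
        if case.get("algorithm") in keep
    ]
    return filtered


def rewrite_config(path="configs/skl_config.json"):
    """Filter the benchmark config file in place."""
    with open(path) as f:
        config = json.load(f)
    with open(path, "w") as f:
        json.dump(filter_cases(config), f, indent=4)
        f.write("\n")
```

Calling rewrite_config() from the benchmark's root directory would rewrite the file; keeping a backup of the original config first is a sensible precaution.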
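For step 4, the only manifest change named in this thread is setting sgx.edmm_enable to true. The Python sketch below shows one way to force that in a manifest template. It assumes the template uses the dotted-key style (a line beginning with sgx.edmm_enable =) rather than an [sgx] table. The helper enable_edmm is hypothetical and is not a Gramine tool.

```python
import re

# Matches an existing `sgx.edmm_enable = ...` line written in dotted-key style.
# [ \t]* is used instead of \s* so the match never swallows preceding blank lines.
_EDMM_LINE = re.compile(r"^[ \t]*sgx\.edmm_enable[ \t]*=.*$", re.MULTILINE)


def enable_edmm(manifest_text):
    """Return the manifest template text with sgx.edmm_enable set to true."""
    line = "sgx.edmm_enable = true"
    if _EDMM_LINE.search(manifest_text):
        # Overwrite whatever value (or template expression) was there before.
        return _EDMM_LINE.sub(line, manifest_text)
    # No existing setting: append it on its own line.
    if manifest_text and not manifest_text.endswith("\n"):
        manifest_text += "\n"
    return manifest_text + line + "\n"
```

After the template is edited, the SGX manifest still has to be regenerated. In Gramine's usual flow, that means running gramine-manifest followed by gramine-sgx-sign.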
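Because this issue is entirely about wall-clock time, it helps to record the duration of each run when collecting new data. The following is an illustrative Python wrapper around the step 6 command. It assumes Gramine, the benchmark, and the generated SGX manifest for the sklearnex entrypoint are already in place. timed_run is a hypothetical helper, and it measures the whole run rather than individual iterations.

```python
import subprocess
import time

# Benchmark command from step 6 of this issue.
BENCH_CMD = (
    "gramine-sgx", "sklearnex", "runner.py",
    "--configs", "configs/skl_config.json",
    "--output-file", "results/sgx_output.json",
)


def timed_run(cmd=BENCH_CMD):
    """Run cmd to completion and return its wall-clock duration in hours."""
    start = time.monotonic()
    subprocess.run(list(cmd), check=True)  # raises CalledProcessError on failure
    return (time.monotonic() - start) / 3600.0
```

Running it once with EDMM enabled and once with EDMM disabled, on the same commit, would give directly comparable numbers.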

Expected results

With EDMM enabled in the Gramine-SGX execution mode, the benchmark should complete its 10 iterations in ~20 hours.

Actual results

With EDMM enabled in the Gramine-SGX execution mode, the benchmark takes ~18 hours to complete a single iteration.
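Taking the numbers in this report at face value, and assuming every iteration takes roughly the same time, the per-iteration slowdown works out as follows:

```latex
t_{\text{no EDMM}} \approx \frac{20\ \text{h}}{10\ \text{iterations}} = 2\ \text{h per iteration},
\qquad
t_{\text{EDMM}} \approx 18\ \text{h per iteration},
\qquad
\frac{t_{\text{EDMM}}}{t_{\text{no EDMM}}} \approx \frac{18}{2} = 9
```

That is roughly a 9x per-iteration slowdown for Gramine-SGX with EDMM enabled.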

Gramine commit hash

master

dimakuv commented 11 months ago

@anjalirai-intel Have you tried the perf optimizations for EDMM? In particular, PR https://github.com/gramineproject/gramine/pull/1513

vasanth-intel commented 11 months ago

@dimakuv This issue was first observed while testing PR #1513. It was later tested on master and on Gramine v1.5 with only the sgx.edmm_enable flag set to true, and the issue reproduced there as well. Hence, @kailun-qin suggested tracking the issue against master.

dimakuv commented 11 months ago

@vasanth-intel So PR #1513 doesn't help, right? The performance overhead is still huge?

vasanth-intel commented 11 months ago

With the scikit-learn workload, we could not draw a conclusion about the performance overhead, because we were unable to run the workload for 10 iterations with sgx.edmm_enable set to true. This is the case for PR #1513, master, and Gramine v1.5. We could run only 1 iteration, which took ~18 hours, and that is why the issue was raised.

monavij commented 11 months ago

I wonder if this workload does a lot of dynamic allocation AND deallocation. Maybe we will need a "lazy free" optimization as well.
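To illustrate the hypothesis above, here is a deliberately simplified Python cost model. It is not Gramine's allocator. With EDMM, pages are added to and removed from a running enclave at runtime, and each such change requires cooperation from the host kernel. As a result, a workload that keeps allocating and freeing memory pays that cost over and over. The model assumes a unit cost per page committed or released. EagerAllocator, LazyFreeAllocator, and churn are hypothetical names. A "lazy free" scheme keeps freed pages committed so that later allocations can reuse them.

```python
class EagerAllocator:
    """Hypothetical model: every alloc commits pages, every free releases them."""

    def __init__(self):
        self.page_ops = 0  # count of costly per-page commit/release operations

    def alloc(self, npages):
        self.page_ops += npages  # commit each page to the enclave

    def free(self, npages):
        self.page_ops += npages  # release each page immediately


class LazyFreeAllocator:
    """Hypothetical model: freed pages stay committed and are reused later."""

    def __init__(self):
        self.page_ops = 0
        self.cached = 0  # freed-but-still-committed pages available for reuse

    def alloc(self, npages):
        reused = min(self.cached, npages)
        self.cached -= reused
        self.page_ops += npages - reused  # commit only what the cache can't supply

    def free(self, npages):
        self.cached += npages  # defer the release; no costly operation now


def churn(allocator, cycles, npages):
    """Simulate a workload that repeatedly allocates and frees the same amount."""
    for _ in range(cycles):
        allocator.alloc(npages)
        allocator.free(npages)
    return allocator.page_ops
```

With 1000 cycles of allocating and freeing 256 pages, churn reports 512,000 page operations for the eager model and 256 for the lazy one, because every allocation after the first is served from the cache. A real implementation would also have to bound the cache so that freed memory is eventually returned.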

dimakuv commented 1 week ago

This issue is almost 1 year old. @vasanth-intel If this still reproduces on the latest Gramine and we need to think about fixing it, please reopen the issue and add new data.