jovans2 / MXFaaS_Artifact

MIT License

Alignment issue about the CPU affinity? #12

Closed HigashikataZhangsuke closed 6 months ago

HigashikataZhangsuke commented 6 months ago

Hi Jovan,

Thanks for open-sourcing this great work! While reading the function code, I noticed the following: in the YAML file, the CPU limit of each MXcontainer is set to 2, but in "runner.py" the CPU affinity is set to (0,15). The affinity set is never modified (in "run-all.py", the requests sent to the system do not contain the "numcore" param), and you tested MXFaaS on multiple 24-core VMs, so I think there could be an alignment issue with the CPU affinity.

Here's one example: k8s allocates CPUs 0 and 22 to this container, but the container can only see CPUs 0-15. Won't this cause potential issues?

Thanks for any clarification that helps me better understand MXFaaS!
Yinzhe
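
To make the concern concrete, here is a minimal sketch of the mismatch described in the example. It assumes that "(0,15)" means CPUs 0 through 15 and that the container's allowed CPU set really were restricted to CPUs 0 and 22; both are assumptions taken from the question, not confirmed behavior of the artifact. The helper name `effective_affinity` is hypothetical.

```python
import os

def effective_affinity(requested, allowed):
    """Return the CPUs that are both requested and actually allowed."""
    return set(requested) & set(allowed)

# Example from the question: the runner asks for CPUs 0-15,
# while the container is (hypothetically) allowed only CPUs 0 and 22.
requested = range(0, 16)
allowed = {0, 22}
usable = effective_affinity(requested, allowed)
print(sorted(usable))  # [0]

# On Linux, the CPUs the current process may run on can be inspected directly.
if hasattr(os, "sched_getaffinity"):
    print(sorted(os.sched_getaffinity(0)))
```

Under these assumptions, only CPU 0 lies in both sets, so the overlap between the requested mask and the allowed set is what matters, not the size of either one.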

jovans2 commented 6 months ago

Hi Yinzhe,

Thank you for using our artifact! Let me understand your question:

1) The runner is the function runtime that contains the dispatcher logic and spawns new handlers for new requests on the available cores (e.g., https://github.com/jovans2/MXFaaS_Artifact/blob/main/KNative_prototype/cnn_serving/runner.py).
2) The node controller monitors the latencies of functions within a node and sets the correct per-function CPU assignments (https://github.com/jovans2/MXFaaS_Artifact/blob/main/KNative_prototype/nodeController.py).

You need to start the node controller before starting the experiments; it is a component of the MXFaaS system. The node controller periodically queries the available functions and reassigns their CPU assignments when needed.

Let me know if this helps or if there are still some unresolved issues.
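
As a rough illustration of what a periodic per-function core reassignment could look like, here is a minimal sketch. It is not the logic in nodeController.py: the even-split policy and the name `split_cores` are hypothetical, chosen only to show that each function ends up with its own disjoint set of cores that a controller can recompute on every pass.

```python
def split_cores(functions, total_cores):
    """Hypothetical policy: divide cores as evenly as possible, in order."""
    if not functions:
        return {}
    assignments = {}
    base, extra = divmod(total_cores, len(functions))
    start = 0
    for i, name in enumerate(functions):
        # The first `extra` functions receive one additional core.
        count = base + (1 if i < extra else 0)
        assignments[name] = list(range(start, start + count))
        start += count
    return assignments

# A controller loop would call this periodically and push the result
# to each function's runtime; here we just print one pass.
print(split_cores(["cnn_serving", "vid_processing"], 16))
```

A latency-driven controller would replace the even split with a policy based on the measured per-function latencies.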

HigashikataZhangsuke commented 6 months ago

Ahh, I see; so basically, even if we want to do a single-node experiment, we should start the node controller first. At first I thought it was a component that only needs to be started for multi-node experiments.

BTW Jovan, do you know if it is possible to register other functions with MXFaaS ourselves, like the PyAE code-decode application in FunctionBench? I think it should be fine, right? We just need to copy runner.py and set up our own docker image/blob storage for it? And does MXFaaS support local data storage, like storing data on disk or even directly in the docker file for testing? I'm not familiar with blob storage, so I just want to start with some simple local tests first.

Again, thanks for your kind help and reply!

jovans2 commented 6 months ago

Correct, the node controller is in charge of core assignment at the node level, so it’s needed even for single-node experiments.

Adding new benchmarks to our system is fairly easy: as you said, just change app.py to contain your function and create a docker container for the function.

You can use other storage services; in that case, you would need to change the implementation of dnld_blob.py (e.g., https://github.com/jovans2/MXFaaS_Artifact/blob/main/KNative_prototype/cnn_serving/dnld_blob.py).

Let me know if you need some help porting new benchmarks.

HigashikataZhangsuke commented 6 months ago

"You can use other storage services, then, you would need to change the implementation of dnld_blob.py" -> But I still should not use local storage here, right? We are supposed to use remote storage services like an S3 bucket, etc.?

Also Jovan, I have a follow-up question about the node controller: if the VM has multiple NUMA nodes, how do you handle the binding problem? We could still use a CPU affinity mask here, but this time we would also need to bind the corresponding memory to the right cores (otherwise it may harm MXFaaS's performance).

jovans2 commented 6 months ago

You can use local storage; you would just implement your own interface to operate on files and provide the data accordingly. However, the performance benefits of MXFaaS would be lower as your idle time goes down.
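
A minimal sketch of such a file-based interface is shown below. The function name `fetch_local` and its parameters are hypothetical and do not mirror the actual signature in dnld_blob.py; the idea is only that a local copy stands in for the blob download.

```python
import shutil
from pathlib import Path

def fetch_local(name, data_dir, dest_dir):
    """Copy an input file from a local data directory instead of a blob store."""
    src = Path(data_dir) / name
    dest = Path(dest_dir) / name
    # Create the destination directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest
```

Because a local copy usually finishes much faster than a remote download, the function spends less time idle on I/O, which is the reason given above for the smaller benefit.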

Correct, you would need to deal with memory allocation as on NUMA servers in general.
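
As a starting point for NUMA-aware core selection, the sketch below reads the per-node CPU lists that Linux exposes under /sys/devices/system/node. The helper names are hypothetical, and this is not part of the MXFaaS node controller. It only discovers which CPUs belong to which node; binding memory itself would need a separate mechanism such as numactl.

```python
import glob
import os

def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def numa_node_cpus(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node id -> set of CPUs, read from sysfs (empty if unavailable)."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "cpulist")):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. "node0"
        node_id = int(node_dir[len("node"):])
        with open(path) as f:
            nodes[node_id] = parse_cpulist(f.read())
    return nodes

print(numa_node_cpus())
```

Keeping a function's affinity mask within a single node's CPU set is one way to avoid cross-node memory accesses, since Linux by default tends to allocate memory on the node of the CPU that first touches it.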

HigashikataZhangsuke commented 6 months ago

Okk, thanks for the help. Here are my last questions, and I will mark this issue as complete once they are resolved:
1. In YAML files like cnn_serving.yaml (https://github.com/jovans2/MXFaaS_Artifact/blob/main/KNative_prototype/cnn_serving/cnn_serving.yaml), I noticed there is a CPU limit of 2. But it looks like each MXcontainer in MXFaaS could use all CPUs in one VM. Jovan, could you please help me out? I'm a little confused here.

If you think it's hard to explain, maybe we can start from a simple example. Say we have only one VM with 8 cores (16 vCPUs), and we run only two applications, cnn_serving and vid_processing. Then we have only one cnn MXcontainer and one vid MXcontainer on this VM, right? After that, if we want to scale up the cnn application, what will happen? Could the cnn MXcontainer run the cnn function on all 16 cores? Won't this conflict with the hard limit of 2 in the YAML file?

2. Would you mind telling me how we could test serverless applications, like the Train ticketing system applications you used in the original paper?

Many thanks!
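
For background on question 1: a Kubernetes CPU limit is normally enforced as a CFS bandwidth quota rather than by pinning the container to specific cores (unless the static CPU manager policy is in use). The sketch below works through that arithmetic for the 16-vCPU example, assuming the default 100 ms CFS period. The function name is hypothetical, and this describes general Kubernetes behavior; it does not say how the MXFaaS node controller interacts with the limit.

```python
def cpu_limit_to_cfs_quota_us(limit, period_us=100_000):
    """Convert a Kubernetes CPU limit ('2', '500m', 1.5) into a CFS quota in microseconds."""
    text = str(limit)
    if text.endswith("m"):
        cores = int(text[:-1]) / 1000  # millicores, e.g. '500m' -> 0.5
    else:
        cores = float(text)
    return int(cores * period_us)

quota = cpu_limit_to_cfs_quota_us("2")
print(quota)                 # 200000
# Average share per vCPU if the quota were spread evenly across 16 vCPUs:
print(quota / 100_000 / 16)  # 0.125
```

Under this model, a container with a limit of 2 may be scheduled on any CPU in its affinity mask, but its total CPU time is capped at two cores' worth per period.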