Michaelvll opened 1 week ago
FWIW, I got slightly less drastic results on my mac:
Before import sky: 16.5625 MB
After import sky: 79.296875 MB
After status: 112.0 MB
Trying to gauge the headroom for improvement, I tried replacing `sky` with `pandas`:
Before import pandas: 18.8472 MB
After import pandas: 82.95126 MB
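For reference, here is a minimal sketch of how such numbers can be gathered, assuming the figures are the process RSS as reported by psutil and that the "status" step corresponds to a `sky.status()` call (both assumptions on my part):

```python
# Measure the Python process's resident set size (RSS) before and after
# importing sky and running a status query. Using psutil and sky.status()
# is my assumption about the methodology behind the numbers above.
import psutil


def rss_mb() -> float:
    """Current process RSS in MB."""
    return psutil.Process().memory_info().rss / (1024 * 1024)


print(f"Before import sky: {rss_mb()} MB")
import sky  # noqa: E402

print(f"After import sky: {rss_mb()} MB")
sky.status()
print(f"After status: {rss_mb()} MB")
```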
It might relate to how many clusters we have in the status table and which clouds those clusters are from; more clouds means more catalog data to be loaded into memory.
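One way to sanity-check that hypothesis would be to measure how much RSS a single cloud's catalog adds when loaded with pandas. A rough sketch (the CSV path is a placeholder and should be pointed at an actual cached catalog file, e.g. somewhere under `~/.sky/catalogs/`):

```python
# Rough sketch: how much memory does loading one cloud's catalog CSV add?
# The path below is a placeholder, not a guaranteed SkyPilot location.
import pathlib

import pandas as pd
import psutil

CATALOG_CSV = pathlib.Path.home() / ".sky" / "catalogs" / "aws" / "vms.csv"  # placeholder


def rss_mb() -> float:
    return psutil.Process().memory_info().rss / (1024 * 1024)


before = rss_mb()
df = pd.read_csv(CATALOG_CSV)
print(f"Catalog rows: {len(df)}, RSS delta: {rss_mb() - before:.1f} MB")
```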
If the catalog is actually a main cause of the memory consumption, we could make it a daemon service running in the background and let the other SkyPilot processes communicate with it via a RESTful API or gRPC. ; )
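A minimal sketch of that idea (not SkyPilot code; the endpoint path, port, and catalog contents are all hypothetical): a long-running daemon loads the catalogs once and serves lookups over HTTP, so each client process pays only for a lightweight request instead of loading the catalogs into its own memory.

```python
# Hypothetical catalog daemon: load catalog data once, serve lookups over a
# tiny REST endpoint so other processes never load the catalogs themselves.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory catalog, loaded once at daemon startup.
CATALOG = {"aws:p3.2xlarge": {"vcpus": 8, "memory_gb": 61}}


class CatalogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/instance":
            name = parse_qs(url.query).get("name", [""])[0]
            body = json.dumps(CATALOG.get(name, {})).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()


if __name__ == "__main__":
    # Clients would query e.g. GET /instance?name=aws:p3.2xlarge
    HTTPServer(("127.0.0.1", 8765), CatalogHandler).serve_forever()
```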
This affects both the unmanaged jobs on clusters and the managed jobs (each controller process has a Ray driver, which consumes a lot of memory). It is also part of the reason why SkyPilot cannot run a large number of managed jobs in parallel.
To reproduce: