Michaelvll opened 1 week ago
FWIW, I got slightly less drastic results on my mac:
Before import sky: 16.5625 MB
After import sky: 79.296875 MB
After status: 112.0 MB
Trying to gauge the headroom for improvement, I tried replacing `sky` with `pandas`:
Before import pandas: 18.8472 MB
After import pandas: 82.95126 MB
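For reference, here is a minimal sketch of how such numbers can be gathered, assuming the figures are the process RSS as reported by psutil and that the "status" step corresponds to a `sky.status()` call (both assumptions on my part):

```python
# Measure the Python process's resident set size (RSS) before and after
# importing sky and running a status query. Using psutil and sky.status()
# is my assumption about the methodology behind the numbers above.
import psutil


def rss_mb() -> float:
    """Current process RSS in MB."""
    return psutil.Process().memory_info().rss / (1024 * 1024)


print(f"Before import sky: {rss_mb()} MB")
import sky  # noqa: E402

print(f"After import sky: {rss_mb()} MB")
sky.status()
print(f"After status: {rss_mb()} MB")
```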
It might relate to how many clusters we have in the status table and which clouds those clusters are from; more clouds means more catalog data to be loaded into memory.
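One way to sanity-check that hypothesis would be to measure how much RSS a single cloud's catalog adds when loaded with pandas. A rough sketch (the CSV path is a placeholder and should be pointed at an actual cached catalog file, e.g. somewhere under `~/.sky/catalogs/`):

```python
# Rough sketch: how much memory does loading one cloud's catalog CSV add?
# The path below is a placeholder, not a guaranteed SkyPilot location.
import pathlib

import pandas as pd
import psutil

CATALOG_CSV = pathlib.Path.home() / ".sky" / "catalogs" / "aws" / "vms.csv"  # placeholder


def rss_mb() -> float:
    return psutil.Process().memory_info().rss / (1024 * 1024)


before = rss_mb()
df = pd.read_csv(CATALOG_CSV)
print(f"Catalog rows: {len(df)}, RSS delta: {rss_mb() - before:.1f} MB")
```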
If the catalog is actually a main cause of the memory consumption, we could make it a daemon service running in the background and let the other SkyPilot processes communicate with it via a RESTful API or gRPC. ; )
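A minimal sketch of that idea (not SkyPilot code; the endpoint path, port, and catalog contents are all hypothetical): a long-running daemon loads the catalogs once and serves lookups over HTTP, so each client process pays only for a lightweight request instead of loading the catalogs into its own memory.

```python
# Hypothetical catalog daemon: load catalog data once, serve lookups over a
# tiny REST endpoint so other processes never load the catalogs themselves.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory catalog, loaded once at daemon startup.
CATALOG = {"aws:p3.2xlarge": {"vcpus": 8, "memory_gb": 61}}


class CatalogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/instance":
            name = parse_qs(url.query).get("name", [""])[0]
            body = json.dumps(CATALOG.get(name, {})).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()


if __name__ == "__main__":
    # Clients would query e.g. GET /instance?name=aws:p3.2xlarge
    HTTPServer(("127.0.0.1", 8765), CatalogHandler).serve_forever()
```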
This affects both the unmanaged jobs on clusters and the managed jobs (each controller process has a Ray driver, which consumes a lot of memory). It is also part of the reason why SkyPilot cannot run a large number of managed jobs in parallel.
To reproduce: