lima-vm / lima

Linux virtual machines, with a focus on running containers
https://lima-vm.io/
Apache License 2.0
15.3k stars 601 forks source link

[v0.14.0-beta.0 regression] `limactl ls` takes 67ms even when there is no instance (v0.13 only took 11ms) #1209

Closed AkihiroSuda closed 1 year ago

AkihiroSuda commented 1 year ago
$ git checkout v0.13.0 && make && ntimes -n 100 ./_output/bin/limactl ls
...
real average: 11.479705ms, max: 199.824274ms, min: 8.735928ms, std dev: 19.041288ms
$ git checkout v0.14.0-beta.0 && make && ntimes -n 100 ./_output/bin/limactl ls
...
real average: 66.930248ms, max: 320.575082ms, min: 60.808298ms, std dev: 25.759197ms
afbjorklund commented 1 year ago

This could probably need some go profiling, seems to be about the yaml parsing ?

(pprof) top25
Showing nodes accounting for 300ms, 100% of 300ms total
Showing top 25 nodes out of 122
      flat  flat%   sum%        cum   cum%
      30ms 10.00% 10.00%       40ms 13.33%  runtime.slicerunetostring
      30ms 10.00% 20.00%       30ms 10.00%  runtime/internal/syscall.Syscall6
      20ms  6.67% 26.67%       20ms  6.67%  runtime.findObject
      20ms  6.67% 33.33%       30ms 10.00%  runtime.newobject
      20ms  6.67% 40.00%       20ms  6.67%  runtime.scanblock
      20ms  6.67% 46.67%       20ms  6.67%  runtime.scanobject
      20ms  6.67% 53.33%       20ms  6.67%  runtime.stringtoslicerune
      10ms  3.33% 56.67%       10ms  3.33%  aeshashbody
      10ms  3.33% 60.00%       10ms  3.33%  github.com/goccy/go-yaml/parser.(*context).currentToken (inline)
      10ms  3.33% 63.33%       10ms  3.33%  github.com/goccy/go-yaml/parser.containsPathSpecialChar (inline)
      10ms  3.33% 66.67%       10ms  3.33%  github.com/goccy/go-yaml/token.MappingValue (inline)
      10ms  3.33% 70.00%       10ms  3.33%  reflect.StructTag.Lookup
      10ms  3.33% 73.33%       10ms  3.33%  runtime.(*sweepLocked).sweep
      10ms  3.33% 76.67%       10ms  3.33%  runtime.encoderune
      10ms  3.33% 80.00%       10ms  3.33%  runtime.getitab
      10ms  3.33% 83.33%       20ms  6.67%  runtime.growslice
      10ms  3.33% 86.67%       10ms  3.33%  runtime.heapBitsSetType
      10ms  3.33% 90.00%       40ms 13.33%  runtime.mallocgc
      10ms  3.33% 93.33%       10ms  3.33%  runtime.mapaccess1_faststr
      10ms  3.33% 96.67%       10ms  3.33%  runtime.memclrNoHeapPointers
      10ms  3.33%   100%       10ms  3.33%  sync.(*Pool).Put
         0     0%   100%      200ms 66.67%  github.com/goccy/go-yaml.(*Decoder).DecodeContext
         0     0%   100%       10ms  3.33%  github.com/goccy/go-yaml.(*Decoder).canDecodeByUnmarshaler
         0     0%   100%       40ms 13.33%  github.com/goccy/go-yaml.(*Decoder).createDecodedNewValue
         0     0%   100%       50ms 16.67%  github.com/goccy/go-yaml.(*Decoder).decode

I just looped listAction 100 times here, with debug symbols (and pprof) enabled.


EDIT: This was for a list with a "default" instance, missed that it was the empty list...

afbjorklund commented 1 year ago

If anyone else is searching: go install github.com/AkihiroSuda/ntimes@latest


For an empty configuration on master, I get (on Ubuntu):

WARN[0000] No instance found. Run `limactl start` to create an instance. 
NAME    STATUS    SSH    VMTYPE    ARCH    CPUS    MEMORY    DISK    DIR

real average: 6.12614ms, max: 10.975207ms, min: 5.653739ms, std dev: 653.633µs

balajiv113 commented 1 year ago

Since this happens even with instances being empty i doubt if its related to yaml parsing.

Also i just did a test before and after this commit (https://github.com/lima-vm/lima/commit/26793268c098b444fee50fefcaae0b495beb0a30)

Only after this commit the slowness started. Need to see if some dependency / native cgo things causing this issue.

before this commit i get (Empty instances) real average: 16.691812ms, max: 503.812331ms, min: 10.183544ms, std dev: 49.208706ms

after this commit i get (Empty instances) real average: 114.354338ms, max: 1.030820681s, min: 93.570049ms, std dev: 93.747278ms

afbjorklund commented 1 year ago

Also i just did a test before and after this commit (https://github.com/lima-vm/lima/commit/26793268c098b444fee50fefcaae0b495beb0a30)

That could explain why I don't see any difference on Linux

afbjorklund commented 1 year ago

Probably this means that only the hostagent should be allowed to talk to the virtualization drivers ?

This would be slightly awkward for qemu, but might be needed for the "heavier" drivers such as vz

EDIT: Forgot that the hostagent is not a standalone program here

It was in docker-machine that the drivers are running over gRPC...

balajiv113 commented 1 year ago

Found the cause, The way the commands gets executed is via cmd/main.go.

In this we have all the command things including hostagent. Hostagent is the culprit, As this will internally bootstrap drivers & as well load those cgo stuffs of vz driver i believe

If i remove hostagent references from the cmd directory (delete start.go, edit.go, hostagent.go) then its back to older stats

afbjorklund commented 1 year ago

This was actually one of the reasons why machine drivers like virtualbox and qemu were "internal" (they used CLI) And why drivers like hyperkit and libvirt were "external", as in external programs (they used API, and linked libraries)

For lima, it seems like we need the cmd/limactl/hostagent.go moved over to cmd/lima-hostagent/... This way it would pay the startup cost (like the CLI) once, but be able to communicate afterwards (like the API)

limactl hostagent -> lima-hostagent

https://github.com/lima-vm/lima/blob/v0.14.0-beta.0/pkg/start/start.go#L103

AkihiroSuda commented 1 year ago

Only takes 20ms with:

Still slower than v0.13 though, due to gopacket

init github.com/google/gopacket/layers @4.6 ms, 3.9 ms clock, 393944 bytes, 110 allocs