virtual-kubelet / node-cli

CLI library for rapid prototyping of virtual-kubelet nodes
Apache License 2.0
18 stars 30 forks source link

start informer factory after informer has been created #8

Closed justnoise closed 4 years ago

justnoise commented 4 years ago

There's a race between calling podInformerFactory.Start() in node-cli/internal/commands/root/root.go and instantiating an informer in the podInformerFactory. If the goroutine that calls podInformerFactory.Start(ctx.Done()) executes before the podInformerFactory has an informer then no informers are started and the PodController's call to cache.WaitForCacheSync(ctx.Done(), pc.podsInformer.Informer().HasSynced) will never return.

An informer is created in the podInformerFactory by calling either podInformer.Informer() or podInformer.Lister(). That happens right after calling go podInformerFactory.Start(ctx.Done()). Typically the creation of the factory happens before the goroutine runs and things run as expected unless you're doing your testing in minikube in a VM on a terribly underpowered laptop...

To reproduce the race, edit node-cli/internal/commands/root/root.go to insert a sleep after starting the informer factories:

    go podInformerFactory.Start(ctx.Done())
    go scmInformerFactory.Start(ctx.Done())
    time.Sleep(5 * time.Second)

Run node-cli with the command line argument --startup-timeout=30s to see the failure.

Inserting time.Sleep() gives the Start() goroutines enough time to start and finish executing before proceeding. In this case, Start() does nothing since we haven't done anything to instantiate an informer in the factory

Alternatively you can also repro the race by not launching the Start() as a goroutine. Start() isn't a blocking operation so there's no need for them to be goroutines anyways.

    podInformerFactory.Start(ctx.Done())
    scmInformerFactory.Start(ctx.Done())

The race is fixed by starting the podInformerFactory after an informer has been created in the factory by calling either podInformer.Informer() or podInformer.Lister(). That happens in the call to manager.NewResourceManager. For consistency I've moved the starts to the same place they happen in the virtual-kubelet repo.

Let me know if you'd like a sample provider and manifest to test this with.