mazed-dev / truthsayer

Second brain for knowledge workers that retains everything you read online and helps you to back up your communication with your source as easily as you might share an anecdote over a coffee.
MIT License

make tfjs WebGL backend cache textures less aggressively #833

Closed SergNikitin closed 1 year ago

SergNikitin commented 1 year ago

Currently we use the WebGL backend for tfjs. Under the hood, every tf.Tensor is represented by a WebGL texture, and textures are acquired and released by an internal TextureManager class. So TextureManager acts much like a std::allocator.

It turns out that TextureManager uses a common allocator optimisation: when application code calls Tensor.dispose(), the GPU memory is not deallocated. Instead, the texture is moved to a pool of unused ("free") textures, so the memory remains unavailable to other processes on the user's machine. It then appears to hold onto these textures indefinitely, until the whole backend is destroyed. When a new allocation is needed, it reuses one of the "free" textures, but only if the size of the requested texture matches the size of a pooled texture exactly.
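
The behaviour above can be sketched with a toy pool (this is illustrative code, not the real tfjs TextureManager; all names are made up). The key point is the exact-size reuse rule: a pooled texture is handed out again only for an allocation of precisely the same size.

```javascript
// Toy sketch of a std::allocator-style texture pool keyed by exact size,
// mimicking the reuse rule described above. Not the real tfjs code.
class FakeTextureManager {
  constructor() {
    this.freeBySize = new Map(); // size in bytes -> array of pooled "textures"
    this.numBytesAllocated = 0;
    this.numBytesFree = 0;
  }
  acquire(size) {
    const pool = this.freeBySize.get(size);
    if (pool && pool.length > 0) {
      this.numBytesFree -= size; // reuse happens only on an exact size match
      return pool.pop();
    }
    this.numBytesAllocated += size; // otherwise allocate fresh GPU memory
    return { size };
  }
  release(texture) {
    // dispose() does not return GPU memory to the OS; the texture is pooled
    const pool = this.freeBySize.get(texture.size) || [];
    pool.push(texture);
    this.freeBySize.set(texture.size, pool);
    this.numBytesFree += texture.size;
  }
}

const mgr = new FakeTextureManager();
const a = mgr.acquire(1024);
mgr.release(a);
const b = mgr.acquire(1024); // exact match: reused, no new allocation
mgr.acquire(1025);           // off by one byte: a brand-new allocation
console.log(mgr.numBytesAllocated); // 2049 (1024 + 1025)
console.log(a === b);               // true
```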

This is not a big issue for KnnClassifier because it always allocates tensors from a small, consistent set of sizes (I think) - for example, 2x48. But universal-sentence-encoder apparently sizes its tensors based on the length of the input phrase, so the size differs on most (or every) call. This allocates lots of textures of various sizes and never releases the memory back to the OS.
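
The difference between the two allocation patterns can be demonstrated with the same simplified size-keyed pool (again, a sketch with made-up names, not tfjs internals): fixed-size requests converge to one reused texture, while varied-size requests allocate fresh memory on every call.

```javascript
// Simulate a dispose-to-pool allocator with exact-size reuse and count
// how many bytes of fresh GPU memory each usage pattern would allocate.
function simulate(sizes) {
  const free = new Map(); // size -> count of pooled textures
  let allocated = 0;
  for (const size of sizes) {
    if ((free.get(size) || 0) > 0) {
      free.set(size, free.get(size) - 1); // reuse on exact size match
    } else {
      allocated += size;                  // no match: fresh allocation
    }
    // Tensor is used, then dispose()-ed: back to the pool, not to the OS.
    free.set(size, (free.get(size) || 0) + 1);
  }
  return allocated;
}

// KnnClassifier-like: same size every time -> one allocation, reused forever
console.log(simulate([96, 96, 96, 96, 96])); // 96

// Encoder-like: size tracks input length -> every call allocates anew
console.log(simulate([10, 25, 17, 42, 33])); // 127
```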

As a result, even for a small dev test account I get the following stats after brief usage:

_numBytesAllocated: 671494752 // ~670MB
_numBytesFree: 643054816 // ~640MB

So in my case only ~30MB of GPU memory is actually in use, but ~670MB was allocated. See the full snapshot of the TextureManager data here
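
Working the arithmetic on those two counters shows how little of the allocated memory is doing real work:

```javascript
// The two counters from the snapshot above.
const numBytesAllocated = 671494752; // ~670MB
const numBytesFree = 643054816;      // ~640MB

// Memory actually backing live tensors = allocated minus pooled-but-idle.
const inUse = numBytesAllocated - numBytesFree;
console.log(inUse); // 28439936 bytes, i.e. roughly 28MB in use

// Fraction of allocated GPU memory sitting idle in the texture pool.
const idlePct = (numBytesFree / numBytesAllocated) * 100;
console.log(idlePct.toFixed(0) + '%'); // "96%"
```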

SergNikitin commented 1 year ago

A similar issue has been discussed in https://github.com/tensorflow/tfjs/issues/1440, with some workarounds offered. On the surface it looks like a different usage pattern to me, but I haven't tried the workarounds yet
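
One configuration-level mitigation worth evaluating, assuming the installed tfjs version supports the WEBGL_DELETE_TEXTURE_THRESHOLD flag (not verified against our version), is to tell the WebGL backend to start deleting pooled textures once total allocation passes a byte threshold, instead of caching them indefinitely:

```javascript
// Sketch, not verified against this repo's tfjs version: if the flag is
// supported, textures released past the threshold are deleted rather
// than pooled, capping how much idle GPU memory TextureManager retains.
import * as tf from '@tensorflow/tfjs';

// Example threshold (64MB) chosen arbitrarily for illustration.
tf.env().set('WEBGL_DELETE_TEXTURE_THRESHOLD', 64 * 1024 * 1024);
```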

See also https://github.com/tensorflow/tfjs/issues/4166