shamblett / alpaca

An alpaca implementation in Dart
MIT License

Fix massive memory usage. #2

Closed: shamblett closed this issue 1 year ago

shamblett commented 1 year ago

After asking one question, the memory usage of the chat application balloons to 47GB on my system (slightly less if compiled). By comparison, the Alpaca chat application's memory increases by approx. 5GB after one question and stays there on subsequent questions.

The memory usage of the Dart application needs to be analysed to see what is going on here.

shamblett commented 1 year ago

There are 2 parts to this: model loading and eval processing.

Model loading was leaving large, unclearable (fixed-length) Uint8Lists in memory after the model was loaded. This has now been rewritten to use low-level libc functions (mmap/munmap) instead of dart:io, so we can release the file bytes once we have finished with them. This is now as efficient as I can make it.
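For illustration only, the map/use/unmap pattern described above can be sketched in C. This is not the project's actual loader: the helper name `checksum_mapped_file` is hypothetical, and summing the bytes is just a stand-in for "read the weights out of the mapped file". The point it shows is that the pages are handed back to the OS as soon as `munmap` is called, rather than waiting on a garbage collector.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the alpaca package): maps `path` read-only,
 * sums its bytes as a stand-in for copying weights into tensors, then
 * unmaps the file. Returns 0 on success, -1 on failure. */
int checksum_mapped_file(const char *path, uint64_t *sum_out, size_t *len_out)
{
    assert(path != NULL && sum_out != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }

    size_t len = (size_t)st.st_size;
    if (len == 0) {              /* mmap rejects zero-length mappings */
        close(fd);
        *sum_out = 0;
        *len_out = 0;
        return 0;
    }

    void *addr = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (addr == MAP_FAILED) return -1;

    const uint8_t *bytes = (const uint8_t *)addr;
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++) sum += bytes[i];

    /* Explicitly release the file bytes once we are done with them. */
    if (munmap(addr, len) != 0) return -1;

    *sum_out = sum;
    *len_out = len;
    return 0;
}
```

From Dart the same calls would be reached through FFI bindings; the binding layer is left out here because the thread does not show it.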

Eval accumulates memory on every iteration, probably because we are not freeing the tensors after we do a compute to get the logits. This needs looking at next.
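As a rough sketch of the suspected problem (not the real eval code), the C snippet below models an eval step that allocates a scratch buffer, computes into it, and either frees it or not. All names (`scratch_t`, `eval_step`, `growth_after_steps`) and the buffer size are assumptions made up for the illustration; a global byte counter stands in for watching process memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { N_FLOATS = 1024 };       /* arbitrary scratch size for the sketch */

static size_t live_bytes = 0;   /* bytes currently held by scratch buffers */

/* Hypothetical stand-in for a per-eval tensor or graph buffer. */
typedef struct {
    size_t size;
    float *data;
} scratch_t;

static scratch_t scratch_new(size_t n_floats)
{
    scratch_t s = { 0, calloc(n_floats, sizeof(float)) };
    if (s.data != NULL) {
        s.size = n_floats * sizeof(float);
        live_bytes += s.size;
    }
    return s;
}

static void scratch_free(scratch_t *s)
{
    if (s->data != NULL) {
        free(s->data);
        live_bytes -= s->size;
        s->data = NULL;
        s->size = 0;
    }
}

/* One eval step: allocate scratch, compute "logits", optionally free.
 * With release == 0 the scratch buffer is deliberately leaked. */
int eval_step(const float *embd, size_t n, float *logits_out, int release)
{
    scratch_t tmp = scratch_new(n);
    if (tmp.data == NULL) return -1;

    for (size_t i = 0; i < n; i++) {
        tmp.data[i] = embd[i] * 2.0f;   /* placeholder computation */
        logits_out[i] = tmp.data[i];    /* copy results out before freeing */
    }

    if (release) scratch_free(&tmp);
    return 0;
}

/* Returns how many bytes remain allocated after `steps` eval steps. */
size_t growth_after_steps(size_t steps, int release)
{
    static float embd[N_FLOATS], logits[N_FLOATS];
    size_t before = live_bytes;
    for (size_t i = 0; i < steps; i++) {
        if (eval_step(embd, N_FLOATS, logits, release) != 0) break;
    }
    return live_bytes - before;
}
```

With `release` set, `growth_after_steps` stays at zero however many steps run; without it, the retained bytes grow linearly with the number of steps, which is the shape of the per-question growth reported in this issue.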

shamblett commented 1 year ago

Freeing the gf and the embd tensor has reduced memory usage to an acceptable level for now, although more work is needed here. This will now be picked up in #3.

We now stabilise at approx. 14GB and creep forward about 700MB per question. Not great, but considerably better than it was, and it is now usable as a first release. Note that the Alpaca chat stabilises at 12GB and sits there.

Release update to 1.1.0