timaeus-research / icl

Understand the development of in-context learning in transformers with linear regression data

dMMSE baseline streaming #5

Closed · matomatical closed 1 year ago

matomatical commented 1 year ago

In some of our experiments, M (num_tasks) for the discrete distribution is large, like 2^20 (~1 million) tasks.

We originally thought such a task matrix was too large for (V)RAM, but with task dimension D=8 it's only 4 bytes × 8 float32s × 2^20 tasks = 32 MiB. We explored randomly generating tasks on the fly from 2^20 fixed seeds, but then rolled back this change.
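
For concreteness, here's that arithmetic as a tiny PyTorch sketch (variable names are illustrative, not the repository's actual code):

```python
import torch

M = 2**20  # num_tasks in the discrete distribution
D = 8      # task dimension

tasks = torch.randn(M, D)  # float32 task matrix
# 4 bytes/float32 * 8 floats/task * 2^20 tasks = 2^25 bytes = 32 MiB
print(tasks.element_size() * tasks.nelement() / 2**20, "MiB")  # -> 32.0
```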

The real memory issues were coming from the dMMSE baseline computation, which was a batched computation of a softmax distribution over all M=2^20 tasks in parallel across K=16 context prefixes and an evaluation batch size of B=2048. Part of this computation was a B×K×M tensor of float32s, i.e. 4 bytes × (2^20 tasks × 2^4 contexts × 2^11 batch) = 2^37 bytes = 128 GiB. *That* was too big for our (V)RAM.
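
To see where the B×K×M tensor comes from, here is a minimal sketch of a naive dMMSE computation, assuming a Gaussian-noise posterior over the discrete tasks; the function name, the `noise_var` parameter, and the exact estimator form are my assumptions, not necessarily the repository's code:

```python
import torch

def dmmse_naive(tasks, xs, ys, noise_var=0.25):
    """Naive dMMSE baseline (illustrative sketch; actual code may differ).

    tasks: (M, D)    discrete task matrix
    xs:    (B, K, D) context inputs
    ys:    (B, K)    context targets
    Returns posterior-mean task estimates of shape (B, K, D).
    """
    # Per-task predictions on every context example: (B, K, M)
    preds = torch.einsum('bkd,md->bkm', xs, tasks)
    # Squared errors, cumulatively summed over the context prefix
    sq_err = (ys.unsqueeze(-1) - preds) ** 2
    loglik = -sq_err.cumsum(dim=1) / (2 * noise_var)  # the B x K x M tensor
    posterior = torch.softmax(loglik, dim=-1)
    # Posterior-weighted average over tasks: (B, K, D)
    return torch.einsum('bkm,md->bkd', posterior, tasks)
```

Every intermediate of shape (B, K, M) here is 128 GiB at B=2048, K=16, M=2^20, and several such intermediates are live at once, hence the blow-up.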

So the net effect of this PR is to further batch the computation of the dMMSE baseline within a single evaluation batch. A maximum sub-batch size is computed from a configurable byte limit on the B×K×M tensor (default: 2^28 bytes = 256 MiB, which works on my GPU), and the computation is broken into enough sub-batches that each one fits within this limit.
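
A minimal sketch of that sub-batching logic, reusing the hypothetical `dmmse_naive` from above; the byte-limit arithmetic is the obvious one, but the real implementation may differ:

```python
import torch

BYTES_LIMIT = 2**28  # default cap: 256 MiB for the B x K x M float32 tensor

def dmmse_streamed(tasks, xs, ys, noise_var=0.25, bytes_limit=BYTES_LIMIT):
    B, K, _ = xs.shape
    M = tasks.shape[0]
    # Largest sub-batch b such that 4 * b * K * M bytes stays under the limit
    max_sub_batch = max(1, bytes_limit // (4 * K * M))
    outs = [
        dmmse_naive(tasks, xs[i:i + max_sub_batch], ys[i:i + max_sub_batch],
                    noise_var)
        for i in range(0, B, max_sub_batch)
    ]
    return torch.cat(outs, dim=0)
```

With the numbers above (K=16, M=2^20, limit 2^28 bytes) this gives sub-batches of size 4, i.e. 512 sub-batches for B=2048.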

matomatical commented 1 year ago

Oh yes, I also wanted to say: these three commits already appeared on the add/rlct-estimation branch; I cherry-picked them onto this new branch and cleaned up the diffs for posterity. This was my first such git operation, so apologies in advance if I've caused any future problems.

matomatical commented 1 year ago

@jqhoogland I don't think this PR really needs your review, because we already discussed the changes and they are already on the add/rlct-estimation branch. But you might like to take a look, since the changes are collected together neatly in this PR. I will merge now so that I can run large-M experiments from main until rlct-estimation is finished. Hope that's OK.