yolky / RFAD

Code for the paper "Efficient Dataset Distillation using Random Feature Approximation"

Release date and comparison question #1

Open RK-TUI opened 2 years ago

RK-TUI commented 2 years ago

Hello, I currently plan to write my Master's thesis about Data Distillation and I am very interested in your work. Is there already a date for the publication of your code, or any other way to get access to it?

Also, I would like to ask how the computational time and memory requirements of RFAD compare to those of the FRePo method from the paper "Dataset Distillation using Neural Feature Regression" (https://arxiv.org/pdf/2206.00719.pdf)?

Thank you very much in advance!

yolky commented 2 years ago

Hi Robert,

We will be releasing the official code before the NeurIPS conference. In the meantime, you can find code in the supplementary material on OpenReview (https://openreview.net/forum?id=h8Bd7Gm3muB); note that it is still a bit rough around the edges. In terms of comparison to FRePo:

  1. Our method is primarily designed for kernel ridge regression with infinite-width NNGP/NTK kernels, whereas theirs is mainly concerned with training finite networks with gradient descent. You'll notice that finite-network performance is a bit less emphasized in our paper for this reason. Also note that we use a very wide network when doing finite-network SGD, so directly comparing the performance of the two is a bit difficult.

  2. You could actually consider FRePo and RFAD to be the same algorithm if you set certain hyperparameters to match (see the sketch after this list), namely:

     - Set FRePo's max-online-steps to 1 (so that a new random model is sampled every time it is used).

     - Set RFAD's number-of-models parameter (M) to 1 (so that a single network is used at each iteration rather than several). Note that we use M = 8 as the default in our paper.

     - Use the MSE loss instead of the Platt loss in RFAD.

     - Also, FRePo uses a slightly different architecture than ours (they use a different number of conv channels for each layer, whereas we keep it the same).
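To make points 1 and 2 concrete, here is a rough, self-contained sketch (not the released code): an empirical kernel built from M randomly initialized feature maps, plugged into kernel ridge regression from a distilled support set. The random ReLU feature map, the feature dimension, and the ridge term are illustrative stand-ins for a real randomly initialized conv net.

```python
# Illustrative sketch only; the random ReLU "network", feature_dim, and reg are
# assumptions, not the actual architecture or hyperparameters from either paper.
import numpy as np

def random_features(X, feature_dim=128, seed=0):
    """Stand-in for the penultimate-layer features of one randomly initialized network."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], feature_dim)) / np.sqrt(X.shape[1])
    return np.maximum(X @ W, 0.0)  # random ReLU features

def empirical_kernel(X1, X2, M=8):
    """Approximate the NNGP kernel by averaging over M random feature draws (M = 1: single model)."""
    K = np.zeros((len(X1), len(X2)))
    for m in range(M):
        f1, f2 = random_features(X1, seed=m), random_features(X2, seed=m)
        K += (f1 @ f2.T) / f1.shape[1]
    return K / M

def krr_predict(X_support, y_support, X_test, M=8, reg=1e-6):
    """Kernel ridge regression from a (distilled) support set to test predictions."""
    K_ss = empirical_kernel(X_support, X_support, M)
    K_ts = empirical_kernel(X_test, X_support, M)
    alpha = np.linalg.solve(K_ss + reg * np.eye(len(X_support)), y_support)
    return K_ts @ alpha  # predicted (one-hot) label scores
```

Roughly speaking, the distilled support set is what gets optimized in both methods: the loss (MSE here, the Platt loss in RFAD) is computed on real training batches and backpropagated through a prediction like this.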

Overall it's pretty interesting that these two papers came out at the same time and both use a very similar idea of using the conjugate/NNGP kernel.

As for computational time and memory:

  1. Because our algorithm's runtime is proportional to M and we use M = 8, we would expect FRePo to run around 8x faster than RFAD at its default settings, but with M = 1 the two should have the same time/memory complexity in theory.
  2. In practice, our RFAD code isn't very well optimized, so you could probably shave off a good bit of time by moving it to a faster library like JAX, where everything can be jit-compiled. Note that FRePo uses JAX, so it gets a bit of a speed boost just from that.
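The jit point above just means wrapping the heavy numerical work in jax.jit so it gets compiled by XLA; a toy illustration (not code from either repo, and the kernel here is only a placeholder) looks like this:

```python
# Toy illustration of jit compilation in JAX; toy_kernel is a placeholder,
# not the actual FRePo/RFAD computation.
import jax
import jax.numpy as jnp

@jax.jit
def toy_kernel(X1, X2):
    """Placeholder for an empirical NNGP kernel computation."""
    return X1 @ X2.T

X = jnp.ones((10, 32))
K = toy_kernel(X, X)  # first call traces and compiles; later calls reuse the compiled function
```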

Thanks for showing interest in our paper. If you have any more questions I'd be happy to answer.

Noel
