Open landian60 opened 1 year ago
Hi @landian60, could you please provide additional information (e.g., the sizes of the checkpoints)? The batch size could indeed influence the checkpoint size, since we cache the Fourier features, which likely leak into the model's checkpoint via the `persistence_class` decorator.
Thanks for your kind reply, even though the work is almost 2 years old. If I train with a batch size of 16 per GPU, the checkpoint size is 4.86 GB; with a batch size of 24 per GPU, it is 5.91 GB. I changed the batch size to fully utilize the V100. So it turns out that the Fourier features occupy a lot of space; do they not affect the test process? And could the checkpoint space be saved by caching just one group of Fourier features and repeating it batch-size times along a new dimension? Thanks!
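The single-group idea above can be sketched roughly as follows. This is an illustrative sketch, not the repo's actual code: `CachedFourier`, `fourier_features`, and the tensor shapes are all assumptions. The cache is held as a plain attribute (and the basis registered with `persistent=False`), so neither ends up in `state_dict()` and the checkpoint size no longer depends on the batch size.

```python
import torch

def fourier_features(coords: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # coords: (N, 2), freqs: (F, 2) -> (1, 2F, N) sin/cos features
    proj = coords @ freqs.t()                           # (N, F)
    feats = torch.cat([proj.sin(), proj.cos()], dim=1)  # (N, 2F)
    return feats.t().unsqueeze(0)                       # (1, 2F, N)

class CachedFourier(torch.nn.Module):
    def __init__(self, freqs: torch.Tensor):
        super().__init__()
        # persistent=False keeps the basis out of state_dict().
        self.register_buffer("freqs", freqs, persistent=False)
        self._cache = None  # plain attribute: never serialized

    def forward(self, coords: torch.Tensor, batch_size: int) -> torch.Tensor:
        if self._cache is None:
            # Cache one group of features, independent of batch size.
            self._cache = fourier_features(coords, self.freqs)
        # expand() creates a view along the batch dim, no copies.
        return self._cache.expand(batch_size, -1, -1)
```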
And I have another question, about extrapolating outside of the image boundaries. If I want to change the positional-encoding coordinates from [0, 1] to [-0.3, 1.3], should I change the resolution of the `logarithmic_basis`? But if I do that, its size no longer matches `const_embs`.
Hi @landian60, you are correct that the checkpoint space could be saved by caching just one group of Fourier features and repeating it batch-size times along a new dimension. I guess my reasoning back then was to cache the Fourier features for the whole batch to avoid additional memory allocation (which I thought could be expensive). To be honest, I do not remember benchmarking this (I only remember benchmarking "caching" vs "no caching"), so you might try it. Also, back then I was not aware of the `torch.expand` function (which does not allocate new memory); it should be cheaper than `torch.repeat` in this scenario. That said, since we do a concatenation afterward, there will be new memory allocations/deallocations anyway, so it might not matter much whether you use `torch.expand` or `torch.repeat`.
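The `expand`-vs-`repeat` distinction can be verified directly. In this sketch (shapes are illustrative), `expand` returns a view sharing the original storage, `repeat` allocates a full copy, and a subsequent `torch.cat` materializes a fresh tensor either way:

```python
import torch

base = torch.randn(1, 512, 64)

expanded = base.expand(32, -1, -1)  # view: shares storage with `base`
repeated = base.repeat(32, 1, 1)    # copy: 32x the memory of `base`

# expand() performs no new allocation; repeat() does.
assert expanded.data_ptr() == base.data_ptr()
assert repeated.data_ptr() != base.data_ptr()

# After concatenation, a new tensor is allocated regardless,
# which is why the choice may not matter much here.
other = torch.randn(32, 16, 64)
out = torch.cat([expanded, other], dim=1)
assert out.data_ptr() != base.data_ptr()
```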
For extrapolation, you shouldn't change the basis. We didn't use const embeddings when training the generator on bedrooms that performs extrapolation afterwards.
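Since Fourier features are continuous sin/cos functions of the coordinates, the same fixed basis can simply be evaluated on a wider grid; nothing needs resizing. A minimal sketch, assuming a plain sin/cos encoding (the basis `freqs` and grid construction here are illustrative, not the repo's actual code):

```python
import torch

freqs = torch.randn(64, 2)  # fixed frequency basis: left unchanged

def features_on_grid(lo: float, hi: float, res: int) -> torch.Tensor:
    # Build a res x res coordinate grid over [lo, hi]^2.
    xs = torch.linspace(lo, hi, res)
    grid = torch.stack(torch.meshgrid(xs, xs, indexing="ij"), dim=-1)
    # Project coordinates onto the same basis and take sin/cos.
    proj = grid.reshape(-1, 2) @ freqs.t()              # (res*res, 64)
    return torch.cat([proj.sin(), proj.cos()], dim=-1)  # (res*res, 128)

inside = features_on_grid(0.0, 1.0, 32)    # training range
outside = features_on_grid(-0.3, 1.3, 32)  # extrapolated range, same basis
```

The feature dimensionality is identical in both cases, so nothing downstream has to change shape.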
Hello, thanks for your great work. I tried the experiment and found that different batch sizes change the checkpoint size. Does the `_fourier_embs_cache` item affect the snapshot size? And if so, should training and testing on the same snapshot use the same batch size? Thanks.
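If an existing snapshot already contains the cache, one post-hoc option is to filter it out of the `state_dict` before loading, so the batch sizes at train and test time do not need to match. This is a hypothetical sketch: the key name `_fourier_embs_cache` is taken from the question above, and the example dict stands in for a real checkpoint:

```python
import torch

def strip_cache(state_dict: dict) -> dict:
    # Drop any entries holding the cached (batch-dependent) Fourier features.
    return {k: v for k, v in state_dict.items() if "_fourier_embs_cache" not in k}

# Stand-in for torch.load(...)["model"] from a real snapshot.
sd = {
    "synthesis.conv0.weight": torch.randn(4, 4),
    "synthesis._fourier_embs_cache": torch.randn(16, 512, 64),  # batch-dependent
}
clean = strip_cache(sd)
```

The stripped dict can then be loaded with `model.load_state_dict(clean, strict=False)`, letting the cache be rebuilt lazily at the test-time batch size.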