Open a897456 opened 10 months ago
@a897456 I think the parameter counts given in the paper are only for the generator, since that is what matters for inference. While the total number of parameters (and hence the model checkpoint size) is relatively large, the training memory footprint is primarily due to the generator.
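To see the gap between the paper's number and the checkpoint size, you can count generator and discriminator parameters separately. A minimal sketch, assuming a PyTorch GAN-style setup; the tiny `Generator`/`Discriminator` modules below are placeholders, not the project's actual architecture, and only the counting logic is the point:

```python
import torch.nn as nn

# Hypothetical stand-ins for the project's real modules.
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 1024))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 1))

def count_params(module: nn.Module) -> int:
    # Total number of scalar parameters in the module.
    return sum(p.numel() for p in module.parameters())

gen, disc = Generator(), Discriminator()
print(f"generator params:     {count_params(gen):,}")
print(f"discriminator params: {count_params(disc):,}")
print(f"checkpoint total:     {count_params(gen) + count_params(disc):,}")
```

At inference time only the generator is loaded, so its count is the one a paper typically reports, while the checkpoint on disk also stores the discriminator weights.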
What's going on here? How can I reduce the training memory usage?