Eventually, we want things to work with VQ-VAE-style architectures, where submodules output integer tensors. In my test NonDiffSubModule, if I return a long tensor and there is no skip connection, the graph simply breaks when it reaches the NonDiffSubModule. That makes sense: the grad_fns disappear, so the graph can't "jump" over the nondifferentiable module. I should test that things work correctly in a VQ-VAE architecture, where the gradients are "copied" backwards over the nondifferentiability.
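For reference, here is a minimal sketch of what that test case might look like, assuming PyTorch. The NonDiffSubModule below is a hypothetical stand-in, not the actual test module: it snaps each input vector to its nearest codebook entry and returns both the long index tensor (which has no grad_fn) and a straight-through version of the quantized output, which is one common way of "copying" gradients over the nondifferentiable step. The sizes, the encoder layer, and the variable names are all made up for illustration, and the codebook and commitment losses a real VQ-VAE would use are left out.

```python
import torch
import torch.nn as nn


class NonDiffSubModule(nn.Module):
    """Hypothetical stand-in: snaps each input vector to its nearest codebook entry."""

    def __init__(self, num_codes: int = 8, dim: int = 4):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        # Squared distance from every input vector to every codebook vector.
        diffs = z.unsqueeze(1) - self.codebook.weight.unsqueeze(0)
        distances = diffs.pow(2).sum(dim=-1)
        # argmin returns a long tensor with no grad_fn: the graph ends here.
        indices = distances.argmin(dim=-1)
        quantized = self.codebook(indices)
        # Straight-through estimator: the forward value equals `quantized`,
        # but the backward pass treats this step as the identity on `z`.
        quantized_st = z + (quantized - z).detach()
        return indices, quantized_st


torch.manual_seed(0)
encoder = nn.Linear(4, 4)  # assumed upstream layer, for illustration only
module = NonDiffSubModule()
x = torch.randn(5, 4)

z = encoder(x)
indices, quantized_st = module(z)
quantized_st.sum().backward()

print(indices.dtype, indices.grad_fn)   # integer output, detached from the graph
print(encoder.weight.grad is not None)  # whether gradients reached the encoder
```

In this sketch the index tensor is a dead end for autograd, while the straight-through output still carries gradients back to the encoder, so the graph should have a path around the nondifferentiable step even without an explicit skip connection.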