uber-research / safemutations

In sm_simple.py, SM-G-SUM and SM-G-ABS scaling differ by sz^2 #2

Open GLJeff opened 5 years ago

GLJeff commented 5 years ago

Note that torch.autograd.backward() calculates the sum of the gradients over all states (at least in 0.4.1: https://pytorch.org/docs/stable/autograd.html?highlight=backward#torch.autograd.backward)
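
For reference, here is a minimal toy check of that summing behaviour (a hypothetical linear model, not code from sm_simple.py):

```python
import torch

# Toy check: calling backward() with a grad_output of ones accumulates the
# SUM of the per-state gradients into w.grad.
torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
states = torch.randn(4, 3)      # 4 "states", 3 features
out = states @ w                # one scalar output per state

out.backward(torch.ones_like(out), retain_graph=True)
summed = w.grad.clone()

# The same quantity, computed state by state and summed manually.
per_state = torch.stack([torch.autograd.grad(o, w, retain_graph=True)[0] for o in out])
assert torch.allclose(summed, per_state.sum(dim=0))
```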

SM-G-SUM feeds backward() outputs of 1 and then uses the returned gradients unaltered (i.e. their sum across states). SM-G-ABS feeds backward() outputs of 1/sz and then manually calculates the mean of the gradients of the individual states, whereas in SM-G-SUM they were already summed inside backward().

The result is that SM-G-SUM uses a scale that is a factor of sz^2 larger in magnitude than SM-G-ABS's. This is difficult to notice when the number of states is only 2, as in the example, and especially so since SM-G-ABS naturally returns a larger scale anyway because taking absolute values prevents washout.
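
A back-of-the-envelope illustration of that sz^2 factor (not the repo's code, and deliberately ignoring the separate abs/washout effect): with identical per-state gradients, "outputs of 1, summed by backward()" gives sz times the per-state gradient, while "outputs of 1/sz, then averaged" gives 1/sz times it.

```python
import torch

sz = 10
g = torch.randn(5)                     # stand-in for a single per-state gradient
per_state = g.expand(sz, -1)           # sz identical state gradients

sum_style = (1.0 * per_state).sum(0)   # SM-G-SUM style weighting/aggregation
abs_style = (per_state / sz).mean(0)   # SM-G-ABS style weighting/aggregation

print(sum_style / abs_style)           # every entry is sz**2 == 100
```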

Absolutely awesome work on your genetic and evolutionary research! Safe mutations are an incredible milestone in genetic optimization! Now just throw away TensorFlow and PyTorch and start coding in pure CUDA like you ought to be :)

GLJeff commented 5 years ago

To further clarify: I believe both implementations are wrong in the sense that neither finds a scaling vector that is independent of the number of states.

SM-G-SUM should set grad_output[:, i] = 1.0 / len(_states), since the gradients get summed by the backward() pass.

SM-G-ABS should EITHER:
a) set grad_output[:, i] = 1.0, since these values are then averaged along axis 2, OR
b) use mean_abs_jacobian = torch.abs(jacobian).sum(2) to sum them instead of averaging them.
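
To show what I mean by "independent of the number of states", here is a hypothetical sketch on a toy linear policy (not the repo's network, and the function names are mine) of how the proposed weightings behave: with 1/len(states) outputs for the SUM variant, or outputs of 1 plus a mean of absolute per-state gradients for the ABS variant, duplicating the states no longer changes the scale.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)

def sm_g_sum_scale(states):
    # Proposed SM-G-SUM weighting: outputs of 1/len(states), summed inside backward()
    out = states @ w
    grad, = torch.autograd.grad(out, w, grad_outputs=torch.full_like(out, 1.0 / len(states)))
    return grad.abs()

def sm_g_abs_scale(states):
    # Proposed SM-G-ABS option (a): outputs of 1.0, abs of each per-state
    # gradient, then a mean over states
    out = states @ w
    rows = [torch.autograd.grad(o, w, retain_graph=True)[0].abs() for o in out]
    return torch.stack(rows).mean(0)

states = torch.randn(4, 3)
doubled = torch.cat([states, states])    # same data, twice as many states

print(sm_g_sum_scale(states), sm_g_sum_scale(doubled))   # unchanged by duplication
print(sm_g_abs_scale(states), sm_g_abs_scale(doubled))   # unchanged by duplication
```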