Closed marsggbo closed 2 years ago
Thanks for the great work.
I wonder how you train the sub-supernets before splitting the supernet by gradients?
Let's take NASBench201 as an example: say we have a sub-supernet with an encoding of
tensor([[1., 0., 1., 1., 0.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 0., 1., 1., 0.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], device='cuda:0')
Are all operations with a value of 1 involved in the forward and backward passes? Or do you randomly sample only one operation per edge for each training batch?
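To make the question concrete, here is a minimal sketch (my own illustration, not this repository's code) of the two schemes for a single edge. The `MixedEdge` class, the stand-in operation set, and the channel count are all assumptions: scheme A runs every operation whose mask entry is 1 and sums their outputs, while scheme B samples one active operation per batch.

```python
import torch
import torch.nn as nn


class MixedEdge(nn.Module):
    """One supernet edge holding 5 candidate operations (illustrative only)."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical stand-ins for NASBench201's candidate set
        # {none, skip_connect, conv_1x1, conv_3x3, avg_pool_3x3};
        # nn.Identity is a simplification for 'none' and 'skip_connect'.
        self.ops = nn.ModuleList([
            nn.Identity(),                                # stand-in for 'none'
            nn.Identity(),                                # skip_connect
            nn.Conv2d(channels, channels, 1, padding=0),  # conv 1x1
            nn.Conv2d(channels, channels, 3, padding=1),  # conv 3x3
            nn.AvgPool2d(3, stride=1, padding=1),         # avg pool 3x3
        ])

    def forward(self, x: torch.Tensor, mask: torch.Tensor,
                sample_one: bool) -> torch.Tensor:
        # Indices of the operations this sub-supernet keeps (mask entry == 1).
        active = mask.nonzero(as_tuple=False).flatten()
        if sample_one:
            # Scheme B: single-path sampling -- one active op per batch.
            idx = active[torch.randint(len(active), (1,))].item()
            return self.ops[idx](x)
        # Scheme A: run every active op and sum the outputs.
        return sum(self.ops[i](x) for i in active.tolist())


# Usage with the first row of the encoding above: [1., 0., 1., 1., 0.]
edge = MixedEdge(channels=16)
x = torch.randn(2, 16, 8, 8)
row = torch.tensor([1., 0., 1., 1., 0.])
y_all = edge(x, row, sample_one=False)  # all active ops in forward/backward
y_one = edge(x, row, sample_one=True)   # one sampled active op per batch
```

In scheme A, every retained operation receives gradients each step; in scheme B, only the sampled path does, so gradients reach each operation stochastically over many batches.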
We hope your problem has been solved. If you have any further questions, you are welcome to discuss them with us.