Closed marsggbo closed 2 years ago
Thanks for the great work.
I wonder how you train the sub-supernets before splitting the supernet by gradients?
Let's take NASBench201 as an example: say we have a sub-supernet with an encoding of
tensor([[1., 0., 1., 1., 0.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 0., 1., 1., 0.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], device='cuda:0')
Are all operations with a value of 1 involved in the forward and backward passes? Or do you randomly sample only one operation per edge for each training batch?
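To make the question concrete, here is a minimal sketch (my own illustration, not this repository's code) of the two schemes for a single edge. The `MixedEdge` class, the stand-in operation set, and the channel count are all assumptions: scheme A runs every operation whose mask entry is 1 and sums their outputs, while scheme B samples one active operation per batch.

```python
import torch
import torch.nn as nn


class MixedEdge(nn.Module):
    """One supernet edge holding 5 candidate operations (illustrative only)."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical stand-ins for NASBench201's candidate set
        # {none, skip_connect, conv_1x1, conv_3x3, avg_pool_3x3};
        # nn.Identity is a simplification for 'none' and 'skip_connect'.
        self.ops = nn.ModuleList([
            nn.Identity(),                                # stand-in for 'none'
            nn.Identity(),                                # skip_connect
            nn.Conv2d(channels, channels, 1, padding=0),  # conv 1x1
            nn.Conv2d(channels, channels, 3, padding=1),  # conv 3x3
            nn.AvgPool2d(3, stride=1, padding=1),         # avg pool 3x3
        ])

    def forward(self, x: torch.Tensor, mask: torch.Tensor,
                sample_one: bool) -> torch.Tensor:
        # Indices of the operations this sub-supernet keeps (mask entry == 1).
        active = mask.nonzero(as_tuple=False).flatten()
        if sample_one:
            # Scheme B: single-path sampling -- one active op per batch.
            idx = active[torch.randint(len(active), (1,))].item()
            return self.ops[idx](x)
        # Scheme A: run every active op and sum the outputs.
        return sum(self.ops[i](x) for i in active.tolist())


# Usage with the first row of the encoding above: [1., 0., 1., 1., 0.]
edge = MixedEdge(channels=16)
x = torch.randn(2, 16, 8, 8)
row = torch.tensor([1., 0., 1., 1., 0.])
y_all = edge(x, row, sample_one=False)  # all active ops in forward/backward
y_one = edge(x, row, sample_one=True)   # one sampled active op per batch
```

In scheme A, every retained operation receives gradients each step; in scheme B, only the sampled path does, so gradients reach each operation stochastically over many batches.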
We hope your problem has been solved. If you have any further questions, you are welcome to discuss them with us.