quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0

Two questions about code implementation #121

Closed TianQi-777 closed 5 years ago

TianQi-777 commented 5 years ago

Hello, this is a very good project. I am a beginner in CS and very interested in this project, and I have read all of the code. However, I have two questions about it.

Question 1: About the formula for the second-order approximation in the paper and its implementation in the code.

In the paper, w+ = w + ε▽w'L_val(w',α) and w- = w - ε▽w'L_val(w',α). How is ▽w'L_val(w',α) expressed in the code? That is, where is this value computed? As far as I can tell, it is never calculated in the code.
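For intuition, here is a minimal numeric sketch (my own toy example, not the repo's code) of the finite-difference identity behind w+ and w-: for a loss f, the central difference (▽f(w + εv) - ▽f(w - εv)) / (2ε) approximates the Hessian-vector product Hv, which is the quantity the paper's second-order term needs. For a quadratic f the identity is exact:

```python
# Toy check of the central-difference Hessian-vector product used in the
# second-order approximation: (grad f(w + eps*v) - grad f(w - eps*v)) / (2*eps) ~= H v.
# Here f(w) = 0.5*(3*w0**2 + 5*w1**2), so H = diag(3, 5) and the identity is exact.

def grad_f(w):
    # Analytic gradient of f(w) = 0.5*(3*w0^2 + 5*w1^2)
    return [3.0 * w[0], 5.0 * w[1]]

def hvp_finite_diff(w, v, eps=1e-2):
    # v plays the role of grad_{w'} L_val(w', alpha) in the paper
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    gp, gm = grad_f(w_plus), grad_f(w_minus)
    return [(p - m) / (2.0 * eps) for p, m in zip(gp, gm)]

w = [1.0, -2.0]
v = [0.5, 1.0]
print(hvp_finite_diff(w, v))  # -> approximately [1.5, 5.0], i.e. diag(3, 5) @ v
```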

So far I have two possible interpretations:

  1. In the code, /cnn/architect.py: my first guess is that vector = [v.grad.data for v in unrolled_model.parameters()] [Line 49] corresponds to ▽w'L_val(w',α). However, I think v.grad.data at that point only holds the gradients from the manual update in function _compute_unrolled_model [Line 20] (the "virtual gradient step" in the paper), and I have not found any code that differentiates the validation loss with respect to w'.

  2. In the code, /cnn/architect.py: my other guess is that ▽w'L_val(w',α) is computed by unrolled_loss.backward() [Line 47]. But the optimizer for unrolled_model is constructed over model.arch_parameters() [Line 17], so I don't think this step computes the derivative of L_val(w',α) with respect to w'. In addition, the paper says that w is held fixed while optimizing L_val(w',α), so how is ▽w'L_val(w',α) obtained in the code? Is my understanding of PyTorch wrong?
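A minimal PyTorch sketch (a toy example, not from the repo) of the autograd behaviour at issue in guess 2: loss.backward() populates .grad for every leaf tensor with requires_grad=True, regardless of which parameters the optimizer was constructed with. The optimizer only determines which tensors get updated on step(), not which get gradients:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)      # stands in for the weights w'
alpha = torch.tensor([3.0], requires_grad=True)  # stands in for the arch params

opt = torch.optim.SGD([alpha], lr=0.1)           # optimizer only tracks alpha

loss = (w * alpha).sum()                         # toy "validation loss"
loss.backward()

print(w.grad)      # tensor([3.]) -- populated even though opt never sees w
print(alpha.grad)  # tensor([2.])
```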

Question 2: In /cnn/architect.py [Line 76], why not use ε = r = 1e-2 directly instead of ε = r/_concat(vector).norm()? Dividing by _concat(vector).norm() seems redundant to me.
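A small sketch (my own toy example, not the repo's code) of what the normalization buys: the perturbation applied to the weights is ε·v, so choosing ε = r/||v|| keeps the perturbation's size equal to r no matter how large or small the gradient v happens to be:

```python
import math

def perturbation_norm(v, r=1e-2):
    # eps = r / ||v||, as in architect.py; returns ||eps * v||,
    # the actual size of the step applied to the weights
    norm_v = math.sqrt(sum(x * x for x in v))
    eps = r / norm_v
    return math.sqrt(sum((eps * x) ** 2 for x in v))

print(perturbation_norm([1e-4, 2e-4]))  # ~0.01, tiny gradient
print(perturbation_norm([30.0, 40.0]))  # ~0.01, huge gradient
```

With a fixed ε = r instead, a tiny gradient would give a vanishing perturbation and a huge gradient an enormous one, making the finite-difference approximation unstable.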

I look forward to your reply, thank you very much.

TianQi-777 commented 5 years ago

I have solved both problems:

  1. ▽w'L_val(w',α) is indeed computed by unrolled_loss.backward() [Line 47]; I had misunderstood PyTorch's autograd mechanism.
  2. The footnote of the paper already explains this point.