utcsilab / score-based-channels

Source code for paper "MIMO Channel Estimation using Score-Based Generative Models", published in IEEE Transactions on Wireless Communications.

Differences in Pretrained and Output Models #5

Closed sheidaRze closed 1 year ago

sheidaRze commented 1 year ago

Hello,

I have a question about the pretrained model provided on this web page, final_model.pt. After running the train_score.py script to obtain the trained model, I noticed some differences between the output model and the one you provided. Specifically, the configuration of the output model I obtained does not include the "sampling" field, which is required by the test_score.py script (e.g., nmse_log: config.sampling.steps_each on line 91). Additionally, there are a few other differences. For example, the value of config.data.noise_std is 0 in the output model I obtained, whereas it is 0.01 in the pretrained model. I would like to understand the reasons behind these differences.

Thank you.

mariusarvinte commented 1 year ago

Hello and thank you for pointing out these issues. I've committed changes to address your concerns.

On a general note, the parameters you mentioned are only relevant for the sampling stage and do not matter for training, which is why they are missing from the model itself. These parameters can and should only be set during the inference stage. Some detailed comments below.

I can't seem to find any other missing or relevant fields in config.sampling, except for config.sampling.actual_steps, which is set ad hoc (as it should be) in test_mmse.py.
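For anyone running into the same missing-field error, here is a minimal sketch of attaching the sampling-only fields to a loaded config before inference. The field names come from this thread; the SimpleNamespace stand-in and the values are illustrative assumptions, not the repo's exact code.

```python
from types import SimpleNamespace

# Stand-in for the config object loaded from a training checkpoint;
# it has no "sampling" section, since training never needed one.
config = SimpleNamespace()
config.sampling = SimpleNamespace()

# Sampling-only fields, set ad hoc at inference time (values are illustrative).
config.sampling.steps_each = 3        # inner update steps per noise level
config.sampling.actual_steps = 2000   # set ad hoc, as in test_mmse.py
```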

The reason that field (config.data.noise_std) is there is for methods like L-DAMP, which do require noisy pilots during training.

Let me know if there are other concerns or missing fields.

sheidaRze commented 1 year ago

Hello,

Thanks for your reply.

I have two other questions; your response is appreciated.

1- In test_score.py, for the inference phase, the variance of the training samples is used to normalize the channels:

https://github.com/utcsilab/score-based-channels/blob/199792eb8eeb6ce2eb1e7e6215d1798ec4ea07f7/loaders.py#L68-L69

which is then passed to test_score.py to generate the observation samples y:

https://github.com/utcsilab/score-based-channels/blob/199792eb8eeb6ce2eb1e7e6215d1798ec4ea07f7/test_score.py#L120

However, such a process is not feasible in practice, as we only have access to the observation data, not the channels. So why is that?

2- The equation used to update the channel in the code seems different from the one given in the paper:

Apply update:

https://github.com/utcsilab/score-based-channels/blob/199792eb8eeb6ce2eb1e7e6215d1798ec4ea07f7/test_score.py#L139-L141

https://github.com/utcsilab/score-based-channels/blob/199792eb8eeb6ce2eb1e7e6215d1798ec4ea07f7/test_score.py#L153-L162

However, based on the given formula in the paper (Algorithm 1 and eqn.(17) ),

current = current + alpha * (score + meas_grad / (local_noise^2 + current_sigma^2)) + grad_noise

Besides, "current_sigma" is missing when calculating grad_noise.

Thanks.

mariusarvinte commented 1 year ago

Regarding the first question, that normalization is not specific to the test stage; rather, it is a way of normalizing channels during both training and testing. Note that self.std is a scalar value, shared across all channels involved in all stages (something around 0.3 for CDL-C, if I remember correctly).

That value is not explicitly used anywhere during testing, nor is it required; the normalization is done for convenience in SNR calculation. The channel is estimated and compared to the ground truth after normalization.
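A minimal sketch of that normalization, with illustrative shapes and names (not the repo's exact loader code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative complex channels: (num_samples, rx_antennas, tx_antennas)
train_channels = (rng.standard_normal((1000, 16, 64))
                  + 1j * rng.standard_normal((1000, 16, 64)))

# One scalar standard deviation, estimated once from the training set and
# reused verbatim for the test channels (around 0.3 for CDL-C, per above).
std = np.std(train_channels)
train_normalized = train_channels / std
```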

Regarding the update step, both Algorithm 1 and Eqn. (17) are correct and consistent with each other, as well as the implementation. The reason for the additional term in the denominator of the measurement grad is explained in the paper right after Eqn. (17):

In practice, we also include an annealing term in the denominator of the above, as shown in Algorithm 1.

This originates from [1] as a heuristic used to rely less on the measurement gradient in early stages and does not affect the noise generated during inference.

[1] Jalal, A., Arvinte, M., Daras, G., Price, E., Dimakis, A.G. and Tamir, J., 2021. Robust compressed sensing mri with deep generative priors. Advances in Neural Information Processing Systems, 34.
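For concreteness, here is a hedged sketch of the update step as discussed in this thread. Variable names follow the discussion above and dummy values stand in for the real quantities; this is not a verbatim excerpt of test_score.py.

```python
import math
import torch

# Dummy stand-ins at one annealing level.
current       = torch.randn(2, 16, 64)     # channel estimate, real/imag as channels
score         = torch.randn_like(current)  # score network output at noise level sigma
meas_grad     = torch.randn_like(current)  # gradient of the pilot-consistency term
alpha         = 1e-5                       # step size at this noise level
local_noise   = 1e-2                       # pilot noise power
current_sigma = 0.5                        # current annealing noise level

# The current_sigma**2 annealing term in the denominator down-weights the
# measurement gradient during early (high-noise) stages, per the heuristic from [1].
update = score - meas_grad / (local_noise / 2. + current_sigma ** 2)

# The noise generated during inference is unaffected by the annealing term.
grad_noise = math.sqrt(2. * alpha) * torch.randn_like(current)
current = current + alpha * update + grad_noise
```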

sheidaRze commented 1 year ago

Thank you for your response.

1- I understand that the channels are normalized for easy signal-to-noise ratio (SNR) calculation. However, I have a question about why the observation data y is generated using the normalized channel matrix: at the receiver, we only receive Y = HP + N, where H is neither available nor normalized.

Regarding the second part, let me clarify my question further: https://github.com/utcsilab/score-based-channels/blob/199792eb8eeb6ce2eb1e7e6215d1798ec4ea07f7/test_score.py#L153-L162


2.1- In the update function described in the paper, we have "score + meas_grad / (local_noise^2 + current_sigma^2)". However, in the code, it is written as "score - meas_grad / (local_noise/2 + current_sigma^2)". Here, the local noise is divided by 2, and the "+" operation has been changed to "-".

2.2- The paper includes "current_sigma" in the formulation when calculating grad_noise, but in the code, "current_sigma" is removed.

Thanks.

mariusarvinte commented 1 year ago

1 - This is an assumption we make: channels always come already normalized. If this is not convenient, one can also view the model we used as Y = 1/c * H * P + N, where c is the scalar normalization constant. It is highly likely that re-training and re-testing without normalizing the channels would give the same results. If the constant were omitted from the forward model, the only effect would be a shift of the SNR curve. In that case, the pilots could be divided by c and the same model could be re-used out of the box.
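Spelled out, the equivalence is a simple rescaling identity: $Y = HP + N = \left(\tfrac{H}{c}\right)\left(cP\right) + N$, so the normalization constant can be absorbed into the known pilot matrix, and a model trained on normalized channels $H/c$ still applies.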

2.1 - After closer inspection, the sign change does seem like a typo in the paper (stemming from Eq. (17)); the code should be correct. From the deep neural network's perspective, $\sigma$ is the standard deviation of the noise in both the real and imaginary parts during reverse diffusion, since the network treats the real/imag parts as separate signal channels and uses $\sigma$ internally. Hence, we used $\sigma_\mathrm{pilot} / \sqrt{2}$ as the standard deviation of the noise added separately to the real/imag parts of the received pilots, to keep the interpretations consistent.
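To spell out the $\sqrt{2}$ factor: if the complex pilot noise has total variance $\sigma_\mathrm{pilot}^2$, the power splits evenly between the real and imaginary parts, so each part has variance $\sigma_\mathrm{pilot}^2/2$ and standard deviation $\sigma_\mathrm{pilot}/\sqrt{2}$. This per-component split is presumably also where the division of the noise power by 2 in the code's denominator comes from.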

2.2 - Indeed, after closer inspection, the additional $\sigma$ in front of the added noise is likely a typo in the paper; a squared $\sigma$ is already implicitly in $r^i$. The code should be correct.

sheidaRze commented 1 year ago

Got it. Thanks for all the clarifications.