huawei-noah / noah-research

The parameter K from the paper has no effect in the multiuat code #98

Open zhl5842 opened 2 years ago

zhl5842 commented 2 years ago

image

K=10 and each e is the same...

zhl5842 commented 2 years ago

because each time the sample is the same.

minghao-wu commented 2 years ago

Hi @zhl5842 ,

Thank you for your question.

In this work, we leverage Monte Carlo Dropout to obtain samples of sentence-level translation probability. You can find more details in the link.

Setting the math details aside, Monte Carlo Dropout means running K forward passes on the same input with dropout activated. In each forward pass, the model makes its prediction with a random subset of parameters. Hence the model predictions e differ across the K inferences even though the input is the same, and we take the expectation over these inferences to obtain the predictive posterior.
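To make the K-pass procedure concrete, here is a minimal, self-contained sketch. It is not the multiuat or fairseq implementation: the toy linear scorer, the function names `forward` and `mc_dropout`, the dropout rate and the input values are all assumptions chosen for illustration.

```python
import math
import random


def forward(x, weights, p_drop, rng):
    """One stochastic forward pass of a toy linear scorer (hypothetical model).

    Each weight is dropped with probability p_drop; kept weights are
    rescaled by 1 / (1 - p_drop), i.e. inverted dropout.
    """
    total = 0.0
    for xi, wi in zip(x, weights):
        keep = rng.random() >= p_drop
        wi = wi / (1.0 - p_drop) if keep else 0.0
        total += xi * wi
    # Squash to (0, 1) so the output resembles a probability-like score.
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout(x, weights, k, p_drop=0.3, seed=0):
    """Run k forward passes on the same input and summarise the samples."""
    rng = random.Random(seed)
    samples = [forward(x, weights, p_drop, rng) for _ in range(k)]
    mean = sum(samples) / k                          # Monte Carlo expectation
    var = sum((s - mean) ** 2 for s in samples) / k  # spread across passes
    return samples, mean, var


# Illustrative values only; they do not come from the paper or the repository.
x = [0.5, -1.2, 0.8, 2.0]
weights = [0.4, 0.1, -0.7, 0.9]
samples, mean, var = mc_dropout(x, weights, k=10)
print(samples)
print(mean, var)
```

Because each pass draws a fresh dropout mask, the K samples differ even though `x` never changes; their mean plays the role of the expectation described above.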

zhl5842 commented 2 years ago

Hi, but the code calls model.eval(), so wouldn't dropout be disabled? https://github.com/huawei-noah/noah-research/blob/master/noahnmt/multiuat/fairseq/fairseq/tasks/multiuat-multilingual-translation.py#L381
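In PyTorch, `model.eval()` puts modules into evaluation mode, where dropout acts as an identity, so repeated passes on the same input return the same value. The sketch below mimics that behaviour with a hypothetical `ToyDropoutModel` class in plain Python; the class, its `train()`/`eval()` methods and all numbers are assumptions made for illustration, not the repository's code.

```python
import random


class ToyDropoutModel:
    """Toy stand-in for a network with dropout and a train/eval switch."""

    def __init__(self, weights, p_drop=0.3, seed=0):
        self.weights = weights
        self.p_drop = p_drop
        self.training = True
        self._rng = random.Random(seed)

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def __call__(self, x):
        total = 0.0
        for xi, wi in zip(x, self.weights):
            if self.training:
                # Dropout is only sampled in training mode.
                keep = self._rng.random() >= self.p_drop
                wi = wi / (1.0 - self.p_drop) if keep else 0.0
            total += xi * wi
        return total


x = [0.5, -1.2, 0.8, 2.0]
model = ToyDropoutModel([0.4, 0.1, -0.7, 0.9])

K = 10
model.eval()
eval_samples = [model(x) for _ in range(K)]   # dropout off: all K values equal
model.train()
train_samples = [model(x) for _ in range(K)]  # dropout on: values vary

print(len(set(eval_samples)), len(set(train_samples)))
```

In evaluation mode the K samples collapse to one value, which matches the symptom reported at the top of this thread. In real PyTorch code, one common way to keep Monte Carlo Dropout active at inference time is to call `.train()` on the dropout modules only.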

minghao-wu commented 2 years ago

Hi @zhl5842 ,

Thank you for pointing out this problem.

You're right, and I believe this mistake was introduced when the code was released. I may have handed the wrong version to my colleague, because I checked my own private code and this line is not there. This line of code can be safely removed.

I will re-run all the related experiments to make sure everything is correct. I think this is actually an interesting analysis to measure the effect of Monte Carlo Dropout.

I will keep you updated as soon as I get the results (within 24 hours).

I am no longer working at HUAWEI and have contacted my colleague to fix this problem.

Thank you again.

zhl5842 commented 2 years ago

OK @minghao-wu. I ran with K=1 and found that the results for K=1 and K=10 are similar. Looking forward to your results!

In addition, compared with multidds, multiuat is initialized with --lr 5e-04 and --weight-decay 0.0001, whereas multidds uses --lr 2e-04, --attention-dropout 0.3, --relu-dropout 0.3 and --weight-decay 0.0. With those settings I reproduced results similar to those reported in the paper. But when I give both methods the same parameters (--lr 5e-04, --weight-decay 0.0001), the results of multidds and multiuat are also similar. So does the uncertainty-aware method have no advantage?
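For reference, the two settings quoted above can be compared side by side. The snippet below is only an illustrative sketch: the dictionaries contain just the flags mentioned in this comment, the helper `diff_settings` is hypothetical, and `None` means the value was not stated in this thread.

```python
# Flags as quoted in this comment; anything not mentioned is left out.
multiuat_flags = {"lr": 5e-4, "weight_decay": 1e-4}
multidds_flags = {
    "lr": 2e-4,
    "attention_dropout": 0.3,
    "relu_dropout": 0.3,
    "weight_decay": 0.0,
}


def diff_settings(a, b):
    """Return {flag: (value_in_a, value_in_b)} for every flag that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


differences = diff_settings(multiuat_flags, multidds_flags)
for flag, (uat_value, dds_value) in differences.items():
    print(f"{flag}: multiuat={uat_value} multidds={dds_value}")
```

Every flag listed here differs between the two runs, so a comparison of the two methods under these settings changes several things at once.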

minghao-wu commented 2 years ago

Hi @zhl5842 ,

Given limited computational resources and an overly large hyperparameter search space, I didn't run a hyperparameter search for multidds-s and used their recommended hyperparameters directly, assuming these are optimal for their own approach. In fact, I could not reproduce their results with their recommended hyperparameters and my own implementation, so, under the same assumption, I used their reported results in the main content of my paper for a fair comparison. A 100% faithful re-implementation is very hard to achieve, and their released code has compatibility issues on our hardware. To be honest, I am also surprised that my hyperparameters work well for their approach.

As I mentioned in my paper, most of our observations are consistent with Wang et al., 2020, and the main focus of my work is multi-domain NMT. We found that multidds-s is vulnerable in multi-domain NMT, while our approach, multiuat, works reasonably well for both multilingual and multi-domain NMT. Given that text corpora may come from heterogeneous sources, our approach is a safer and better choice. That is, the strength of multiuat is mainly demonstrated on multi-domain NMT.

No advantage? Yes and no. multidds-s and multiuat may perform similarly in multilingual NMT (as shown in Figure 1 and your own results), but multiuat is definitely a better choice when you do not have a sufficient understanding of your datasets.

zhl5842 commented 2 years ago

Hi @minghao-wu. First, I don't know why you changed the learning rate hyperparameter. I suppose that if we use the multidds hyperparameters, the results will also be the same, so this can't prove that multiuat is effective.

Second, for multi-domain NMT, I think your method may be effective, but you should keep the hyperparameters the same and then look at the difference in the results. Otherwise you can't convince me, as with the parameter K.

Thanks, looking forward to your new results!

minghao-wu commented 2 years ago

Hi @zhl5842 ,

Firstly, as I mentioned before, I assumed their recommended hyperparameters are optimal for their approach, so I directly followed their recommendations. I didn't apply my hyperparameters to their approach. When the resources are limited, carefully tuning others' work is not my first priority.

Secondly, in multi-domain NMT, I use the same hyperparameters for both multidds and multiuat, which are tuned by myself, because Wang et al. (2020) didn't apply multidds to multi-domain NMT. With the identical setup in multi-domain NMT experiments, there is a significant difference between these two approaches.

minghao-wu commented 2 years ago

I re-ran multiuat with and without Monte Carlo Dropout on the multilingual M2O diverse setup and found that Monte Carlo Dropout has only a marginal effect on the final performance. The choice of K doesn't make a big difference either. The improvement mainly comes from the algorithm itself. A smart choice of reward function can make the training more robust.

zhl5842 commented 2 years ago

Hi @minghao-wu. Firstly, the parameter K is useless; that is now verified. Secondly, for multilingual NMT, I think the improvement comes only from increasing the learning rate (2e-04 to 5e-04); the main reason is that multidds was not well trained. multiuat (LR=5e-04 with uncertainty) does improve over multidds (LR=2e-04 with gradient cosine similarity), and your paper says this comes mainly from uncertainty. I don't think so, and I find the results hard to believe. I am sorry to say it, but these are the actual experimental results.

minghao-wu commented 2 years ago

Hi @zhl5842 ,

I don't think we have a huge disagreement on the multilingual NMT results. It's indeed a great finding that multidds can be improved with my hyperparameters, and I do encourage you to keep working on it. As I have mentioned over and over again, I adopted the recommended hyperparameters from Wang et al. (2020) and didn't tune hyperparameters for multidds. Tuning hyperparameters for others is not my top priority. I compared my results with their reported results. If their reported results underperform, I'm not the one to blame.

Again, I have said this many times and don't mind emphasizing it once more. The main point of our work is that multiuat is robust to changes in the datasets and multidds is not. All of our experiments are designed to support this argument. The core value of our work lies in our findings on multi-domain NMT. The updated multilingual NMT results for multidds do not change its vulnerability on multi-domain NMT.

Don't be sorry. You just don't care about the valuable part of my work. It's your loss, not mine.

I won't continue this unconstructive conversation.