I noticed that your paper discusses the impact of different mask ratios during sampling, but I was curious whether you also experimented with different mask ratios during training. Specifically, have you tried varying the mask ratio during training, and how did that affect the performance of your model?
Increasing the masking ratio during training can help the model sample with a larger mask ratio during generation, but increasing it too much makes the model harder to train.
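One common way to vary the mask ratio during training is to draw a fresh ratio for each batch from a range and mask that fraction of tokens at random. The sketch below is only an illustration under stated assumptions and is not the authors' code. It assumes token-level uniform random masking, a hypothetical placeholder `MASK_ID`, and a hypothetical ratio range of `[0.5, 0.9]`. Raising `high` exposes the model to heavier masking, which corresponds to the trade-off described above.

```python
import random
from typing import List, Tuple

MASK_ID = -1  # hypothetical placeholder id for a masked token


def mask_tokens(tokens: List[int], ratio: float,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """Replace round(ratio * len(tokens)) randomly chosen positions with MASK_ID.

    Returns the masked sequence and the sorted masked positions,
    i.e. the positions the model would be trained to predict.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    n_mask = round(ratio * len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK_ID
    return masked, positions


def sample_train_ratio(rng: random.Random,
                       low: float = 0.5, high: float = 0.9) -> float:
    """Draw a per-batch mask ratio uniformly from [low, high] (hypothetical range)."""
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = list(range(10))
    ratio = sample_train_ratio(rng)
    masked, positions = mask_tokens(tokens, ratio, rng)
    print(f"ratio={ratio:.2f} masked_positions={positions}")
```

Sampling the ratio per batch, rather than fixing it, lets a single model see both light and heavy masking, which is one way to prepare it for larger mask ratios at generation time.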