Closed: liuxing007 closed this issue 10 months ago
(1) Indeed, results can vary with the versions of numpy and pytorch, and even with the GPU in use. For the closest reproduction of our results, it would be best to use an NVIDIA GTX 1080 Ti, since that is the GPU I used to train the model.
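For anyone trying to narrow down the run-to-run variation, a minimal seeding sketch for a standard PyTorch training script is below. The helper name `seed_everything` and the seed value are illustrative, not from this repo, and even with fixed seeds results can still drift across numpy/pytorch versions and GPU models.

```python
# Minimal sketch (not from the repo): pin the common RNGs and request
# deterministic cuDNN kernels to reduce run-to-run variation.
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch RNGs and prefer deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for determinism in cuDNN algorithm selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
```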
(2) I think uMLP can be extended to more scenarios, since it is a modified version of the MLP block in the Transformer; you are welcome to try it on other tasks.
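For context, the vanilla Transformer MLP (feed-forward) block that uMLP starts from looks roughly like the sketch below. This is only the standard baseline, not the uMLP itself; the class name and dimensions are illustrative, and the actual uMLP modification is in the repo's model code.

```python
# Rough sketch of the standard Transformer MLP/FFN block (the baseline that
# uMLP modifies). Not the uMLP implementation itself.
import torch.nn as nn


class TransformerMLP(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, dropout: float = 0.1):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)  # expand
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)  # project back
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        return self.drop(self.fc2(self.drop(self.act(self.fc1(x)))))
```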
Great work! (1) I followed your code for training and testing, and the results are shown in the figure. However, they fall short of the results reported in the paper. What could be the reason for this? (2) Why does uMLP achieve better results? Is this specific to HPE, or is it applicable more generally?