Closed: Alsac closed this issue 3 months ago
Thank you for your work! I tested this code and found that SAMformer performs very well with a single `scaled_dot_product_attention` layer, but the MSE gets worse when I add more layers. Do you have any suggestions for making the model deeper? Thank you!

Thank you for your feedback! We have tested various architectures for SAMformer and found that a shallow model tends to perform better, mainly because transformers overfit quickly on this type of data. We therefore recommend keeping the number of layers low for optimal performance. If you really want to use multiple layers, I would suggest strengthening the regularization by increasing the value of rho in SAM, to help mitigate the risk of overfitting.
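For reference, here is a minimal PyTorch sketch of the kind of two-step SAM wrapper where rho is the knob to turn up. The `SAM` class and the training loop below are illustrative assumptions, not the repo's actual API; they follow the standard SAM recipe (perturb weights within an L2 ball of radius rho, recompute gradients, then step).

```python
import torch


class SAM(torch.optim.Optimizer):
    """Minimal two-step SAM wrapper (illustrative sketch, not SAMformer's own class)."""

    def __init__(self, params, base_optimizer_cls, rho=0.5, **kwargs):
        defaults = dict(rho=rho, **kwargs)
        super().__init__(params, defaults)
        # The base optimizer shares our param groups, so it updates the same tensors.
        self.base_optimizer = base_optimizer_cls(self.param_groups, **kwargs)

    @torch.no_grad()
    def first_step(self):
        # Perturb weights toward the local worst case within an L2 ball of radius rho.
        grads = [p.grad for g in self.param_groups for p in g["params"] if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)
            for p in group["params"]:
                if p.grad is None:
                    continue
                e_w = p.grad * scale
                p.add_(e_w)                  # ascend to the perturbed point
                self.state[p]["e_w"] = e_w   # remember the perturbation
        self.zero_grad()

    @torch.no_grad()
    def second_step(self):
        # Undo the perturbation, then apply the base update with the perturbed-point grads.
        for group in self.param_groups:
            for p in group["params"]:
                if "e_w" in self.state[p]:
                    p.sub_(self.state[p]["e_w"])
        self.base_optimizer.step()
        self.zero_grad()


# Usage sketch: two forward/backward passes per batch; a larger rho (e.g. 1.0
# instead of 0.5) strengthens the flatness regularization for deeper stacks.
model = torch.nn.Linear(8, 1)  # stand-in for a deeper SAMformer variant
opt = SAM(model.parameters(), torch.optim.Adam, rho=1.0, lr=1e-3)
x, y = torch.randn(32, 8), torch.randn(32, 1)
for _ in range(5):
    torch.nn.functional.mse_loss(model(x), y).backward()
    opt.first_step()   # climb to the perturbed weights
    torch.nn.functional.mse_loss(model(x), y).backward()
    opt.second_step()  # restore weights and take the real step
```

The right rho for a deeper model is dataset-dependent, so treat it as a hyperparameter to sweep rather than a fixed value.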