Charles-333 opened 3 weeks ago
I noticed that the method in the paper relies heavily on hyperparameter tuning. However, since the target domain lacks labels, tuning ultimately relies on validation-set performance, and the best result is what gets reported. Does this approach ensure a fair comparison, especially given the variability that some methods show across epochs?

Additionally, a recent paper [1] raised a similar concern. It evaluates methods by plotting performance curves and computing stability metrics, which might be relevant here. Looking forward to your response!

Thanks for your interest in this work.

You're correct: the performance of models that use adversarial training can indeed fluctuate across epochs. I think it is reasonable to select checkpoints on a validation set or to train for a fixed number of epochs, provided the models have converged. The protocol in the reference you mentioned is also a valid option; alternatively, we could report the average performance over the last several epochs.

Thanks!
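The last-epochs-averaging idea discussed above can be sketched in a few lines. This is only an illustration: the window size `k`, the helper name `last_k_summary`, and the per-epoch accuracies are all hypothetical, not values from the paper.

```python
from statistics import mean, pstdev

# Minimal sketch (illustrative numbers, not from the paper): instead of
# reporting the single best validation epoch, summarize the final k epochs
# with their mean accuracy plus a simple stability measure (population std).

def last_k_summary(acc_per_epoch, k=5):
    """Return (mean, std) over the final k epoch accuracies."""
    tail = acc_per_epoch[-k:]
    return mean(tail), pstdev(tail)

# Hypothetical per-epoch target-domain accuracies for one method:
accs = [0.61, 0.68, 0.71, 0.70, 0.73, 0.69, 0.72, 0.71]
m, s = last_k_summary(accs, k=5)
print(f"last-5 mean = {m:.3f}, std = {s:.3f}")
```

Reporting the mean together with the spread makes methods that merely spike at one lucky epoch distinguishable from methods that are stably good, which is the fairness concern raised above.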
[1] Haitao Mao, Lun Du, Yujia Zheng, Qiang Fu, Zelin Li, Xu Chen, Shi Han, and Dongmei Zhang. 2024. Source Free Graph Unsupervised Domain Adaptation. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM ’24). Association for Computing Machinery, New York, NY, USA, 520–528. https://doi.org/10.1145/3616855.3635802