yzhHoward / SMART

Official implementation of SMART: Towards Pre-trained Missing-Aware Model for Patient Health Status Prediction

Questions Regarding Pretraining and Fine-tuning Strategies #2

Open gilyoungCoder opened 1 month ago

gilyoungCoder commented 1 month ago

I hope this message finds you well. I have been exploring your implementation using the PhysioNet Challenge 2012 dataset, and I have a couple of questions regarding the pretraining and fine-tuning strategies utilized in your project.

  1. Data Splitting for Pretraining and Fine-tuning: In your implementation, did you split the dataset internally for pretraining and fine-tuning phases? For instance, when working with the Challenge 2012 dataset, did you use a subset of the dataset (e.g., patients 1–1000) for pretraining and another subset (e.g., patients 1001–2000) for fine-tuning? Or did you utilize the same dataset for both phases without further splitting?

  2. Handling Reconstruction Loss and Parameter Freezing: During the fine-tuning phase, did you freeze the parameters that were optimized based on the reconstruction loss calculated during pretraining? Alternatively, did you continue to include the reconstruction loss during the fine-tuning phase as well? If you tried both approaches, which method yielded better performance in your experiments?

I would greatly appreciate any insights or experiences you could share regarding these strategies. Thank you for your time and for providing such a valuable resource for the community.

yzhHoward commented 1 month ago

Thanks for your attention.

In both the pre-training and fine-tuning stages, we use the same seed, which ensures identical splits for the training, validation, and prediction sets. You can check run.sh to confirm this.
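
For readers who want to see the mechanics, here is a minimal sketch of seed-controlled splitting (illustrative only; `split_dataset`, the split ratios, the patient count, and the seed value are assumptions, not the repository's actual API). Because both stages pass the same seed, the permutation, and hence the train/validation/prediction partition, is identical across runs.

```python
import numpy as np

def split_dataset(num_patients, seed, train_ratio=0.8, val_ratio=0.1):
    """Deterministic split: the same seed always yields the same partition."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(num_patients)        # same seed -> same permutation
    n_train = int(num_patients * train_ratio)
    n_val = int(num_patients * val_ratio)
    return (indices[:n_train],                     # training patients
            indices[n_train:n_train + n_val],      # validation patients
            indices[n_train + n_val:])             # held-out prediction patients

# Pre-training and fine-tuning both call this with the same (hypothetical) seed,
# so the partitions match exactly and no held-out patient leaks into pre-training.
pretrain_splits = split_dataset(4000, seed=42)
finetune_splits = split_dataset(4000, seed=42)
assert all((a == b).all() for a, b in zip(pretrain_splits, finetune_splits))
```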

In fine-tuning, we freeze the pre-trained parameters for the first 5 epochs and only update the classifier. There is no pre-trained reconstruction loss in the fine-tuning phase, which you can confirm in the code. In a simple experiment, freezing for the first 5 epochs achieved better results than not freezing. As for adding the reconstruction loss to fine-tuning, I think it may not bring better results: reconstruction and classification are different tasks, and optimizing them at the same time may involve trade-offs.
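
A hedged PyTorch-style sketch of that freezing schedule (module and loader names such as `encoder`, `classifier`, and `train_loader` are placeholders, not the actual classes in this repository): the pre-trained encoder receives no gradient updates for the first 5 epochs, and the fine-tuning objective is classification only, with no reconstruction term.

```python
import torch

def finetune(encoder, classifier, train_loader, num_epochs=50, freeze_epochs=5, lr=1e-3):
    criterion = torch.nn.BCEWithLogitsLoss()   # classification loss only, no reconstruction term
    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(classifier.parameters()), lr=lr
    )
    for epoch in range(num_epochs):
        frozen = epoch < freeze_epochs         # freeze pre-trained encoder for the first 5 epochs
        for p in encoder.parameters():
            p.requires_grad = not frozen
        for x, y in train_loader:              # y: float tensor of binary labels
            optimizer.zero_grad()
            logits = classifier(encoder(x))    # only the classifier is updated while frozen
            loss = criterion(logits, y)
            loss.backward()
            optimizer.step()
```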