Hi! Thanks for your interest in our work and for your thoughtful question.
To provide a bit of context, the Kullback-Leibler (KL) divergence measures how one probability distribution diverges from a second reference distribution. In our work, KL divergence serves as a tool to align the feature distributions produced by the student model with those of the teacher model. By minimizing KL divergence, we aim to ensure that the student's feature representations are as close as possible to the teacher's, helping to transfer the learned feature consistency from teacher to student.
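For intuition, here is a minimal PyTorch-style sketch of such a feature-alignment term. It is not our exact implementation; the function name, the temperature parameter, and the feature shapes are just illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def kl_feature_alignment_loss(student_feats, teacher_feats, temperature=1.0):
    # Turn each feature vector into a probability distribution over its dimensions.
    student_log_probs = F.log_softmax(student_feats / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_feats / temperature, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target,
    # so this term is KL(teacher || student), averaged over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Toy usage with random features of shape (batch, feature_dim).
student_feats = torch.randn(8, 256)
teacher_feats = torch.randn(8, 256)
print(kl_feature_alignment_loss(student_feats, teacher_feats))
```

Minimizing this term pushes the student's feature distribution toward the teacher's, which is the role the KL loss plays in our framework.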
Although the data is unpaired, the patient cohorts only partially overlap (<50%) between the two sets, and the two modalities differ (microCTs for the student and histopathologies for the teacher), we have observed a natural alignment between the two domains. Specifically, in some of our experiments (not reported in the paper due to the page limit), we extracted density-based features (i.e., the number of osteocytes per patient and the number of lacunae per patient) from the histopathologies and the microCTs, and found that their distributions are correlated. These correlations arise because, even though these values can change across conditions such as healthy bone, osteoporosis, and COVID-19, osteocytes are consistently located within lacunae in bone, so the osteocyte density averaged over the three conditions stays close to the lacuna density.
This underlying structural congruency allows us to leverage the KL divergence effectively, since these averaged features, such as the lacuna and osteocyte counts, are shared across the two image modalities. By minimizing the KL divergence on these correlated feature distributions (whose representation is more informative on the teacher side, since the teacher is trained on histopathology images, which are more numerous and show osteocytes more clearly than microCTs show lacunae), we enable the student to improve its predicted feature distribution through the information distilled from the teacher.
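As a toy illustration of what we mean by congruent density distributions, the sketch below compares per-patient count histograms from the two modalities with a KL divergence. The numbers are invented for the example, not our actual measurements:

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical per-patient densities (made-up numbers, for illustration only):
# osteocytes per patient from histopathology, lacunae per patient from microCT.
osteocyte_density = np.array([120., 95., 180., 140., 110.])
lacuna_density = np.array([115., 100., 175., 150., 105.])

# Build empirical distributions over a shared set of bins.
bins = np.linspace(0, 250, 11)
p_teacher, _ = np.histogram(osteocyte_density, bins=bins)
p_student, _ = np.histogram(lacuna_density, bins=bins)

# Smooth empty bins and normalize to valid probability distributions.
eps = 1e-8
p_teacher = (p_teacher + eps) / (p_teacher + eps).sum()
p_student = (p_student + eps) / (p_student + eps).sum()

# scipy's entropy(p, q) returns KL(p || q); a low value reflects the kind of
# congruent density distributions the constraint relies on.
print(entropy(p_teacher, p_student))
```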
The same rationale applies to the DeepLIIF dataset, where the focus within each modality shifts to the segmented structures (cells) and the conditions (positive and negative cells in bladder carcinoma and non-small-cell lung carcinoma).
I hope this clarifies how the congruency constraint works in our framework and why KL loss is an effective choice despite the unpaired nature of the data. Let me know if you have further questions!
Thank you so much for explaining in such detail. It all makes sense to me now.
Hi! Thank you for your awesome work. There is one detail I don't quite understand: since the data is unpaired and the two label sets are different, why can the KL loss be used to constrain the student model?