Open · pILLOW-1 opened this issue 4 weeks ago
Thank you for your interest. Regarding whether DatasetDM can be applied to x-ray datasets: I suggest first using generative models such as Stable Diffusion or SDXL to generate x-ray data. If these models can successfully generate the x-ray images you need, then DatasetDM should work for your case as well.
Whether synthetic data can serve as effective training data to enhance model performance largely depends on two factors:
- Image quality: the quality of the synthetic images is crucial. This includes whether the synthetic data accurately represents the target objects and how large the visual discrepancy is between synthetic and real images.
- Annotation precision: for segmentation tasks, the accuracy of the masks is essential. For other task types, high-quality annotations relevant to that task are key.
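The annotation-precision point can be checked mechanically rather than by eye. As a minimal sketch (not from the thread; all names are illustrative), one can score each generated mask against a small set of trusted, hand-corrected reference masks using intersection-over-union:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between two binary segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 1.0

# Toy example: a generated mask vertically offset by two pixels
# relative to a hand-corrected 8x8 reference mask.
gt = np.zeros((16, 16), dtype=np.uint8)
gt[4:12, 4:12] = 1
pred = np.zeros_like(gt)
pred[6:14, 4:12] = 1
print(round(mask_iou(pred, gt), 3))  # prints 0.6
```

Averaging this score over a sample of synthetic image/mask pairs gives a rough, quantitative handle on whether the generated annotations are precise enough to train on.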
OK. Thanks for your advice!
Hi, @weijiawu
This is really great work, leveraging a generative model to construct synthetic data for downstream tasks. My own task is to construct synthetic data in the x-ray security image domain, which is pseudo-colored and quite different from the natural image domain, and I have some questions I'd like your help with.
My first question: is it feasible to fine-tune DatasetDM on an x-ray dataset? Since x-ray images belong to a different domain, I cannot apply DatasetDM directly. I have also run some experiments training conditional diffusion models such as GLIGEN to generate x-ray images, but the generated results are not good: problems such as missing, repeated, and occluded objects have emerged, and I am not sure whether this is related to my dataset size (the training set consists of 27,708 images).
My second question: when we say 'Synthetic Data for Perception Tasks', what exactly is the role of the synthetic data? Put another way, how can we judge whether synthetic data is truly suitable for a given downstream task? For example, my downstream task is to detect all prohibited items in an x-ray security image. Sometimes the generated images may be visually acceptable to humans but not to models, and different visual cues may matter differently to different perception models. Is there a way to generate the synthetic data that is most important for a downstream model, so that it can learn from it maximally and perform well on the test set?
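One common proxy for this question (my own framing, not something stated in the thread) is to train on the synthetic set and evaluate on held-out real images: the real-test accuracy then measures how well the synthetic distribution covers the real one, regardless of how the images look to a human. A toy sketch with numpy, using a nearest-centroid classifier and a hypothetical `shift` parameter standing in for the synthetic-to-real domain gap:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_set(shift, n=200):
    """Two 2-D 'feature' clusters for two item classes; `shift` models
    a domain gap between this set and the real distribution."""
    a = rng.normal([0.0, 0.0], 1.0, size=(n, 2)) + shift
    b = rng.normal([4.0, 4.0], 1.0, size=(n, 2)) + shift
    X = np.vstack([a, b])
    y = np.array([0] * n + [1] * n)
    return X, y

def train_centroids(X, y):
    """'Train' a nearest-centroid classifier: one mean per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def accuracy(centroids, X, y):
    """Classify by nearest centroid and score against labels."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return (d.argmin(axis=1) == y).mean()

X_real, y_real = make_set(shift=0.0)      # held-out real test set
X_syn_good, y_syn = make_set(shift=0.2)   # faithful generator: small gap
X_syn_bad, _ = make_set(shift=3.0)        # unfaithful generator: large gap

acc_good = accuracy(train_centroids(X_syn_good, y_syn), X_real, y_real)
acc_bad = accuracy(train_centroids(X_syn_bad, y_syn), X_real, y_real)
print(acc_good > acc_bad)  # prints True: smaller gap, better transfer
```

The same train-on-synthetic / test-on-real loop applies with a real detector in place of the toy classifier; the real-test score is then a direct, model-specific measure of how useful the synthetic data is, rather than a judgment of visual quality.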