[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades the conventional diffusion model with an additional semantic prior.
Hi!
Not really an issue, but I have a question about using custom datasets to train this model for a medical image captioning task, specifically for radiograph images. From what I understand, I need to extract image features from my dataset using the bottom-up-attention model. Can you please confirm if this is correct?
Additionally, is there an easier way to train the SCD-Net model on a custom dataset? Any assistance you can provide would be greatly appreciated.
Best regards, A desperate student