[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades the conventional diffusion model with an additional semantic prior.
Hello, I have a question about the cross-modal model that retrieves semantically related sentences from the training sentence pool. How is the training sentence pool obtained? Thank you very much!