Hans0124SG opened this issue 4 years ago.
I had the same question. I'm trying to use torchtext to process the text in an image captioning task. Any suggestions?
What does your dataset look like? Can you give an example? cc @fmassa
I'm trying to use it for the Flickr8k dataset. Each image has 5 sentences that describe it. For training, I want to be able to load an image together with one sentence vector.
This looks quite similar to CocoCaptions, which is available in torchvision: https://github.com/pytorch/vision/blob/bf843c664b8ba0ff49d2921237500c77d82f2d04/torchvision/datasets/coco.py#L7-L78
You can probably take a similar approach for this dataset
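A minimal sketch of that approach for Flickr8k, assuming the captions come from a Flickr8k.token.txt-style file where each line is `<image>.jpg#<n>`, a tab, then the caption (an assumption about the file layout; check your copy). Like CocoCaptions, it accepts `transform` and `target_transform` callables. Unlike CocoCaptions, which returns all captions for an image, it flattens the data so each index yields one (image, caption) pair. The names `parse_flickr8k_tokens`, `Flickr8kCaptionPairs` and the injectable `loader` argument are hypothetical, not part of torchvision or torchtext.

```python
import os
from typing import Callable, Dict, Iterable, List, Optional, Tuple

try:
    # Use the real base class when PyTorch is installed.
    from torch.utils.data import Dataset
except ImportError:
    # Fallback so the sketch still runs without PyTorch.
    Dataset = object


def default_loader(path: str):
    # Hypothetical default loader: open the image with Pillow and force RGB.
    from PIL import Image
    with open(path, "rb") as f:
        return Image.open(f).convert("RGB")


def parse_flickr8k_tokens(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group captions by image file name.

    Assumes lines shaped like "<image>.jpg#<n>\t<caption>".
    """
    captions: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, caption = line.split("\t", 1)
        image_name = key.split("#", 1)[0]  # drop the "#<n>" caption index
        captions.setdefault(image_name, []).append(caption)
    return captions


class Flickr8kCaptionPairs(Dataset):
    """Map-style dataset returning one (image, caption) pair per index."""

    def __init__(
        self,
        root: str,
        captions: Dict[str, List[str]],
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
        loader: Callable[[str], object] = default_loader,
    ):
        self.root = root
        self.transform = transform
        self.target_transform = target_transform
        self.loader = loader
        # Flatten: each caption of each image becomes its own sample.
        self.samples: List[Tuple[str, str]] = [
            (image_name, caption)
            for image_name in sorted(captions)
            for caption in captions[image_name]
        ]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int):
        image_name, caption = self.samples[index]
        image = self.loader(os.path.join(self.root, image_name))
        if self.transform is not None:
            image = self.transform(image)
        if self.target_transform is not None:
            caption = self.target_transform(caption)
        return image, caption
```

Flattening means one epoch visits every caption once. If you would rather see each image once per epoch, keep one entry per image and pick one of its captions at random in `__getitem__` instead.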
Yes it is! Thanks for pointing it out.
❓ Questions and Help
Description
Hi, thanks for the great library. I am wondering whether there is a way to use a torchtext Dataset for multi-modal data. One example task is image captioning, where we need to generate text based on an input image. Another is generating text from tabular data, for example table summarization.
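One way to handle the text side of such multi-modal data is to keep it independent of the image (or table) side. The sketch below is a hedged illustration that deliberately avoids relying on any particular torchtext API. It builds a word-to-index vocabulary from training captions, maps each caption to a list of indices, and pads a batch to equal length. The special tokens `<pad>`, `<unk>`, `<bos>`, `<eos>`, the `min_freq` cutoff and the lowercase whitespace tokenizer are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List

SPECIALS = ["<pad>", "<unk>", "<bos>", "<eos>"]


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split; swap in a real tokenizer as needed.
    return text.lower().split()


def build_vocab(captions: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    counts = Counter(tok for c in captions for tok in tokenize(c))
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    # Most frequent first, ties broken alphabetically, for a deterministic order.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def numericalize(caption: str, vocab: Dict[str, int]) -> List[int]:
    # Unknown words map to <unk>; the sequence is wrapped in <bos> ... <eos>.
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokenize(caption)]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]


def pad_batch(seqs: List[List[int]], pad_id: int) -> List[List[int]]:
    # Right-pad every sequence to the length of the longest one in the batch.
    longest = max(len(s) for s in seqs)
    return [s + [pad_id] * (longest - len(s)) for s in seqs]
```

In a PyTorch setup, a custom `collate_fn` passed to `DataLoader` could stack the images, run the numericalized captions through something like `pad_batch`, and convert the result with `torch.tensor`. The same text pipeline would apply unchanged if the other modality were a table instead of an image.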