How to use torchtext for tasks involving image/tabular data like image captioning?

pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch

https://pytorch.org/text

BSD 3-Clause "New" or "Revised" License


How to use torchtext for tasks involving image/tabular data like image captioning? #666

Status: Open. Hans0124SG opened this issue 4 years ago.

Hans0124SG commented 4 years ago

❓ Questions and Help

Description

Hi, thanks for the great library. I am wondering whether there is a way to use the torchtext Dataset for multi-modal data. One example task would be image captioning, where we need to generate text from an input image. Another would be generating text from tabular data, for example table summarization.
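One way to approach this, sketched under assumptions: the dataset only needs to return a (non-text input, text) pair, and only the text side goes through text processing. PyTorch's `DataLoader` treats any object implementing `__getitem__` and `__len__` as a map-style dataset, so plain Python is enough. The sketch below pairs table rows with target summaries. `TableToTextDataset`, `linearize_row`, and the `col: value | col: value` linearization are hypothetical illustrations, not part of torchtext.

```python
from typing import Callable, Dict, List, Optional, Tuple


def linearize_row(row: Dict[str, str]) -> str:
    # Hypothetical scheme: flatten a table row into "col: value | col: value"
    return " | ".join(f"{col}: {val}" for col, val in row.items())


class TableToTextDataset:
    """Map-style dataset pairing a table row with a target summary (sketch)."""

    def __init__(self, rows: List[Dict[str, str]], summaries: List[str],
                 text_transform: Optional[Callable[[str], List[str]]] = None):
        if len(rows) != len(summaries):
            raise ValueError("rows and summaries must have the same length")
        self.rows = rows
        self.summaries = summaries
        # Default tokenizer: lowercase + whitespace split (an assumption)
        self.text_transform = text_transform or (lambda s: s.lower().split())

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Tuple[List[str], List[str]]:
        # Source side: linearized table; target side: the summary text
        source = self.text_transform(linearize_row(self.rows[idx]))
        target = self.text_transform(self.summaries[idx])
        return source, target
```

For example, `TableToTextDataset([{"team": "Lions", "score": "3"}], ["The Lions scored 3."])[0]` gives `(["team:", "lions", "|", "score:", "3"], ["the", "lions", "scored", "3."])`. For image captioning the same shape applies, with an image in place of the linearized row.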

jjmachan commented 4 years ago

I had the same question. I'm trying to use torchtext to process the text in an image captioning task. Any suggestions?

zhangguanheng66 commented 4 years ago

What does your dataset look like? Can you give an example? cc @fmassa

jjmachan commented 4 years ago

I'm trying to use it for the Flickr8k dataset. Each image has 5 sentences that describe it. I want to load an image together with one sentence vector for training.
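A minimal sketch of that layout follows. It assumes the captions come in a tab-separated file where each line looks like `<image>.jpg#<n>\t<caption>`, which is how the Flickr8k `Flickr8k.token.txt` file is usually laid out; adjust the parsing if your copy differs. `parse_captions` and `Flickr8kCaptions` are hypothetical names. Image decoding is left to a caller-supplied `load_image` callable so the sketch stays self-contained.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


def parse_captions(lines: List[str]) -> Dict[str, List[str]]:
    """Group captions by image file name; lines look like 'img.jpg#0\tcaption'."""
    captions: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, caption = line.split("\t", 1)
        image_name = key.split("#", 1)[0]  # drop the '#<n>' caption index
        captions[image_name].append(caption)
    return dict(captions)


class Flickr8kCaptions:
    """Map-style dataset yielding one (image, caption) pair per index (sketch)."""

    def __init__(self, captions: Dict[str, List[str]],
                 load_image: Callable[[str], Any]):
        # Flatten to (image_name, caption) pairs so every caption is seen once per epoch
        self.samples: List[Tuple[str, str]] = [
            (name, cap) for name, caps in sorted(captions.items()) for cap in caps
        ]
        self.load_image = load_image

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Tuple[Any, str]:
        name, caption = self.samples[idx]
        return self.load_image(name), caption
```

With real files, `load_image` could be something like `lambda name: PIL.Image.open(os.path.join(root, name)).convert("RGB")`, where `root` is your (hypothetical) image directory. Flattening means each image appears five times per epoch, once with each caption; the caption string can then be tokenized and numericalized with whatever text pipeline you use.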

fmassa commented 4 years ago

This seems pretty similar to CocoCaptions, which is available in torchvision: https://github.com/pytorch/vision/blob/bf843c664b8ba0ff49d2921237500c77d82f2d04/torchvision/datasets/coco.py#L7-L78

You can probably take a similar approach for this dataset
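Following that approach, the text side can be handled through a `target_transform`: torchvision's `CocoCaptions` returns `(image, target)`, where `target` is the list of caption strings for that image, so a transform can pick one caption and map it to token ids. The sketch below builds a toy vocabulary in plain Python rather than relying on a particular torchtext release's vocab API, which has changed across versions. `tokenize`, `build_vocab`, `PickOneCaption`, `pad_batch`, and the special tokens `<pad>`, `<unk>`, `<bos>`, `<eos>` are hypothetical choices.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

SPECIALS = ["<pad>", "<unk>", "<bos>", "<eos>"]


def tokenize(text: str) -> List[str]:
    # Simple lowercase/whitespace tokenizer (an assumption; swap in your own)
    return text.lower().split()


def build_vocab(captions: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    counts = Counter(tok for cap in captions for tok in tokenize(cap))
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}  # "<pad>" gets id 0
    for tok, freq in sorted(counts.items()):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


class PickOneCaption:
    """target_transform: choose one of an image's captions and numericalize it."""

    def __init__(self, vocab: Dict[str, int], rng: Optional[random.Random] = None):
        self.vocab = vocab
        self.rng = rng or random.Random()

    def __call__(self, captions: List[str]) -> List[int]:
        caption = self.rng.choice(captions)
        unk = self.vocab["<unk>"]
        ids = [self.vocab.get(tok, unk) for tok in tokenize(caption)]
        return [self.vocab["<bos>"]] + ids + [self.vocab["<eos>"]]


def pad_batch(batch: List[Tuple[object, List[int]]], pad_id: int = 0):
    """collate_fn sketch: keep images as a list, right-pad token-id lists."""
    images = [img for img, _ in batch]
    max_len = max(len(ids) for _, ids in batch)
    padded = [ids + [pad_id] * (max_len - len(ids)) for _, ids in batch]
    return images, padded
```

In a real pipeline you would pass `target_transform=PickOneCaption(vocab)` to `CocoCaptions` and `collate_fn=pad_batch` to `DataLoader`, then turn `padded` into a tensor with `torch.tensor(padded)` and stack the images with `torch.stack` once a `transform` has converted them to tensors. Picking a random caption on every fetch exposes the model to different captions across epochs; building one index per (image, caption) pair is an alternative.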

jjmachan commented 4 years ago

Yes, it is! Thanks for pointing it out.