Create extra samples with surplus images

mlfoundations / open_flamingo

An open-source framework for training large multimodal models.

MIT License

Create extra samples with surplus images #272

Open isaac-chung opened 12 months ago

isaac-chung commented 12 months ago

Addresses issue https://github.com/mlfoundations/open_flamingo/issues/231

- Chunk the images and yield a sample for every max_num_images valid images.
- Refactor preprocess_gpt_interleaved and preprocess_interleaved.
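
To make the chunking idea concrete, here is a minimal sketch of yielding one group per max_num_images valid images. It is not the code in this PR: the helper name chunk_valid_images and the is_valid callback are hypothetical, and emitting the final partial chunk is an assumption about the desired behavior.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk_valid_images(
    images: Iterable[T],
    max_num_images: int,
    is_valid: Callable[[T], bool] = lambda _: True,
) -> Iterator[List[T]]:
    """Yield lists of at most max_num_images valid images (hypothetical helper)."""
    if max_num_images < 1:
        raise ValueError("max_num_images must be at least 1")
    chunk: List[T] = []
    for image in images:
        if not is_valid(image):
            continue  # invalid images do not count toward the chunk size
        chunk.append(image)
        if len(chunk) == max_num_images:
            yield chunk
            chunk = []
    if chunk:
        # Assumption: surplus images that do not fill a whole chunk
        # still form a final, smaller sample.
        yield chunk
```

With seven valid images and max_num_images set to 3, this produces groups of 3, 3, and 1 images instead of discarding the last four.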

isaac-chung commented 12 months ago

A few questions before I make more changes:

- What is a good way to test this?
- I've noticed some subtle differences in the image/text processing steps between preprocess_gpt_interleaved and preprocess_interleaved, and I'm wondering whether these could be consolidated. For example:
  - the gpt one pads with shape (3, 224, 224), while the mmc4 one depends on the image size.
  - mmc4 has a 50% chance of keeping single-image samples, while gpt does not.
  - mmc4 avoids the situation where there is only one <image> token and it is at the end, while gpt does not.
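
For illustration, the two mmc4-style filtering rules above could be sketched as a single predicate. This is an assumption-laden sketch, not the repository's code: keep_sample, IMAGE_TOKEN, and the list-of-strings token representation are all hypothetical, and rejecting samples with no image is an added assumption.

```python
import random
from typing import List, Optional

IMAGE_TOKEN = "<image>"  # hypothetical placeholder for the image token


def keep_sample(
    tokens: List[str],
    rng: Optional[random.Random] = None,
    single_image_keep_prob: float = 0.5,
) -> bool:
    """Decide whether to keep a sample under the mmc4-style rules (sketch)."""
    rng = rng if rng is not None else random.Random()
    num_images = tokens.count(IMAGE_TOKEN)
    if num_images == 0:
        # Assumption: a sample without any image is not useful here.
        return False
    if num_images == 1:
        if tokens[-1] == IMAGE_TOKEN:
            # A single image token at the very end has no text after it.
            return False
        # Keep single-image samples only with the given probability.
        return rng.random() < single_image_keep_prob
    return True
```

Passing a seeded random.Random makes the 50% rule deterministic, which may also help with the testing question raised above.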

anas-awadalla commented 12 months ago

Yeah, I think we can default to the mmc4 code for all of these.

For the first point, the padding shape should be based on the image size. I think we just defaulted to 224x224 because most vision encoders take that input size, but that isn't the right way to do it.
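
As a rough sketch of what deriving the padding from the image size could look like, the helper below computes the shape of the zero padding from the images themselves rather than hard-coding (3, 224, 224). The name padding_shape and the (channels, height, width) tuple layout are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, int, int]  # assumed layout: (channels, height, width)


def padding_shape(
    image_shapes: Sequence[Shape], max_num_images: int
) -> Tuple[int, int, int, int]:
    """Return the (num_pad, C, H, W) shape of the zero padding (sketch)."""
    if not image_shapes:
        raise ValueError("need at least one image to infer the padding shape")
    first = tuple(image_shapes[0])
    if any(tuple(shape) != first for shape in image_shapes):
        raise ValueError("all images in a sample must share one shape")
    if len(image_shapes) > max_num_images:
        raise ValueError("sample has more images than max_num_images")
    # The per-image dimensions come from the data, not from a constant.
    return (max_num_images - len(image_shapes), *first)
```

In a PyTorch pipeline the result could be passed to torch.zeros and concatenated after the real images, so a 336x336 encoder would be padded with 336x336 blanks.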