haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
19.14k stars 2.1k forks source link

Dataset from OpenAI GPT-4 Multimodal? #397

Open chigkim opened 1 year ago

chigkim commented 1 year ago

Question

It says "Dataset date: LLaVA Visual Instruct 150K was collected in April 2023, by prompting GPT-4-0314 API." https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K Did you guys have a special access to gpt-4 multimodal back then? I don't think gpt-4 multimodal is out yet.

haotian-liu commented 1 year ago

We use text-only GPT-4 and we do not have access to the multimodal one. We provide the image context via caption/bounding box. You may refer to Sec. 3 and Table 1 in our paper for a better understanding.

chigkim commented 1 year ago

Thanks so much for the info! OpenAI started opening beta access to BeMyEyes users as BeMyAi feature, and the out put is fantastic with lots of detail!

Image

The picture shows the interior of a modern building, possibly a shopping center or a corporate building. The space is open and airy with high ceilings. The ceiling is made of wooden panels with embedded lights. There is a large staircase in the foreground with wooden steps and black railings. The staircase leads to an upper level which has a balcony overlooking the ground floor.

On the ground floor, there is a small information desk with a person in a blue uniform sitting behind it. There are also a couple of high-end shops visible, one of them is "Watches of Switzerland" and the other is "Cartier". The shops have glass walls, allowing a clear view of the displays inside.

The walls of the building are a mix of glass and brick, and there are tall windows that allow natural light to flood in. There are a few people scattered around, some are walking and others are standing on the balcony of the upper level. The floor is made of polished concrete, giving it a sleek and modern look.

chigkim commented 11 months ago

@haotian-liu, OpenAI is rolling out GPT4-V to ChatGPT Plus users, and it's amazing. Do you guys have access to it yet? Probably worthwhile to look into building synthetic dataset from it!