paperswithlove / papers-we-read

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs #21

Open runhani opened 2 months ago

Ferret-UI

https://arxiv.org/pdf/2404.05719.pdf

Previous paper and code

https://arxiv.org/pdf/2310.07704.pdf

https://github.com/apple/ml-ferret

Overview

What is it?

image

image

Architecture

image

Data Generation

image image