haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Feature request] Add better support for Brazilian Portuguese #1669

Open insinfo opened 3 weeks ago

insinfo commented 3 weeks ago

feature

I ran a test to OCR scanned documents in Brazilian Portuguese and found that LLaVA makes many mistakes on them: instead of transcribing the text, it describes the image.

Result from https://huggingface.co/spaces/badayvedat/LLaVA:

Prompt: transcribe image

[Image: 1-1]

The image features a white piece of paper with a stamp on it. The stamp is placed in the upper left corner of the paper, and it appears to be a government document. The paper is also marked with a signature, which is located in the middle of the document. The overall appearance of the paper suggests that it is an official document, possibly related to a government agency or a legal matter.

The correct output would be:

Here is the transcription of the text from the image:

ESTADO DO RIO DE JANEIRO
Prefeitura Municipal de Rio das Ostras
PROTOCOLO GERAL

Processo: 18457 / 2003
Data: 03/09/2003
Hora: 10:53:56
Requerente: COSCARELLI E CIA LTDA ME
Sec. Destino: Secretaria Municipal de Fazenda
Dept. Destino: Depto. de Tributos e Fiscalização
Assunto: ALVARÁ

Result:

[Image: 110-1]

The image features a white piece of paper with a sticker on it. The sticker has a barcode and a label, which reads "Estado do Rio de Janeiro." The paper appears to be a certificate or a document related to the city of Rio de Janeiro. The paper is placed on a table, and the sticker is prominently displayed on the surface.

The correct output would be:

Here is the transcription of the text from the image:

ESTADO DO RIO DE JANEIRO
Prefeitura Municipal de Rio das Ostras
PROTOCOLO GERAL

Processo: 15314 / 2003
Data: 25/07/2003
Hora: 16:18:28

Requerente: COLÔNIA DE PESCADORES Z-22
Sec. Destino: Sec. Mun. Urbanismo Obras e S. Pub.
Dept. Destino: 0
Assunto: AGRADECIMENTO / FAZ
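
To put a number on the gap between what the model returns and the expected transcription, one option is the character error rate (CER): the edit distance between the two strings divided by the length of the reference. The snippet below is only a minimal sketch of that idea. The helper names (`normalize`, `levenshtein`, `character_error_rate`) are hypothetical and not part of LLaVA, and the choice to lowercase and collapse whitespace before comparing is an assumption that may not suit every evaluation.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC, lowercase, and collapse whitespace runs (an assumed policy)."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length, after normalization."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcription is empty")
    return levenshtein(ref, hyp) / len(ref)


# Header lines of the expected transcription from this issue.
reference = """ESTADO DO RIO DE JANEIRO
Prefeitura Municipal de Rio das Ostras
PROTOCOLO GERAL"""

# One sentence of the model's actual answer for the second image.
hypothesis = 'The sticker has a barcode and a label, which reads "Estado do Rio de Janeiro."'

print(f"CER: {character_error_rate(reference, hypothesis):.2f}")
```

A CER of 0 means the output matches the reference exactly after normalization. Values can exceed 1 when the model produces much more text than the reference contains, which is likely when it describes the image instead of transcribing it.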


insinfo commented 3 weeks ago

Perhaps training on this dataset could help improve accuracy: https://zenodo.org/records/7872951