huggingface / huggingface.js

Utilities to use the Hugging Face Hub API
https://hf.co/docs/huggingface.js
MIT License

Add a task for automatic text recognition #455

Open PonteIneptique opened 5 months ago

PonteIneptique commented 5 months ago

Hi :) We are in the process of building a pipeline to help people publish their HTR/OCR ground-truth data to Hugging Face in the context of HTR-United, and we have a fair amount of data ourselves. Would it be possible to add an ATR (Automatic Text Recognition) or OCR/HTR (Optical Character Recognition / Handwritten Text Recognition) task to register our datasets under? Right now the closest fit is the much broader Vision-to-Text, which seems more focused on image-description datasets. Thanks!

coyotte508 commented 5 months ago

cc @merveenoyan @osanseviero

osanseviero commented 5 months ago

cc @sanchit-gandhi and @Vaibhavs10 for our audio experts :)

Vaibhavs10 commented 5 months ago

This is more vision no?

PonteIneptique commented 5 months ago

This is more Vision than Text (although that depends on who you ask...), but I don't think that Multimodal > Vision-to-Text is a good match for HTR/OCR/ATR.

osanseviero commented 5 months ago

Sorry for my confusion, I read too quickly and did string matching with ASR :smiling_face_with_tear:

Yes, this is indeed vision. In the past, OCR models have been tagged as image-to-text, for example https://huggingface.co/microsoft/trocr-base-handwritten . I think we could keep image-to-text and add a secondary subtype for this use case (either ocr or atr, as suggested). WDYT @merveenoyan @NielsRogge @lhoestq ?

lhoestq commented 5 months ago

I'm ok to add a new task_id "ocr" or "optical-character-recognition" under "image-to-text"

merveenoyan commented 5 months ago

I agree with @lhoestq.