PonteIneptique opened this issue 5 months ago
cc @merveenoyan @osanseviero
cc @sanchit-gandhi and @Vaibhavs10 for our audio experts :)
This is more vision, no?
This is more Vision than Text (although, depending on who you ask...), but I don't think that Multimodal > Vision-to-text
is a good match for HTR/OCR/ATR.
Sorry for my confusion; I read too quickly and string-matched it with ASR 🥲
Yes, this is indeed vision. In the past, OCR models have been tagged as image-to-text,
such as https://huggingface.co/microsoft/trocr-base-handwritten. I think we could keep image-to-text
and add a secondary subtype for this use case (either ocr or atr, as suggested). WDYT @merveenoyan @NielsRogge @lhoestq ?
I'm OK with adding a new task_id,
"ocr" or "optical-character-recognition", under "image-to-text".
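For dataset publishers following this thread, here is a minimal sketch of what the YAML metadata header of a dataset card (the front matter of its README.md) could look like if an HTR/OCR dataset were filed under the proposed scheme. Assumptions: "optical-character-recognition" is only the task_id proposed above, not one confirmed to be accepted by the Hub's metadata validation, and `dataset_card_header` and the dataset name are hypothetical, used for illustration only.

```python
# Sketch only: builds the YAML front matter of a dataset card that files an
# HTR/OCR dataset under the "image-to-text" category.
# ASSUMPTION: "optical-character-recognition" is the task_id proposed in this
# thread; whether the Hub accepts it is not confirmed here.


def dataset_card_header(task_categories, task_ids, pretty_name):
    """Return a YAML front-matter block (hypothetical helper) for a dataset card."""
    lines = ["---", f"pretty_name: {pretty_name}", "task_categories:"]
    lines += [f"- {category}" for category in task_categories]
    lines.append("task_ids:")
    lines += [f"- {task_id}" for task_id in task_ids]
    lines.append("---")
    return "\n".join(lines) + "\n"


header = dataset_card_header(
    task_categories=["image-to-text"],
    task_ids=["optical-character-recognition"],  # proposed subtype, see above
    pretty_name="My HTR ground truth",  # hypothetical dataset name
)
print(header)
```

Keeping the broad category as "image-to-text" and expressing OCR/HTR through task_ids mirrors the suggestion above: existing image-to-text filters keep working, while the subtype lets people search specifically for text-recognition datasets.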
I agree with @lhoestq.
Hi :) We are in the process of working on a pipeline to help people publish their data to Hugging Face in the context of HTR/OCR ground truth and HTR-United, and we have a fair amount of data ourselves. Would it be possible to have an ATR (Automatic Text Recognition) or OCR/HTR (Optical Character Recognition / Handwritten Text Recognition) task to register our datasets under, instead of the much broader Vision-to-Text task, which seems more focused on image-description datasets? Thanks!