xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

OCR Models #452

Open lszgit241 opened 8 months ago

lszgit241 commented 8 months ago

Feature request

Hi, I'm a developer in China who learned about transformers.js at a front-end salon where a transformers.js engineer was invited to talk about the library. I'm now going to work on client-side OCR and plan to use transformers.js, but I haven't seen any supported OCR models. Is this the right time to use transformers.js for OCR?

Motivation

OCR models that support Chinese

Your contribution

not yet

xenova commented 8 months ago

Hi there! 👋 We do support a collection of OCR models (the TrOCR model family). For example, https://huggingface.co/Xenova/trocr-small-handwritten can be used as follows:

import { pipeline } from '@xenova/transformers';

// Create image-to-text pipeline
const captioner = await pipeline('image-to-text', 'Xenova/trocr-small-handwritten');

// Perform optical character recognition
const image = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/handwriting.jpg';
const output = await captioner(image);
// [{ generated_text: 'Mr. Brown commented icily.' }]
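
A minimal sketch of how the output shape shown above could be consumed when there are several images, assuming the input has already been split into one image per text line. `recognizeLines` is a hypothetical helper, not part of transformers.js; it only relies on the `[{ generated_text }]` result shape from the example.

```javascript
// `ocr` is any async function with the same call shape as the pipeline above:
// it takes an image (URL or path) and resolves to [{ generated_text: string }].
// `recognizeLines` is a hypothetical helper, not a transformers.js API.
async function recognizeLines(ocr, images) {
  const lines = [];
  // Run the images one at a time so only one inference is in flight.
  for (const image of images) {
    const output = await ocr(image);
    // Take the first candidate; fall back to an empty string if none is returned.
    lines.push(output[0]?.generated_text ?? '');
  }
  return lines.join('\n');
}
```

With the pipeline from the example, this would be called as `await recognizeLines(captioner, [urlA, urlB])`, where `urlA` and `urlB` are placeholder image URLs.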

However, as you state in your question, you are specifically looking for a model which performs Chinese optical character recognition. Do you know of a model (perhaps on the Hugging Face Hub) which you would like to run in the browser? If so, we can convert the model to ONNX so that it can run with transformers.js.

lszgit241 commented 8 months ago

Hi there! I deleted all my earlier comments because I realized that OCR is not as easy as I thought a few days ago.

After reading a lot of OCR model code on GitLab and learning about the models over the past few days, I realized that an OCR system that detects all the text block areas in an image and accurately recognizes the text in each block needs at least two models working together (a detection model and a recognition model, and maybe also an angle detection model and so on). There would also be a lot of work to glue them together in JS to generate useful output. It's obviously not easy work, and converting PaddleOCR now would certainly be too late for my business research...
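
The glue described above could look roughly like the following sketch. It assumes the detection model is wrapped in a `detect(image)` function that resolves to boxes of the form `{ x, y, width, height }`, and the recognition model in a `recognize(image, box)` function that resolves to a string. Both names, the box shape, and the reading-order heuristic (horizontal text, top to bottom, then left to right) are assumptions for illustration; none of this is a transformers.js or PaddleOCR API.

```javascript
// Hypothetical glue for a two-stage OCR pipeline. `detect` and `recognize`
// are placeholders for whatever detection and recognition models are used.

// Order boxes top-to-bottom, then left-to-right within a line. A box joins the
// current line when its vertical center is within half the height of the
// line's first box; otherwise it starts a new line.
function sortReadingOrder(boxes) {
  const byTop = [...boxes].sort((a, b) => a.y - b.y);
  const lines = [];
  for (const box of byTop) {
    const centerY = box.y + box.height / 2;
    const line = lines[lines.length - 1];
    if (line && Math.abs(centerY - line.centerY) < line.height / 2) {
      line.boxes.push(box);
    } else {
      lines.push({ centerY, height: box.height, boxes: [box] });
    }
  }
  return lines.flatMap((line) => line.boxes.sort((a, b) => a.x - b.x));
}

// Detect text blocks, then recognize each block in reading order.
// Resolves to an array of { box, text } entries.
async function runOcr(image, { detect, recognize }) {
  const boxes = sortReadingOrder(await detect(image));
  const results = [];
  for (const box of boxes) {
    results.push({ box, text: await recognize(image, box) });
  }
  return results;
}
```

An angle detection step, if needed, would fit between detection and recognition. Vertical text would need a different ordering rule.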

But to be honest, this kind of OCR workflow with good performance (for example, input image data, output all the text block positions and the text characters themselves) could be a very big deal for internet information publishing in China. A lot of information companies require this workflow, and almost all of them use servers for prediction, so running it on the client is also a big advantage of transformers.js. If you consider bringing this process to transformers.js, I think I could join in, because I will be focusing on Web AI for a long time in my work. (PS: I did convert the model and run it in onnxruntime, but I'm not very good at the workings of AI yet...)

Thanks again for your response! Looking forward to your next reply.

xenova commented 8 months ago

Hi again 👋 Apologies for the late response! The transformers.js library aims to bring the Python transformers library to the browser and other JavaScript environments (e.g., Node, Deno), so if you know of a suitable OCR model on the Hugging Face Hub which is compatible with the Python library, we can do our best to try to support it.

I do agree that OCR is a very powerful use-case, and I look forward to adding support for new OCR models when they are released.

xenova commented 7 months ago

This might soon be possible: https://github.com/VikParuchuri/surya

vinayakathavale commented 7 months ago

@xenova happy to help with the integration of the surya model, lmk what needs to be done. Should I just follow the instructions in the readme to add a new model?

bobbydmartino commented 1 month ago

@xenova happy to help with the integration of the surya model, lmk what needs to be done. Should I just follow the instructions in the readme to add a new model?

I'd also be interested in surya in the browser. Is this underway?