Open · lszgit241 opened this issue 8 months ago
Hi there! 👋 We do support a collection of OCR models (the TrOCR model family). For example, https://huggingface.co/Xenova/trocr-small-handwritten can be used as follows:
```javascript
// Requires an ES module context (e.g. a .mjs file) for top-level await.
import { pipeline } from '@xenova/transformers';

// Create image-to-text pipeline
const captioner = await pipeline('image-to-text', 'Xenova/trocr-small-handwritten');

// Perform optical character recognition
const image = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/handwriting.jpg';
const output = await captioner(image);
console.log(output);
// [{ generated_text: 'Mr. Brown commented icily.' }]
```
However, as you state in your question, you are specifically looking for a model which performs Chinese optical character recognition. Do you know of a model (perhaps on the Hugging Face Hub) which you would like to run in the browser? If so, we can convert the model to ONNX so that it can run with transformers.js.
Hi there! I deleted all my earlier comments because I realized that OCR models are not as simple as I thought a few days ago.
Over the past few days I have read a lot of OCR model code on GitLab and learned more about the models. I now see that an OCR system that detects every text block in an image and accurately recognizes the text in each block needs at least two models working together: a detection model and a recognition model, and possibly an angle classification model as well. Gluing them together in JS to produce useful output would also take a lot of work, so this is clearly not an easy task. Converting PaddleOCR now would certainly be too late for my business research...
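The "glue" described above can be roughly sketched as follows. This is a minimal illustration, not PaddleOCR's or transformers.js's actual code. It assumes a hypothetical grayscale image object `{ width, height, data: Uint8Array }` and three hypothetical async model wrappers: `detect`, `classifyAngle` and `recognize`. The flow loosely mirrors a detect, then angle-classify, then recognize pipeline.

```javascript
// Hypothetical image format: { width, height, data: Uint8Array },
// grayscale, row-major, 1 byte per pixel.

// Copy the axis-aligned box { x, y, w, h } out of the source image.
// Assumes the box lies fully inside the image (no clamping).
function cropImage(image, { x, y, w, h }) {
  const data = new Uint8Array(w * h);
  for (let row = 0; row < h; row++) {
    const start = (y + row) * image.width + x;
    data.set(image.data.subarray(start, start + w), row * w);
  }
  return { width: w, height: h, data };
}

// Rotate by 180 degrees: for a row-major single-channel image this is
// the pixel array reversed.
function rotate180(image) {
  return { width: image.width, height: image.height, data: image.data.slice().reverse() };
}

// Glue: detect text boxes, fix upside-down crops, then recognize each crop.
// `detect`, `classifyAngle` (optional) and `recognize` are hypothetical
// async wrappers around separately converted models.
async function runOcr(image, { detect, classifyAngle, recognize }) {
  const boxes = await detect(image); // expected: [{ x, y, w, h }, ...]
  const results = [];
  for (const box of boxes) {
    let crop = cropImage(image, box);
    if (classifyAngle && (await classifyAngle(crop)) === 180) {
      crop = rotate180(crop);
    }
    results.push({ box, text: await recognize(crop) });
  }
  return results;
}
```

Passing the models in as functions keeps the glue independent of any particular runtime. The same loop would work whether the wrappers call onnxruntime-web directly or a transformers.js pipeline.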
To be honest, though, a well-performing OCR workflow of this kind could be very important for online information publishing in China (that is, input an image and output the positions of all text blocks together with the text itself). Many information companies need this workflow, and almost all of them run prediction on servers, so this is also a big opportunity for transformers.js. If you consider bringing this workflow to transformers.js, I think I could help, because my work will focus on Web AI for a long time. (PS: I did convert the model and run it in onnxruntime, but I'm not yet very familiar with how the models work...)
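The output described above ("all text block positions and the text itself") still has to be put in reading order before it is useful as page text. Below is a minimal sketch that assumes results shaped like `{ box: { x, y, w, h }, text }`. The grouping rule is a simple heuristic chosen for illustration: a box joins a line when its vertical center is within half a box height of that line's first box. It is not any specific library's algorithm, and `readingOrder` and `toPlainText` are hypothetical helper names.

```javascript
// Vertical center of a box { x, y, w, h }.
const centerY = (b) => b.y + b.h / 2;

// Group OCR results into lines (top to bottom), each sorted left to right.
function readingOrder(results) {
  const sorted = [...results].sort((a, b) => centerY(a.box) - centerY(b.box));
  const lines = [];
  for (const r of sorted) {
    const line = lines[lines.length - 1];
    const sameLine =
      line &&
      Math.abs(centerY(r.box) - centerY(line[0].box)) < Math.min(r.box.h, line[0].box.h) / 2;
    if (sameLine) {
      line.push(r);
    } else {
      lines.push([r]);
    }
  }
  return lines.map((line) => line.sort((a, b) => a.box.x - b.box.x));
}

// Join recognized text: blocks within a line use `separator`, lines use '\n'.
function toPlainText(lines, separator = ' ') {
  return lines.map((line) => line.map((r) => r.text).join(separator)).join('\n');
}
```

For Chinese text, an empty `separator` is probably more appropriate, since words are not space-delimited.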
Thanks again for your response! Looking forward to your next reply~
Hi again 👋 Apologies for the late response! The transformers.js library aims to bring the Python transformers library to the browser and other JavaScript environments (e.g., Node, Deno). So if you know of a suitable OCR model on the Hugging Face Hub that is compatible with the Python library, we will do our best to support it.
I do agree that OCR is a very powerful use-case, and I look forward to adding support for new OCR models when they release.
This might soon be possible: https://github.com/VikParuchuri/surya
@xenova happy to help with the integration of the surya model, lmk what needs to be done. Should I just follow the instructions in the README to add a new model?
I'd also be interested in surya in the browser, is this underway?
Feature request
Hi, I'm one of the Chinese developers who learned about transformers.js at a front-end salon that invited a transformers.js engineer to talk about the library. I'm about to work on client-side OCR and plan to use transformers.js, but I haven't seen any supported OCR models. Is now the right time to use transformers.js for OCR?
Motivation
OCR models that support Chinese
Your contribution
not yet