xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0
9.88k stars 582 forks

[Document Understanding] Can we support a new task on document understanding? #218

Open jlia0 opened 11 months ago

jlia0 commented 11 months ago

Document Understanding

Some example models:

  1. DiT: https://huggingface.co/microsoft/dit-large
  2. LayoutLMv3: https://huggingface.co/microsoft/layoutlmv3-large
  3. Donut: https://huggingface.co/docs/transformers/model_doc/donut

Reason for request

Document understanding is a very popular task, but I couldn't find any support for it in the web environment.

Some tasks include:

  1. Key Information Extraction (KIE)
  2. Document Layout Analysis (DLA)
  3. Document Question Answering (DQA)
  4. Optical Character Recognition (OCR)

xenova commented 11 months ago

Those do sound like quite interesting use-cases! Do you mind sharing example code for how you would use the models, as well as the inputs and expected outputs?

jlia0 commented 11 months ago

Here's some example code that uses detectron2 and DiT for document layout analysis.

DiT Doc: https://huggingface.co/docs/transformers/v4.31.0/en/model_doc/dit
HF Space: https://huggingface.co/spaces/imjliao/dit-document-layout-analysis/blob/main/app.py

xenova commented 11 months ago

The repo you shared is private, but I assume I can use this one: https://huggingface.co/spaces/nielsr/dit-document-layout-analysis

jlia0 commented 11 months ago

> The repo you shared is private, but I assume I can use this one: https://huggingface.co/spaces/nielsr/dit-document-layout-analysis

Oh yes sorry! I forgot it's my private repo. But you're correct, I am using that one as well.

How do you think we could add this to transformers.js? There seems to be a dependency issue with detectron2...

xenova commented 11 months ago

Hmm, that might complicate things somewhat... Perhaps there is a JS library out there which is a suitable substitute?

jlia0 commented 11 months ago

> Hmm, that might complicate things somewhat... Perhaps there is a JS library out there which is a suitable substitute?

I don't see a JS library out there that can do something similar. But I found something that's worth checking out:

https://github.com/Unstructured-IO/unstructured-inference/blob/main/unstructured_inference/models/detectron2onnx.py

^^^ This is a working example of running detectron2 with ONNX Runtime...
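
For illustration only, here is a rough sketch of what the JavaScript side of layout analysis might look like once a detectron2-style model has been exported to ONNX. It covers only the post-processing step, and everything in it is an assumption rather than anything from transformers.js or the linked repo: the `toLayoutRegions` helper is hypothetical, the output layout (flat `[x1, y1, x2, y2, ...]` boxes plus one score and one class id per box) is assumed, and the label order is a guess based on PubLayNet-style classes, so it should be checked against the actual model config.

```javascript
// Hypothetical post-processing for a layout-detection model exported to ONNX.
// Assumption: the model returns a flat list of boxes [x1, y1, x2, y2, ...]
// plus one score and one class id per box. Real exports may differ.

// Assumption: PubLayNet-style label order; verify against the model's config.
const LABELS = ['text', 'title', 'list', 'table', 'figure'];

function toLayoutRegions(boxes, scores, classIds, threshold = 0.5) {
  const regions = [];
  for (let i = 0; i < scores.length; i++) {
    // Drop low-confidence detections.
    if (scores[i] < threshold) continue;
    const [x1, y1, x2, y2] = boxes.slice(i * 4, i * 4 + 4);
    // Number() also handles BigInt ids, which int64 tensors produce in JS.
    const id = Number(classIds[i]);
    regions.push({
      label: LABELS[id] ?? `class_${id}`,
      score: scores[i],
      box: { x1, y1, x2, y2 },
    });
  }
  // Highest-confidence regions first.
  return regions.sort((a, b) => b.score - a.score);
}

// Usage with made-up numbers (three detections, one below the threshold).
const regions = toLayoutRegions(
  [10, 20, 200, 60, 15, 80, 300, 400, 0, 0, 5, 5],
  [0.98, 0.91, 0.12],
  [1, 0, 4],
);
console.log(regions.map((r) => r.label)); // [ 'title', 'text' ]
```

The harder part is still the model export and image pre-processing, which this sketch does not attempt.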

xenova commented 8 months ago

Just an update on this:

The other tasks (Key Information Extraction and Document Layout Analysis) might be slightly more difficult to add (due to their additional dependencies)... but we'll get there eventually :)