naptha / tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥
http://tesseract.projectnaptha.com/
Apache License 2.0

Older Version of File-type library is used, which is causing EOL vulnerability #711

Closed anshulgupta8392 closed 1 year ago

anshulgupta8392 commented 1 year ago

Describe the bug: An older version of the file-type library is used.

I am using the latest version of file-type in my project, and the older version bundled with Tesseract.js conflicts with it, which breaks the execution of the tesseract createWorker function.

OleksiiHryhorian commented 1 year ago

Faced the issue as well. I run a Snyk test on all packages added to my solution, and the Snyk report shows that tesseract.js includes the outdated and vulnerable library file-type 12.4.2.

Would it be possible to resolve this issue please?

mtica commented 1 year ago

+1

OleksiiHryhorian commented 1 year ago

Adding a link to the comment from the closed thread (https://github.com/naptha/tesseract.js/issues/679), as it is related and the issue has not been solved so far: https://github.com/naptha/tesseract.js/issues/679#issuecomment-1362865108

Balearica commented 1 year ago

Would be ideal if a user impacted by this issue could contribute a PR. I will not have time to develop Tesseract.js in the near future.

Balearica commented 1 year ago

I looked into this tonight, and this dependency is quite the headache. I am leaning towards cutting it altogether.

  1. The latest versions (>=17) are ESM only, so will not work with our build.
  2. The bug is also patched in v16.5.4. However, that version has separate exports for the Node.js and browser versions, so it would require workarounds to run in Tesseract.js (which requires both).
  3. When I got a browser-only version running with v16.5.4, I found this update more than doubled the size of our worker code, which I do not consider an acceptable tradeoff.
     a. worker.min.js went from 145.1kB to 297.0kB [+105%]

Rather than work on this further, I think I am going to cut this dependency in the next version. We currently use file-type to (1) detect whether a buffer contains a .gz file and (2) detect whether a buffer contains a .bmp file. Figuring out how to do those things from scratch is almost certainly easier than continuing to fiddle with this dependency.
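
For illustration, both checks come down to comparing the first two bytes of the buffer against well-known file signatures: gzip streams start with 0x1f 0x8b, and BMP files start with the ASCII characters "BM" (0x42 0x4d). The following is a minimal sketch, assuming the input is a Uint8Array or Node.js Buffer. The helper names are hypothetical, and this is not necessarily the code that ended up in Tesseract.js.

```javascript
// Hypothetical helpers for illustration only; not taken from Tesseract.js.

// Returns true if `bytes` begins with every byte listed in `signature`.
function startsWithBytes(bytes, signature) {
  if (!bytes || bytes.length < signature.length) return false;
  return signature.every((value, index) => bytes[index] === value);
}

// gzip streams begin with the two magic bytes 0x1f 0x8b.
function isGzip(bytes) {
  return startsWithBytes(bytes, [0x1f, 0x8b]);
}

// BMP files begin with the ASCII characters "BM" (0x42 0x4d).
function isBmp(bytes) {
  return startsWithBytes(bytes, [0x42, 0x4d]);
}
```

A check like this only inspects the signature and does not validate the rest of the file, which is likely sufficient when the goal is just to decide whether to decompress a buffer or route it through a BMP decoder.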

Balearica commented 1 year ago

This dependency has been removed in the master branch in #775 for the reasons stated above. This change will be reflected in the next npm release, which will be version 4.1.0.

Balearica commented 1 year ago

This dependency was removed in the v4.1.0 release.

anshulgupta8392 commented 1 year ago

Thanks a lot, Balearica.

On Sat, 3 Jun 2023, 6:30 am, Balearica wrote:

Closed #711 https://github.com/naptha/tesseract.js/issues/711 as completed.