[Status] Version 4.0

Development is slow due to private matters, but the project is still alive. Progress can be followed in the pull request https://github.com/lublak/pdfdataextract/pull/9.

This is a very large feature update. It also aims to be very complete: every function that pdfjs calls internally has to be analysed, and the possible contents of each function are reproduced in a structure. This requires reading a lot of the internal source code of the pdfjs library. With this in place, all possible data from a PDF file can be read out comfortably through pdfjs. In addition, all libraries are updated to their latest versions.

The pull request currently contains a breaking change caused by the new version of pdfjs (https://github.com/mozilla/pdf.js/pull/14527). Two options are on the list. The first is to accept the breaking change in this library as well and ship it in version 4.0. The second is to use this library's own implementation of the content extraction instead. Whether that makes it possible to restore the old behaviour is still uncertain and must be tested once this feature is complete.

[Status] Version 4.0 #10