opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction
https://pdf-extract-kit.readthedocs.io/zh-cn/latest/index.html
GNU Affero General Public License v3.0
5.75k stars 383 forks

Apache license not compatible with PyMuPDF #119

Open vadim0x60 opened 2 months ago

vadim0x60 commented 2 months ago

MuPDF has a restrictive license (AGPL-3.0) that doesn't allow closed-source commercial use. Using PyMuPDF in a library distributed under the Apache license violates it.

wangbinDL commented 2 months ago

Thank you for pointing out this issue. We will update the license accordingly.

vadim0x60 commented 2 months ago

oh no

vadim0x60 commented 2 months ago

PyMuPDF is only used in a small fragment that's easily replaceable by other Python PDF libraries. Couldn't we remove that fragment and retain a permissive license?

wangbinDL commented 2 months ago

Thank you for your suggestion. While it's true that PyMuPDF can be replaced with other Python PDF libraries, our use of the YOLO framework for document detection algorithms, such as formula detection, imposes strict licensing constraints. YOLO is licensed under AGPL-3.0, which requires us to license the entire repository under AGPL-3.0 as well.

Although alternative detection methods could be considered, this would extend our development timeline significantly. We are open to suggestions for more streamlined solutions that comply with a less restrictive license. Your input is greatly appreciated.