opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction
https://pdf-extract-kit.readthedocs.io/zh-cn/latest/index.html
GNU Affero General Public License v3.0
5.27k stars, 356 forks

feat: add batch-size parameter and garbage collection #89

Closed jorgeolothar closed 2 months ago

jorgeolothar commented 2 months ago

GPU memory use in the layout and formula detection loop grows with each image and is never released, which can cause CUDA out-of-memory errors on large documents. This PR runs garbage collection after each image prediction.
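
For readers who want the idea without opening the diff, here is a minimal sketch of per-image cleanup. It is not the code from this PR: `release_memory`, `predict_all`, and `predict_fn` are hypothetical names, and the sketch assumes the models run on PyTorch (it skips the CUDA step if `torch` is not installed).

```python
import gc

try:
    import torch  # optional here; the sketch assumes a PyTorch-based pipeline
except ImportError:
    torch = None


def release_memory():
    """Collect unreachable Python objects, then free PyTorch's cached CUDA blocks."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Returns unused cached memory to the GPU; live tensors are not affected.
        torch.cuda.empty_cache()


def predict_all(images, predict_fn):
    """Run predict_fn (hypothetical model call) on each image, cleaning up after each one."""
    results = []
    for image in images:
        results.append(predict_fn(image))
        release_memory()
    return results
```

Calling the cleanup once per image trades a little throughput for a flatter memory profile, which matters most on long documents.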

It also adds a batch-size parameter, so that users with less GPU memory can lower the batch size from the command line.
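
A rough sketch of how such a flag could be wired up with `argparse`. The flag spelling `--batch-size`, the default of 16, and the helper names `build_parser` and `batched` are assumptions for illustration, not taken from the diff.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical --batch-size option."""
    parser = argparse.ArgumentParser(description="PDF extraction (illustrative sketch)")
    parser.add_argument(
        "--batch-size",
        type=int,
        default=16,  # assumed default, chosen only for this example
        help="number of items per batch; lower it on GPUs with less memory",
    )
    return parser


def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For example, `build_parser().parse_args(["--batch-size", "4"])` yields `batch_size == 4`, and `batched(pages, 4)` then feeds the model four items at a time.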

wangbinDL commented 2 months ago

LGTM