opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
https://opendatalab.com/OpenSourceTools
GNU Affero General Public License v3.0
11.19k stars 835 forks

Can I specify page number for output? #480

Open michaelthwan opened 2 weeks ago

michaelthwan commented 2 weeks ago

First, thank you for your amazing work. I learnt a lot from your project.

Is your feature request related to a problem? Please describe.

I have books of 300-500 pages, so converting one takes a long time (over an hour).

Describe the solution you'd like

I know that you count page_num during the initialization stage. Can I specify a page_num range and convert only that part? For example, with -r 30 51 as the page range:

magic-pdf -p {some_pdf} -r 30 51

Thanks again!

myhloli commented 2 weeks ago

You can open the PDF file in Chrome or Edge and use Print -> Save as PDF with the page range 30-50 to cut the original PDF down to a smaller PDF.

michaelthwan commented 2 weeks ago

Got it. If you don't plan to add this, I will use PyPDF2 to split the PDF. Thanks for answering.
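
For anyone landing here with the same question, below is a minimal sketch of the PyPDF2 workaround mentioned above. It is not part of MinerU. It assumes a recent PyPDF2 release (one that provides PdfReader, PdfWriter, and add_page) and treats the page range as 1-based and inclusive, which is my own assumption; the function names page_indices and split_pdf are hypothetical.

```python
from typing import List


def page_indices(first: int, last: int, total: int) -> List[int]:
    """Convert a 1-based, inclusive page range into 0-based page indices."""
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {first}-{last}")
    # Clamp the end of the range to the document length.
    last = min(last, total)
    return list(range(first - 1, last))


def split_pdf(src: str, dst: str, first: int, last: int) -> int:
    """Copy pages first..last of src into dst; return the number of pages written."""
    # Imported here so page_indices can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    indices = page_indices(first, last, len(reader.pages))
    for i in indices:
        writer.add_page(reader.pages[i])
    with open(dst, "wb") as out:
        writer.write(out)
    return len(indices)
```

With placeholder file names, split_pdf("book.pdf", "book_p30-50.pdf", 30, 50) should write pages 30 through 50 to a new file, which can then be passed to magic-pdf -p as usual.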