Open chuanbei888 opened 3 months ago
https://github.com/opendatalab/magic-doc
It works with ppt/pptx files.
If you want high-quality extraction results, convert the ppt to pdf and then use MinerU. If you want fast extraction and do not care much about quality, choose magic-doc.
@chuanbei888 try converting the ppt to pdf with LibreOffice:
libreoffice --headless --convert-to pdf /path/to/mydoc.pptx --outdir /output/dir
Okay, I will give it a try.
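A question: for converting ppt and docx to Markdown, which approach gives better results, converting to pdf and then using magic-pdf, or using magic-doc directly?
If I convert to pdf first and then to md, could some of the text be recognized less accurately than when it is read directly from the source file?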
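magic-doc has strong text extraction and is faster, but its final output does not contain any images. Converting to pdf and then extracting with magic-pdf gives better image and layout results; the drawback is that it is slower.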
Is there a batch tool for converting docx to pdf?
@zouhuigang LibreOffice
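For batch conversion, one option is to loop over a folder and call LibreOffice once per file. The following is only a rough sketch in Python, not an official MinerU or magic-doc workflow. It assumes the `libreoffice` binary is on your PATH (on some systems it is named `soffice`), and the function names `build_convert_command`, `find_documents`, and `batch_convert` are made up for this example.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, outdir, binary="libreoffice"):
    """Return the argument list that converts one file to PDF."""
    # Assumption: the binary is called "libreoffice"; pass binary="soffice" if needed.
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(src)]


def find_documents(indir, suffixes=(".docx", ".doc", ".pptx", ".ppt")):
    """List files directly inside indir whose extension matches, sorted by path."""
    return sorted(
        p for p in Path(indir).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


def batch_convert(indir, outdir, binary="libreoffice"):
    """Convert every matching document in indir; return the files that failed."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    failed = []
    for src in find_documents(indir):
        # One LibreOffice process per file, so a bad file does not stop the rest.
        result = subprocess.run(build_convert_command(src, outdir, binary))
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Usage would be something like `failed = batch_convert("/path/to/docs", "/output/dir")`; the resulting PDFs can then be passed to magic-pdf.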
Is there a tool you would recommend for converting ppt to pdf?
https://github.com/opendatalab/magic-doc
It works with ppt/pptx files.