When I try converting a page that has an image to HTML or XHTML, the image is not included. With this code:
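```rust
use mupdf::{Document, Page};
use std::fs;

fn main() {
    let doc: Document = Document::open("C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\test.epub").unwrap();
    // Load the page that contains the figure
    let page: Page = doc.load_page(341).unwrap();
    // Convert the page to HTML (the image is missing from this output)
    let html: String = page.to_html().unwrap();
    // Surface write errors instead of silently ignoring the Result
    fs::write("C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\rs-test.html", html).unwrap();
}
```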
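I got this result:

![image](https://github.com/messense/mupdf-rs/assets/58850480/ebb0479e-041e-4407-b24f-866eccaa9c8e)

There should be an image above the "Figure 10.3" text.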
I tried the same thing in PyMuPDF with this code:

```python
import fitz

doc = fitz.Document('C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\test.epub')
page = doc[331]  # the page index is somehow different for the same page
html = page.get_text("html")
# Write as UTF-8 so non-ASCII text does not fail with the Windows default encoding
with open("C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\py-test.html", "w", encoding="utf-8") as file:
    file.write(html)
```
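I got this result:

![image](https://github.com/messense/mupdf-rs/assets/58850480/a25a6d24-ca78-4342-83c6-5d009211897a)

The image is included in base64 format.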
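To compare the two outputs without opening them in a browser, one quick check is to count base64 image data URIs in each file. A minimal std-only sketch, assuming images are inlined as `data:image/...;base64,` URIs as the PyMuPDF output suggests; `count_embedded_images` is a hypothetical helper, not part of mupdf-rs:

```rust
/// Counts occurrences of inline image data URIs (`data:image/...`) in an HTML string.
/// Assumption: embedded images use data URIs; images referenced by an external
/// URL or file path are not counted.
fn count_embedded_images(html: &str) -> usize {
    html.matches("data:image/").count()
}

fn main() {
    let with_image = r#"<p><img src="data:image/png;base64,iVBORw0KGgo="></p><p>Figure 10.3</p>"#;
    let without_image = "<p>Figure 10.3</p>";
    println!("{}", count_embedded_images(with_image)); // prints 1
    println!("{}", count_embedded_images(without_image)); // prints 0
}
```

Feeding it the contents of `rs-test.html` and `py-test.html` (for example via `std::fs::read_to_string`) would be expected to give 0 and a nonzero count respectively, if the outputs are as described above.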
I also tried the same thing with the `mutool convert` CLI and got the same result, but only after enabling an option. I can't find any way to set this option through the `to_html` method of this crate. The text output options in `mutool` look like this:
```
Text output options:
    inhibit-spaces: don't add spaces between gaps in the text
    preserve-images: keep images in output
    preserve-ligatures: do not expand ligatures into constituent characters
    preserve-whitespace: do not convert all whitespace into space characters
    preserve-spans: do not merge spans on the same line
    dehyphenate: attempt to join up hyphenated words
    mediabox-clip=no: include characters outside mediabox
```
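With `mutool convert`, these options are passed as a comma-separated string (e.g. `preserve-images,dehyphenate`). The sketch below is purely illustrative of how such a string could map onto a set of flags that an options-taking HTML conversion might accept. It is std-only Rust; `parse_text_options` and the flag constants are hypothetical names, the bit values are arbitrary, and none of it describes the real mupdf-rs or MuPDF API. Clipping to the mediabox is assumed to be on by default, as the `mediabox-clip=no` wording suggests.

```rust
// Hypothetical flag bits named after the mutool text output options.
// The numeric values are arbitrary and NOT claimed to match MuPDF's constants.
const INHIBIT_SPACES: u32 = 1 << 0;
const PRESERVE_IMAGES: u32 = 1 << 1;
const PRESERVE_LIGATURES: u32 = 1 << 2;
const PRESERVE_WHITESPACE: u32 = 1 << 3;
const PRESERVE_SPANS: u32 = 1 << 4;
const DEHYPHENATE: u32 = 1 << 5;
const MEDIABOX_CLIP: u32 = 1 << 6;

/// Parses a comma-separated option string into a flag set.
/// Assumption: mediabox clipping is on by default; `mediabox-clip=no` clears it.
fn parse_text_options(spec: &str) -> Result<u32, String> {
    let mut flags = MEDIABOX_CLIP;
    for raw in spec.split(',') {
        let opt = raw.trim();
        if opt.is_empty() {
            continue;
        }
        // An option may carry an explicit value ("name=no"); a bare name means "yes".
        let (name, value) = opt.split_once('=').unwrap_or((opt, "yes"));
        let bit = match name {
            "inhibit-spaces" => INHIBIT_SPACES,
            "preserve-images" => PRESERVE_IMAGES,
            "preserve-ligatures" => PRESERVE_LIGATURES,
            "preserve-whitespace" => PRESERVE_WHITESPACE,
            "preserve-spans" => PRESERVE_SPANS,
            "dehyphenate" => DEHYPHENATE,
            "mediabox-clip" => MEDIABOX_CLIP,
            _ => return Err(format!("unknown option: {name}")),
        };
        match value {
            "yes" => flags |= bit,
            "no" => flags &= !bit,
            _ => return Err(format!("bad value for {name}: {value}")),
        }
    }
    Ok(flags)
}

fn main() {
    let flags = parse_text_options("preserve-images,mediabox-clip=no").unwrap();
    assert!((flags & PRESERVE_IMAGES) != 0);
    assert!((flags & MEDIABOX_CLIP) == 0);
    println!("{flags:#b}"); // prints 0b10
}
```

A bitflag set keeps the options composable in a single argument, which is one plausible shape for exposing them alongside `to_html`.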