typst / pdf-writer

A step-by-step PDF writer.
Apache License 2.0

How to add an image to pdf ? #5

Closed akhilmhdh closed 2 years ago

akhilmhdh commented 2 years ago

Hey there,

First of all, thank you for creating this crate. I have been using it with svg2pdf to generate images for a personal project, and it's been great.

But because svg2pdf rasterizes, the file size seems to be very big. So I am planning to switch to PNG images via the image crate. Can you please share some insights on how to add a PNG to a PDF?

I saw there are image XObjects, but I can't work out how to position the elements or how to use them properly. I have been reading the source code for quite some time.

    let id = Ref::new(5);
    let sample = image::from("/file.png");
    let image = writer.image_xobject(id, sample);
    image.finish();

I arrived at this by reading some of the dependent codebases, but I am still unsure how to place an image at a particular location on a page.

Would be great if you could add an example.

Thank you

akhilmhdh commented 2 years ago

Also, sorry, this is not an issue but rather a question. I didn't find the Discussions tab; I think it's disabled.

laurmaedje commented 2 years ago

I put together an example on how to embed and position PNG and JPEG images. Hope it helps :)
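For readers without the linked example at hand, positioning comes down to a handful of content-stream operators. The following is a minimal std-only sketch, not the example mentioned above: it builds the operators by hand, and the function name `place_image_ops`, the resource name `Im1`, and the coordinates are made up for illustration. An image XObject fills the unit square in its own coordinate space, so the matrix `[w 0 0 h x y]` scales it to `w` by `h` points and moves its lower-left corner to `(x, y)`, measured from the bottom-left of the page.

```rust
/// Builds the content-stream operators that draw the image XObject `name`
/// with its lower-left corner at (x, y) and a size of w x h points.
fn place_image_ops(name: &str, x: f32, y: f32, w: f32, h: f32) -> String {
    // q        : save the graphics state
    // a..f cm  : concatenate the matrix [w 0 0 h x y]
    // /name Do : paint the XObject
    // Q        : restore the graphics state
    format!("q\n{} 0 0 {} {} {} cm\n/{} Do\nQ\n", w, h, x, y, name)
}

fn main() {
    // Hypothetical placement: a 200 x 100 pt image at (50, 600).
    print!("{}", place_image_ops("Im1", 50.0, 600.0, 200.0, 100.0));
}
```

With pdf-writer you would normally not format these by hand: the `transform` and `x_object` calls on `Content` that appear later in this thread emit the matrix and the paint operator. The name passed to `x_object` also has to be registered in the page's resources so that it resolves to the image's object id.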

I'm wondering though what you mean by svg2pdf rasterizing something. It shouldn't rasterize anything and if you reference raster images from your SVG, it should also compress them. It would be great to know what's going on there, so that if there's actually a problem, we can fix it!

akhilmhdh commented 2 years ago

Wow. That's so kind of you. Thank you so much. It's really helpful.

Yes. Actually, what I am trying to do is QR code generation.

  1. First I built a nested SVG that contains all the QR codes for a page.
  2. Then I converted it to an XObject using svg2pdf.
  3. Then I followed the svg2pdf example on adding it to an existing PDF using pdf-writer.
  4. Finally, I saved the PDF.

I didn't use PNG because generating SVGs for the QR codes was much faster than encoding PNGs. The issue I face is that even though my throughput is high, the file size is also extremely high. I think it's because of the XObject. I applied the deflate compression technique described in the docs, but to no avail.

So I think it's because the svg2pdf output quality is higher than I need.

reknih commented 2 years ago

Hey, I'm the primary author of typst/svg2pdf. Just like an SVG is a vector graphic specifying drawing commands (draw a line from here to there, fill this shape with orange, ...) the XObject you'll receive from svg2pdf will also contain a content stream with drawing commands instead of a pixel raster graphic.

The problem of why that's so large right now for some files is that the content stream is not compressed and instead written to the file as-is. This can produce large files, especially for complicated paths. The long term solution here would be to make svg2pdf compress its content streams with /Filter FlateDecode. I'll put it on the roadmap.

akhilmhdh commented 2 years ago

Hey @reknih.

Yup, agreed. But the weird part for me is that I did apply compression using the flate2 crate:

    use std::io::Write;

    use flate2::{write::ZlibEncoder, Compression};
    use pdf_writer::{Content, Filter};
    use usvg::Tree;

    let final_svg = "svg string";

    // Convert the SVG; svg2pdf writes the XObject straight into `writer`.
    let tree = Tree::from_str(&final_svg, &usvg::Options::default().to_ref()).unwrap();
    svg2pdf::convert_tree_into(&tree, svg2pdf::Options::default(), &mut writer, svg_id);

    // Page content: scale the XObject to the page size and draw it.
    let mut content = Content::new();
    content
        .transform([page_w, 0.0, 0.0, page_h, 0.0, 0.0])
        .x_object(svg_name);

    let data = content.finish();
    println!("Raw: {:?}", data.len());

    // Deflate the page content stream.
    let mut encoder = ZlibEncoder::new(Vec::new(), Compression::best());
    encoder.write_all(&data).unwrap();
    let compressed = encoder.finish().unwrap();
    println!("Compressed: {:?}", compressed.len());

    writer
        .stream(content_id, &compressed)
        .filter(Filter::FlateDecode);

This is just a snippet of the compression part. It is the svg2pdf example for use with pdf-writer, with compression added.

By the way, thank you for svg2pdf as well. It's really helpful and, given its speed, quite a bit faster than other kinds of encoding.

Also, I used the miniz_oxide crate at first, then switched to flate2. Sadly, neither had any effect.

reknih commented 2 years ago

The problem here is that once you call svg2pdf::convert_tree_into, the whole graphic is written to the writer's buffer/file (uncompressed). Your content stream (content) then just contains two instructions, the transform matrix and a reference to the graphic converted above. It does not contain the graphic data, so it is quite small and compression of that won't do much, no matter which crate you use.

svg2pdf has already written the data you do want to compress into pdf-writer's buffer before you create and compress your own content stream and there is no way for you to retrieve and compress it, unfortunately.

As an example to make it a bit clearer: If you were to use the graphic a second time in this or another content stream with another XObject command, you would not find it twice in the resulting file. You'd just find multiple references to its name.
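To put rough numbers on that, here is a small std-only sketch. The XObject body is a made-up placeholder of one million bytes and `draw_repeated` is a hypothetical helper, not part of pdf-writer or svg2pdf. It shows that the page content stream stays a few dozen bytes no matter how large the referenced graphic is, which is why compressing it makes no visible difference to the file size.

```rust
/// Operators for a page content stream that paints the XObject `name`
/// `count` times, each copy shifted right by `step` points.
fn draw_repeated(name: &str, count: usize, step: f32) -> String {
    let mut ops = String::new();
    for i in 0..count {
        let dx = step * i as f32;
        // Translate by (dx, 0), then paint the XObject by name.
        ops.push_str(&format!("q\n1 0 0 1 {} 0 cm\n/{} Do\nQ\n", dx, name));
    }
    ops
}

fn main() {
    // Placeholder for the drawing commands written for the graphic itself.
    let xobject_body = vec![b'x'; 1_000_000];
    let page_content = draw_repeated("S1", 2, 100.0);

    println!("XObject body: {} bytes, stored once", xobject_body.len());
    println!("page content: {} bytes for two uses", page_content.len());
}
```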

akhilmhdh commented 2 years ago

Got it.

But I did try another thing. I used the lopdf crate: after writer.finish, I loaded the result into lopdf and then applied the document.compress method.

It reduced a 1.5 GB file to 200 MB but took quite some time. The time is taken because the output of writer.finish was already 1.5 GB.

I was wondering whether I can do that earlier. Maybe iterate through the PDF and encode it with compression? But I didn't find a method like that.

Zipping the PDF had the same effect. It reduced the size drastically, but that won't solve the actual PDF size.

I think i may have caused some confusion. 😅

akhilmhdh commented 2 years ago

@reknih @laurmaedje

Just another question: is there any relationship between the page id and the content id, other than each being a unique i32?

I am trying to create them dynamically. But with ids I thought were unique, writer.finish got stuck. So I ended up giving really big values for the content id, like 5000 * i, and then the writer started writing smoothly.

So I was thinking there must be a relation between page_id and content_id in a PDF. I read the PDF spec but found that the only criterion is uniqueness. Is there anything I am missing?

laurmaedje commented 2 years ago

There is no relationship between page ids and content ids. I think your problem is that you have duplicate ids in your document. svg2pdf often needs multiple indirect objects to recreate the SVG. That's why convert_tree_into takes and returns a Ref. You say which id it should start using, it uses as many ids as necessary and returns the next free id back for you. Multiplying by 5000 fixes the issue because svg2pdf doesn't need 5000 objects in this case.
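One way to avoid picking large multipliers is to thread a single counter through the whole document. The sketch below is std-only and purely illustrative: `IdAlloc` and `convert_stub` are hypothetical names, and `convert_stub` only mimics the "takes a start id, returns the next free id" contract described above. In real code the value returned by convert_tree_into would be fed back into the allocator instead.

```rust
/// Hands out consecutive PDF object ids so that none is used twice.
struct IdAlloc {
    next: i32,
}

impl IdAlloc {
    fn new() -> Self {
        Self { next: 1 }
    }

    /// Returns the next free id and reserves it.
    fn bump(&mut self) -> i32 {
        let id = self.next;
        self.next += 1;
        id
    }

    /// Returns the next free id without reserving it.
    fn peek(&self) -> i32 {
        self.next
    }

    /// Continues allocating from `next_free`, as reported by a converter.
    fn skip_to(&mut self, next_free: i32) {
        assert!(next_free >= self.next, "allocator must not move backwards");
        self.next = next_free;
    }
}

/// Hypothetical stand-in for a converter that consumes `needed` ids
/// starting at `start` and returns the next free id.
fn convert_stub(start: i32, needed: i32) -> i32 {
    start + needed
}

fn main() {
    let mut alloc = IdAlloc::new();
    for page in 0..2 {
        let page_id = alloc.bump();
        let content_id = alloc.bump();
        let svg_id = alloc.peek();
        // Pretend the graphic needs four indirect objects.
        let next_free = convert_stub(svg_id, 4);
        alloc.skip_to(next_free);
        println!(
            "page {}: page_id={} content_id={} svg ids={}..{}",
            page, page_id, content_id, svg_id, next_free
        );
    }
}
```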

pdf-writer got stuck because the code that creates the xref table failed to handle duplicate ids. I changed it so that it will now panic with a descriptive message and the duplicate id instead.
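If you want to catch the problem on your side before calling writer.finish, a check along these lines is enough. This is a std-only sketch and not pdf-writer's actual implementation; `first_duplicate` is a hypothetical helper that assumes you keep a list of every id you handed out.

```rust
use std::collections::HashSet;

/// Returns the first id that appears a second time in `ids`, if any.
fn first_duplicate(ids: &[i32]) -> Option<i32> {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present.
    ids.iter().copied().find(|id| !seen.insert(*id))
}

fn main() {
    // Hypothetical list of ids used while building a document.
    let used = [1, 2, 3, 4, 3];
    match first_duplicate(&used) {
        Some(id) => println!("duplicate object id: {}", id),
        None => println!("all ids are unique"),
    }
}
```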

akhilmhdh commented 2 years ago

Got it. So the only criterion is a unique id. Yes, raising an error would be really helpful. I guess this clears up all my doubts regarding this issue. I'll update my crate when a new release comes out with these changes.

Thank you for taking immediate action. Have a nice day