Open zemelLeong opened 3 years ago
Constructing PDFs is very much under construction. Take a look at https://github.com/pdf-rs/pdf/blob/master/examples/content/src/main.rs and use page.content instead.
Note that so far no cleanup is done. It just writes another trailer to the existing data.
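To illustrate what "writes another trailer to the existing data" implies for file size, here is a toy model of an append-only save. It is not the pdf-rs implementation. The function name `append_update` and the byte strings in the usage note are hypothetical. It only shows that the original bytes are kept verbatim and the new section is added after them.

```rust
/// Toy model of an append-only save (hypothetical helper, not part of pdf-rs).
/// The original bytes are copied unchanged and the new section is appended,
/// so the result can never be shorter than the input.
pub fn append_update(original: &[u8], new_section: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(original.len() + new_section.len());
    out.extend_from_slice(original);
    out.extend_from_slice(new_section);
    out
}
```

Calling `append_update(b"old data", b"new trailer")` returns a buffer that still begins with `old data`, which is why a file saved this way keeps everything the original contained.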
I hope a single page can be read out as a stream.
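import asyncio

from PyPDF2 import PdfFileReader, PdfFileWriter


async def sender():
    _, writer = await asyncio.open_connection('127.0.0.1', 8888)

    # PdfFileWriter.write() calls tell() on its output stream, which
    # StreamWriter lacks, so count the bytes written and expose them via tell().
    old_write = writer.write
    writer.length = 0

    def write(data):
        writer.length += len(data)
        old_write(data)

    def tell():
        return writer.length

    writer.tell = tell
    writer.write = write

    # Keep the input file open until the page has been written out.
    with open("original.pdf", 'rb') as source:
        pdf_input = PdfFileReader(source)
        pdf_output = PdfFileWriter()
        page = pdf_input.getPage(5)
        pdf_output.addPage(page)
        pdf_output.write(writer)

    # Flush the buffered data and close the connection before the loop exits.
    await writer.drain()
    writer.close()
    await writer.wait_closed()


asyncio.run(sender())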
No. PDFs need to be there entirely. Technically there exists an extension that allows processing partial PDFs, but that would require a much more complex architecture.
I may not have expressed myself accurately. I would like the page that was read to be converted into a byte array so that it can be transmitted over the network. I found similar code in pdf-rs and pypdf2.
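For now you can add a save_to_vec here: https://github.com/pdf-rs/pdf/blob/master/pdf/src/file.rs#L261
pub fn save_to_vec(&mut self) -> Result<Vec<u8>> {
    // Assumes `self.storage.save` returns the serialized bytes as a slice;
    // copy them into an owned Vec instead of writing them to a path.
    Ok(self.storage.save(&mut self.trailer)?.to_vec())
}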
Note that the output still contains all original data, so it will not be smaller.
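Once the PDF bytes are in a `Vec<u8>`, sending them over the network no longer involves the PDF library at all. The sketch below is one possible way to do it, using only the standard library. The helper names `send_framed` and `recv_framed` and the 8-byte big-endian length prefix are assumptions made for illustration. They are not part of pdf-rs.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: writes `payload` preceded by its length as an
/// 8-byte big-endian integer, so the receiver knows how much to read.
pub fn send_framed<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    out.write_all(&(payload.len() as u64).to_be_bytes())?;
    out.write_all(payload)?;
    out.flush()
}

/// Hypothetical helper: reads one frame produced by `send_framed`.
/// A real receiver should cap `len` before allocating, since it comes
/// from the peer.
pub fn recv_framed<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 8];
    input.read_exact(&mut len_buf)?;
    let len = u64::from_be_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(payload)
}
```

Both helpers are generic over `Write` and `Read`, so the same code works with a `std::net::TcpStream` in production and with a `Vec<u8>` or `std::io::Cursor` in tests.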
I used this file to test the example. The generated file renders as a blank page, and other input files have the same problem.
#[cfg(test)]
mod pdf_test {
    use pdf::content::{Op, Point};
    use pdf::{build::PageBuilder, content::Content, file::File};
    use pdf::build::CatalogBuilder;

    macro_rules! file_path {
        ( $sub_dir:expr ) => { concat!("./src/test/common/", $sub_dir) }
    }

    macro_rules! run {
        ($e:expr) => (
            match $e {
                Ok(v) => v,
                Err(e) => {
                    e.trace();
                    panic!("{}", e);
                }
            }
        )
    }

    #[test]
    pub fn write_pages() {
        let mut file = run!(File::<Vec<u8>>::open(file_path!("xelatex.pdf")));
        let mut pages = Vec::new();
        for page in file.pages().take(1) {
            let page = page.unwrap();
            if let Some(ref c) = page.contents {
                println!("{:?}", c);
            }
            let content = Content::from_ops(vec![
                Op::MoveTo { p: Point { x: 100., y: 100. } },
                Op::LineTo { p: Point { x: 100., y: 200. } },
                Op::LineTo { p: Point { x: 200., y: 100. } },
                Op::LineTo { p: Point { x: 200., y: 200. } },
                Op::Close,
                Op::Stroke,
            ]);
            pages.push(PageBuilder::from_content(content));
        }
        let catalog = CatalogBuilder::from_pages(pages)
            .build(&mut file).unwrap();
        file.update_catalog(catalog).unwrap();
        file.save_to(file_path!("modify.pdf")).unwrap();
    }
}
Opening modify.pdf produces an error:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Try { file: "pdf\\src\\file.rs", line: 277, column: 23, source: FromPrimitive { typ: "RcRef < Catalog >", field: "root", source: Try { file: "pdf\\src\\file.rs", line: 94, column: 19, source: FromPrimitive { typ: "PagesRc", field: "pages", source: Try { file: "pdf\\src\\object\\types.rs", line: 90, column: 20, source: UnexpectedPrimitive { expected: "Reference", found: "Dictionary" } } } } } }', examples\content\src\main.rs:12:49
Yeah, I ran into the same problem. This should be fixed now. Try running cargo update (or git pull if you have a local repo).
The rewritten content seems to be missing some information.
#[cfg(test)]
mod pdf_test {
    use pdf::content::{Op, Point};
    use pdf::{build::PageBuilder, content::Content, file::File};
    use pdf::build::CatalogBuilder;

    macro_rules! file_path {
        ( $sub_dir:expr ) => { concat!("./src/test/common/", $sub_dir) }
    }

    macro_rules! run {
        ($e:expr) => (
            match $e {
                Ok(v) => v,
                Err(e) => {
                    e.trace();
                    panic!("{}", e);
                }
            }
        )
    }

    #[test]
    pub fn write_pages() {
        let mut file = run!(File::<Vec<u8>>::open(file_path!("xelatex.pdf")));
        let mut pages = Vec::new();
        // for page in file.pages().take(1) {
        //     let page = page.unwrap();
        //     if let Some(ref c) = page.contents {
        //         println!("{:?}", c);
        //     }
        //     let content = Content::from_ops(vec![
        //         Op::MoveTo { p: Point { x: 100., y: 100. } },
        //         Op::LineTo { p: Point { x: 100., y: 200. } },
        //         Op::LineTo { p: Point { x: 200., y: 100. } },
        //         Op::LineTo { p: Point { x: 200., y: 200. } },
        //         Op::Close,
        //         Op::Stroke,
        //     ]);
        //     pages.push(PageBuilder::from_content(content));
        // }

        // for page in file.pages() {
        //     if let Some(ref contents) = page.unwrap().contents {
        //         let content = Content::from_ops(contents.operations.to_vec());
        //         pages.push(PageBuilder::from_content(content));
        //     }
        // }

        for page in file.pages().take(2) {
            let content = page.unwrap().contents.clone().unwrap();
            pages.push(PageBuilder::from_content(content));
        }
        let catalog = CatalogBuilder::from_pages(pages)
            .build(&mut file).unwrap();
        file.update_catalog(catalog).unwrap();
        file.save_to(file_path!("modify.pdf")).unwrap();
    }
}
This method worked.
#[cfg(test)]
mod pdf_test {
    use pdf::content::{Op, Point};
    use pdf::{build::PageBuilder, content::Content, file::File};
    use pdf::build::CatalogBuilder;

    macro_rules! file_path {
        ( $sub_dir:expr ) => { concat!("./src/test/common/", $sub_dir) }
    }

    macro_rules! run {
        ($e:expr) => (
            match $e {
                Ok(v) => v,
                Err(e) => {
                    e.trace();
                    panic!("{}", e);
                }
            }
        )
    }

    #[test]
    pub fn write_pages() {
        let mut file = run!(File::<Vec<u8>>::open(file_path!("xelatex.pdf")));
        let mut pages = Vec::new();
        for page in file.pages().take(2) {
            if let Ok(ref page) = page {
                pages.push(PageBuilder::from_page(page).unwrap());
            }
        }
        let catalog = CatalogBuilder::from_pages(pages)
            .build(&mut file).unwrap();
        file.update_catalog(catalog).unwrap();
        file.save_to(file_path!("modify.pdf")).unwrap();
    }
}
Like this.