Closed Atreyagaurav closed 9 months ago
Looks like it is the catalog. https://docs.rs/pdf/latest/pdf/file/struct.File.html#method.get_root And pagelabels needs to be added there.
I see the catalog, but everything there is `Ref`. I can get some `Stream` from the root, but I want to know how I can get the information from there programmatically, because it just says `Ref` for everything.
file.resolver().get(ref)
I added page_labels to the Catalog.
Ref can be dereferenced with the resolver.
Yes, but I get more `Ref` (or `PlainRef`). How do I know what kind of data it holds, and how do I convert it into usable data? Suppose I want to search for PageLabels manually by looking at object streams: all I get from debug printing is the line below. Even if I get `inner` from there, I have no idea what data type it's supposed to be.
RcRef { inner: PlainRef { id: 5807, gen: 0 }, data: () }
> I added page_labels to the Catalog.
I don't see any commits. Where can I try that?
Some examples there could be useful, such as getting custom tags from a PDF.
Also, for now I went with poppler-rs for my program, as it seems to give the page labels, although I had to get the label for each page instead of from the document itself.
This is a sample code I tried:
use std::path::PathBuf;

use pdf::object::Resolve; // brings `get` into scope on the resolver

fn main() {
    let path = PathBuf::from("/path/to/slides.pdf");
    let file = pdf::file::FileOptions::cached().open(path).unwrap();
    // Resolve the catalog's metadata reference and debug-print the result.
    println!(
        "{:?}",
        file.resolver()
            .get(file.get_root().metadata.unwrap())
            .unwrap()
    );
}
Oops. I didn't check the terminal again after hitting return. If you are working with PlainRefs, you just have to fetch them and see what they actually are. I think Resolver::resolve would be the function to call.
To read the Metadata field, again, resolver::get and then call data() on the stream you got.
Well, the code I added is incorrect.
Yeah, I saw it's added but it doesn't extract the info.
println!("{:#?}", file.get_root().page_labels);
Gives me this:
Some(
    NameTree {
        limits: None,
        node: Intermediate(
            [],
        ),
    },
)
It is working as of 5c19ff6a7e040a6a83edcaad5f7110d31705fd20. See the end of examples/names.rs for an example.
Thank you. It works. Looks like beamer page numbers are saved as prefix, so I did something like this:
if let Some(ref labels) = catalog.page_labels {
    labels.walk(&resolver, &mut |page: i32, label| {
        println!(
            "{page} -> {:?}",
            label.prefix.as_ref().unwrap().to_string_lossy()
        );
    });
}
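For anyone who wants the label of an arbitrary page rather than just the range starts, here is a minimal, library-independent sketch of how the PDF specification combines a range's prefix, numbering style and start value. `Style`, `LabelRange` and `label_for` are hypothetical names made up for this illustration; they are not types or functions of the `pdf` crate, and filling them from the values seen in `walk` is left as an assumption. A Beamer file like the one above would, going by the output in this thread, have ranges with only a prefix and no style.

```rust
/// Numbering style of a label range (the `/S` entry in the PDF spec).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Style {
    Decimal,
    RomanLower,
    RomanUpper,
    AlphaLower,
    AlphaUpper,
}

/// Hypothetical stand-in for one page-label range; NOT the `pdf` crate's type.
#[derive(Clone, Debug)]
struct LabelRange {
    first_page: usize,      // zero-based index of the first page in the range
    prefix: Option<String>, // `/P`: text placed before the number
    style: Option<Style>,   // `/S`: None means the label has no numeric part
    start: u32,             // `/St`: number of the first page (spec default: 1)
}

/// Lowercase roman numeral for `n` (empty string for 0).
fn roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
        (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
        (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    for &(value, digits) in TABLE.iter() {
        while n >= value {
            out.push_str(digits);
            n -= value;
        }
    }
    out
}

/// Alphabetic numbering: a..z, then aa..zz, then aaa..zzz, and so on.
fn alpha(n: u32) -> String {
    if n == 0 {
        return String::new();
    }
    let letter = (b'a' + ((n - 1) % 26) as u8) as char;
    let repeat = ((n - 1) / 26 + 1) as usize;
    letter.to_string().repeat(repeat)
}

/// Label for the zero-based `page`, assuming `ranges` is sorted by `first_page`.
/// Returns None if no range covers the page.
fn label_for(ranges: &[LabelRange], page: usize) -> Option<String> {
    // The applicable range is the last one that starts at or before `page`.
    let range = ranges.iter().rev().find(|r| r.first_page <= page)?;
    let n = range.start + (page - range.first_page) as u32;
    let number = match range.style {
        None => String::new(),
        Some(Style::Decimal) => n.to_string(),
        Some(Style::RomanLower) => roman(n),
        Some(Style::RomanUpper) => roman(n).to_uppercase(),
        Some(Style::AlphaLower) => alpha(n),
        Some(Style::AlphaUpper) => alpha(n).to_uppercase(),
    };
    Some(format!("{}{}", range.prefix.as_deref().unwrap_or(""), number))
}
```

Calling `label_for(&ranges, page)` with a prefix-only range returns just the prefix, which matches the Beamer behaviour described above. Using `unwrap_or("")` on the prefix also avoids the panic that `unwrap()` would cause for ranges that have no prefix.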
I see that there is a datatype [PageLabel](https://docs.rs/pdf/latest/pdf/object/struct.PageLabel.html#) in the library, but I can't figure out any way to read it from a PDF. I know the PDF has that, as I can see it if I convert the PDF into a text-editor-friendly format and open it. Beamer-created PDFs also have those.

Edit: To add a more general question, how do I extract data from the Stream?
The case for PageLabels is something like this: