Direct page objects in /Kids

GoogleCodeExporter commented 9 years ago

In the examples that use XObjects, newly added pages somehow end up written 
into the /Kids array as direct objects. According to the specification, they 
must be indirect. Although PDF readers open such documents just fine, some 
tools complain about it. Possible solutions:

1) In the examples (e.g. function get4 in 4up.py), change the return type from 
PdfDict to IndirectPdfDict.

2) Change the type to indirect in the writer. For example, in _get_trailer:

        # Make all the pages point back to the page dictionary
        pagedict = trailer.Root.Pages
        for page in pagedict.Kids:
            page.Parent = pagedict
            page.indirect = True  # <-- add this line

I think the second approach is cleaner.
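
To make the second approach concrete, here is a minimal standalone sketch of the loop. The helper name make_kids_indirect and the SimpleNamespace stand-ins are assumptions for illustration only, not pdfrw code; the sketch only assumes that page objects expose Parent and indirect attributes the way the _get_trailer snippet above uses them.

```python
from types import SimpleNamespace


def make_kids_indirect(pagedict):
    """Point every page back at its parent and flag it as an indirect object.

    Hypothetical helper: it mirrors the loop proposed for _get_trailer and
    works on any object with a Kids list whose items accept attributes.
    """
    for page in pagedict.Kids:
        page.Parent = pagedict
        page.indirect = True
    return pagedict


# Stand-in objects (NOT pdfrw classes), used only to exercise the loop.
pages = SimpleNamespace(
    Kids=[SimpleNamespace(indirect=False), SimpleNamespace(indirect=False)]
)
make_kids_indirect(pages)
```

Doing this in the writer means every page is fixed in one place, whatever type the examples happen to return.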

Original issue reported on code.google.com by exp...@gmail.com on 17 Nov 2012 at 3:56

GoogleCodeExporter commented 9 years ago

That's your second great bug report with a good proposed fix!

You seem to have a good grasp of both the problem domain (PDF files) and the 
implementing code.  Would you like to join the project and have write access to 
the repository?

Thanks,
Pat

Original comment by pmaupin on 17 Nov 2012 at 4:01

GoogleCodeExporter commented 9 years ago

Working with large numbers of varied PDFs is my primary job, so I think that 
would be great.

Original comment by exp...@gmail.com on 17 Nov 2012 at 4:49

GoogleCodeExporter commented 9 years ago

OK, I think I have added you (if I got your email right).  You can verify that 
you can check in these changes -- I'll be travelling tomorrow, so if you could 
do that today, it would be great.

Also, I have a vision for what I would like to build (but no time at the 
moment).  The doc is a bit rough, but it might give you an idea:

http://code.google.com/p/pdfrw/source/browse/branches/develop/tools/pdfrw.txt

Basically, a command line tool that will do everything the current examples 
will and more...

Thanks,
Pat

Original comment by pmaupin on 17 Nov 2012 at 5:04

GoogleCodeExporter commented 9 years ago

Thank you, Pat. After reading your idea, I have to say that I have already made 
a small project using pdfrw that is similar in some ways but a little 
different. My idea was to write a simple script parser with the same 
functionality as yours, but working on large numbers of documents residing in 
different folders, not only on specific documents. The script is loaded from a 
script file and then produces the output.

Example of a working script file:

input.read(d:\Works\Company\data\2012 10 24\catalog.pdf)

#input.use()
input2 = input.crop(-8, -8)
input2.rotate(270)
#input2.multiply(4)
input2.nup(2, 2, SRA3)
input2.write(d:\Works\Company\data\2012 10 24\catalog_SRA3.pdf)

input.resize(-50, -100)
input.rotate(180)
input2.rotate(90)
input2.reverse()
input.insert(input2, 2)

#foregrounds = input.use(1, 3, ...)
#backgrounds = input.use(2, 4, ...)
#backgrounds.rotate(180)  # rotate for duplex
#pages = foregrounds.insert(backgrounds, every_second)
#pages.multiply(4)  # multiply by 4 pages
#pages.nup(2, 2, sra3)

input.write(d:\Works\Company\data\2012 10 24\catalog2_SRA3.pdf)

In this example it works on a specific document, but it can actually work on 
whole folders. I need to rest a little now, but after the holidays I will send 
you my project. If you like it, we can think it over and either add it to pdfrw 
or create and publish it separately.
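
As a rough illustration of how lines like the ones in the script above could be read, here is a minimal sketch of a line parser. The grammar "[target = ]object.method(args)" is only inferred from the example, and the names parse_line and _LINE are hypothetical; this is not the actual project code, and it does not execute any PDF operations.

```python
import re

# Assumed grammar, inferred from the example script:
#   [target = ]object.method(arg, arg, ...)
_LINE = re.compile(r"^(?:(\w+)\s*=\s*)?(\w+)\.(\w+)\((.*)\)$")


def parse_line(line):
    """Return (target, obj, method, args), or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _LINE.match(line)
    if match is None:
        raise ValueError("cannot parse: %r" % line)
    target, obj, method, raw_args = match.groups()
    # Arguments are unquoted in the example, so split on commas and keep
    # them as strings; a path that contains a comma would need more care.
    args = [a.strip() for a in raw_args.split(",")] if raw_args.strip() else []
    return target, obj, method, args


sample = [
    r"input.read(d:\Works\Company\data\2012 10 24\catalog.pdf)",
    "#input.use()",
    "input2 = input.crop(-8, -8)",
    "input2.nup(2, 2, SRA3)",
]
parsed = [parse_line(line) for line in sample]
```

Each parsed tuple could then be dispatched to whatever object implements the named operation; that part is left out here because it depends on the project itself.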

Nerijus.

Original comment by exp...@gmail.com on 17 Nov 2012 at 5:29

GoogleCodeExporter commented 9 years ago

I sent a reply to your email address, but didn't think to ask if you checked 
that account.

Thanks,
Pat

Original comment by pmaupin on 17 Nov 2012 at 6:04
