lucasrla / remarks

Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG
GNU General Public License v3.0

incorrect order of pdf pages with --modified_pdf #24

Closed daleonpz closed 3 years ago

daleonpz commented 3 years ago

It generates the PDF with the annotations, but the pages appear in arbitrary order. With --combined-pdf it works: the annotations are there and the pages are in order. I tried it with two books, one with a two-column layout and the other with a single column.

Check the order of the page indexes:

Book Writers (2017, Createspace Independent Publishing Platform) - libgen.lc.pdf"
PDF in-device directory: .
-------PAGE IDX #104
-------PAGE IDX #114
-------PAGE IDX #8
-------PAGE IDX #132
-------PAGE IDX #26
-------PAGE IDX #43
-------PAGE IDX #79
-------PAGE IDX #88
-------PAGE IDX #14
-------PAGE IDX #107
-------PAGE IDX #119
-------PAGE IDX #115
-------PAGE IDX #52

The pages should probably be sorted before saving, which is possible if we record the order in which pages are inserted in a list. Maybe something like this:

 pages_order = []
 ....
 # at remarks.py:180
 if modified_pdf:
     mod_pdf.insertPDF(ann_doc, start_at=-1)
     pages_order.append(page_idx)  # remember which source page was just appended

 # at remarks.py:203
 if modified_pdf:
     # _sort_document is a proposed helper (not written yet) that reorders pages by page_idx
     mod_pdf = _sort_document(mod_pdf, pages_order)
     mod_pdf.save(f"{output_dir}/{name} _remarks-only.pdf")
     mod_pdf.close()
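
The `_sort_document` helper is not defined anywhere in this thread. As a rough sketch of the reordering it would need, assuming `pages_order[k]` holds the source page index of the k-th appended page, one could compute which appended page belongs in each output slot. The name `sorted_positions` is made up for illustration and is not part of remarks:

```python
def sorted_positions(pages_order):
    """Return positions in the appended document, ordered by source page index.

    pages_order[k] is the source page index of the k-th appended page.
    """
    return sorted(range(len(pages_order)), key=lambda pos: pages_order[pos])


# First five page indexes from the log above, in the order they were appended.
pages_order = [104, 114, 8, 132, 26]
positions = sorted_positions(pages_order)
print(positions)  # [2, 4, 0, 1, 3]
```

A list like `positions` is the kind of page sequence that PyMuPDF's `Document.select()` accepts, so `mod_pdf.select(positions)` would probably be enough to put the pages back in order, assuming each annotated page was appended exactly once.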

Or insert each page at its own index and delete the blank pages afterwards.

For example, at remarks.py:180:

 if modified_pdf:
     mod_pdf.insertPDF(ann_doc, start_at=page_idx)

and at remarks.py:203

 if modified_pdf:
     # Keep only pages that contain text. Building a new list avoids removing
     # items from a list while iterating over it, which skips pages.
     keep = [i for i in range(mod_pdf.pageCount) if mod_pdf.getPageText(i)]
     mod_pdf.select(keep)                        # select remaining pages from the PDF
     mod_pdf.save(f"{output_dir}/{name} _remarks-only.pdf")
     mod_pdf.close()
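
A note on filtering the page list: removing entries from a list while looping over that same list skips the element right after each removal, so two blank pages in a row would leave one behind. A small sketch with made-up per-page text (the function names and the `texts` list are illustrative only, not from remarks or PyMuPDF):

```python
def keep_by_removal(texts):
    """Fragile pattern: removes from the list while iterating over it."""
    pages = list(range(len(texts)))
    for i in pages:
        if not texts[i]:
            pages.remove(i)  # shifts later items left, so the next one is skipped
    return pages


def keep_by_filter(texts):
    """Safer pattern: build a new list of the pages that have text."""
    return [i for i, text in enumerate(texts) if text]


texts = ["intro", "", "", "notes"]  # hypothetical text per page; pages 1 and 2 are blank
print(keep_by_removal(texts))  # [0, 2, 3]
print(keep_by_filter(texts))   # [0, 3]
```

Also worth keeping in mind: a page that carries only scribbles may have no extractable text, so a text-based check could drop annotated pages too.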

lucasrla commented 3 years ago

Hey @daleonpz, thanks for catching and fixing this! I have just merged your PR to master.