jwilk-archive / pdf2djvu

PDF to DjVu converter
GNU General Public License v2.0

Duplicate page title: 1 #113

Closed: jwilk closed 7 years ago

jwilk commented 9 years ago

Issue reported by @juanfra684:

$ pdf2djvu 2012.05.23.a4.pdf -o output.djvu -v
Warning: /tmp//pdf2djvu.8IFFYL: Directory not empty
Duplicate page title: 1
$ pdf2djvu 2012.05.23.a4.pdf -o output.djvu
Warning: /tmp//pdf2djvu.Vwtqzo: Directory not empty
Duplicate page title: 1

This is pdf2djvu 0.9.2 on OpenBSD-current/amd64.

I've attached the problematic PDF file. It works fine with pdf2djvu-0.8.1.


Attachment: 2012.05.23.a4.pdf

jwilk commented 9 years ago

This is a consequence of fixing issue #109.

In the attached document, pages 1 and 7 both have the label “1”, which translates to the same title, and pdf2djvu won't let you have two pages with the same title.
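To illustrate the kind of collision described here, the following is a minimal sketch, not pdf2djvu's actual code. The function name `find_duplicate_titles` and the sample label list are hypothetical; only the fact that pages 1 and 7 share the label “1” comes from the report.

```python
from collections import defaultdict


def find_duplicate_titles(labels):
    """Map each repeated title to the 1-based page numbers that carry it."""
    pages_by_title = defaultdict(list)
    for page_number, label in enumerate(labels, start=1):
        pages_by_title[label].append(page_number)
    # Keep only titles that occur on more than one page.
    return {title: pages
            for title, pages in pages_by_title.items()
            if len(pages) > 1}


# Hypothetical page labels: pages 1 and 7 are both labelled "1".
labels = ["1", "ii", "iii", "iv", "v", "vi", "1", "2"]
duplicates = find_duplicate_titles(labels)
print(duplicates)
```

Any non-empty result corresponds to the situation in which pdf2djvu 0.9.2 stops with “Duplicate page title”.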

You can use --no-page-titles to bring the old behavior back.

While the current behavior is intentional, I appreciate it's not really user-friendly. I'll think about what to do about it.

jwilk commented 9 years ago

Warning: /tmp//pdf2djvu.8IFFYL: Directory not empty

I've opened a separate bug about this warning: #114

jwilk commented 9 years ago

Comment submitted by @juanfra684:

If pdf2djvu can't convert a PDF file, then it should automatically fall back to the old behavior. You could add a warning explaining the problem at the end of the conversion output.

jwilk commented 8 years ago

My first thought was to just allow duplicates. Unfortunately, DjVuLibre doesn't like documents with duplicate titles; you get a fatal error:

Error in 'DIRM' chunk: two records for the same TITLE '1'

(NB, this is unintentional and already fixed in DjVuLibre VCS.)

Falling back to --no-page-titles when there's a duplicate would probably be too surprising.

What I'm contemplating is to simply leave the page without title if there was a prior page with the same title. It's a bit ugly, though…

Alternatively (or additionally), I could make --no-page-titles the default for all documents.
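The "leave the page without title" option could look roughly like the sketch below. This is an assumption-laden illustration rather than the code that ended up in pdf2djvu: the function name `drop_repeated_titles` is hypothetical, and `None` is used here simply to mean "no title".

```python
def drop_repeated_titles(labels):
    """Return page titles, keeping only the first occurrence of each label.

    Later pages whose label was already used get None (no title),
    so the resulting document never has two pages with the same title.
    """
    seen = set()
    titles = []
    for label in labels:
        if label in seen:
            titles.append(None)  # a prior page already has this title
        else:
            seen.add(label)
            titles.append(label)
    return titles


# Hypothetical labels in which the third page repeats the first page's label.
titles = drop_repeated_titles(["1", "ii", "1", "2"])
print(titles)
```

The first page with a given label keeps its title, so the result depends on page order; that is the "a bit ugly" part of this approach.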

jwilk commented 8 years ago

Comment submitted by @juanfra684:

Making --no-page-titles the default is probably the simplest workaround for this bug from a user's point of view. Hopefully, I will commit the djvulibre patch to the OpenBSD port in the next few days, so the remaining problem is the other OSes :)

jwilk commented 8 years ago

Comment submitted by @juanfra684:

I committed the patches to the djvulibre OpenBSD port. I still see the message “Duplicate page title: 1”. I know that you can't disable that check because most other OSes don't include a patched version of djvulibre, but I would like a workaround (if you can) for OpenBSD.

jwilk commented 8 years ago

To clarify, the DjVuLibre patch is needed to open a DjVu file with duplicate titles.

It would be awful if OpenBSD's pdf2djvu produced DjVu files that can't be opened on most other systems.

jwilk commented 8 years ago

Comment submitted by @juanfra684:

Thanks for the clarification. I thought that the problem was generating DjVu files with duplicate page titles.

jwilk commented 7 years ago

What I'm contemplating is to simply leave the page without title if there was a prior page with the same title.

This has been implemented in 1b262b90854cd3d5359cd7fdf6b72642b54c8f60.

jwilk commented 7 years ago

Fixed in 0.9.5.