pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Missing ICC colorspaces lead to unsatisfactory color renderings #397

Closed dalanicolai closed 4 years ago

dalanicolai commented 4 years ago

Converting a PDF to a different format using pymupdf, e.g. writeImage() for a pixmap or getSVGimage() for a page, yields colors that deviate quite strongly from those produced by e.g. 'mutool draw' or the 'gs -sDEVICE=pngalpha' command on the command line.

To see what I mean just open this example pdf file in any of the pymupdf based viewers found here and compare the resulting colors to the colors when the same file is opened in any other pdf-viewer.

The deviation is introduced when converting the pdf-file to a different format (as explained above) using pymupdf. It seems that the problem is not mupdf related as converting the pdf-file using 'mutool draw' on the command line results in 'correct' colors.

The conversion with pymupdf results in much too bright colors, while mutool gives 'good/pleasant' colors, as shown in the following screenshots:

[Screenshots: pymupdf (above), mutool (below)]
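The "too bright" impression could also be expressed as a number instead of being judged from screenshots. The following is only a minimal sketch: `mean_brightness` is a hypothetical helper (not part of pymupdf or mupdf), and it assumes the pixel data of each rendering is available as packed 8-bit RGB bytes without an alpha channel, as for example in the `samples` bytes of a pixmap.

```python
def mean_brightness(samples: bytes, channels: int = 3) -> float:
    """Return the average luma (0..255) of packed 8-bit RGB(A) pixel bytes.

    Hypothetical helper for illustration only. Uses the common
    Rec. 601 weights for R, G and B; an alpha byte, if present
    (channels == 4), is ignored.
    """
    if channels < 3:
        raise ValueError("need at least three channels (R, G, B)")
    if not samples or len(samples) % channels:
        raise ValueError("sample buffer is empty or not a whole number of pixels")

    total = 0.0
    for i in range(0, len(samples), channels):
        r, g, b = samples[i], samples[i + 1], samples[i + 2]
        total += 0.299 * r + 0.587 * g + 0.114 * b
    return total / (len(samples) // channels)
```

Calling this on the pixel bytes of both renderings of the same page and comparing the two results would give a rough measure of how much brighter one output is than the other.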

JorjMcKie commented 4 years ago

Confirming your observation. Thanks for submitting this.

Your example uses so-called "color separation" which I have not yet implemented in PyMuPDF, although MuPDF does support this.

Let me check how to integrate this. Any urgency?

dalanicolai commented 4 years ago

Thanks for your quick response, I will search for information about the concept of color separation. For me this bug has no urgency. However, I am developing, more or less, a zathura https://pwmt.org/projects/zathura/ clone in python (with tkinter) as a hobby/practising project (in that respect thanks for the amazing documentation provided with pymupdf). It would be nice if eventually it would render the pdf's with the 'pretty' colors.

My goal is to write an equally amazing pdf reader as zathura, but with support for annotations. It should be more or less a python alternative to the mupdf-gl reader. I enjoy both zathura and mupdf-gl a lot, but I regret that I can not easily hack on them as I am only a beginning (hobbyist python and not c) developer.

Although the repository is private for now, I can give you access to the repository. The code quality is not very high (I am practising) but you might find it an interesting project related to pymupdf. It already has quite some functionality like scrolling, toc, searching, follow links etc. if you find something interesting then you are free to use any part of the code, as I do with yours...

JorjMcKie commented 4 years ago

Thanks for the compliments! You are aware that PyMuPDF supports annotations (for PDF only)? There is also a general document browser based on tkinter using the PySimpleGUI repo. "General" here means that all MuPDF document types are supported, not just PDFs. When creating pixmaps, you can choose to suppress annotations, and annotations themselves support creation of individual pixmaps. You may also be interested in having a look at SumatraPDF, a (regrettably Windows-only) document viewer, which is based on MuPDF but also has extended document support: more e-book formats, Windows CHM help files, ...

dalanicolai commented 4 years ago

Thanks again. I am aware that PyMuPDF supports (PDF only) annotations. Otherwise I wouldn't have started this project as the goal is to have zathura, written in python and with annotations support. I am aware of the general document browser and SumatraPDF. Indeed SumatraPDF is great too, but again it is not written in python (i.e. not very hackable) and has no support for linux. Anyway, my project is just a nice project to learn how to create and manage larger projects in python. Furthermore, although it doesn't add much to earlier pdf readers, it is easily hackable and I can fully tailor its functionality to my demands (e.g. opening the same book (object) in a second tab or launching it in a second instance with a single shortkey). These are small things but I missed these features on other readers. Also I am not aware of readers that support highlighting using only keystrokes (I implemented this in a 'vimium'-like style), or browsing the toc via index letters (instead of pressing the down arrow 15 times to get to the right line). Again, all small things but nice little details to implement in a hobby project. When pdf-implementation is done I might think of implementing support for other formats too. However for ebooks, I can already use the excellent new calibre e-book reader, so it is not worth the effort to 'redesign the wheel'..

JorjMcKie commented 4 years ago

I was on the wrong track: It is not missing separations support that causes this effect, but missing support of so-called ICC colorspaces. I only recently removed this support because it had caused issues in some other cases. To re-activate this support, MuPDF must be re-generated with a changed configuration.

JorjMcKie commented 4 years ago

Confirming my previous post: Including ICC support in MuPDF (again) solves your issue.

dalanicolai commented 4 years ago

Thanks for this information and the information in your previous message. It is unfortunate then that the ICC support caused issues. Is it possible for me to (re)enable ICC-support? If so, could you inform me how I could achieve that?

JorjMcKie commented 4 years ago

I am undecided how to handle this. For the time being, I could "privately" send you a wheel if you tell me all your software "coordinates". I want to look up the other issue with the ICC problem again and attack it in a way which lets me stay with ICC support included. I just don't know how long this would take me.

JorjMcKie commented 4 years ago

I found issue #188 which states that kicking ICC out of PyMuPDF only has advantages ... You are living proof, that I was wrong then 🙄 So I believe I will just reactivate ICC in version 1.16.8.

JorjMcKie commented 4 years ago

Was too fast: issue #369 reported a serious problem with ICC colorspaces. So I will investigate this case again, before I try to make a decision.

dalanicolai commented 4 years ago

Thanks for the offer but for now I am fine. I can develop the app without the ICC. I just hope that at some point you find a way to support it again. Anyway thanks for the offer and for looking into it.

JorjMcKie commented 4 years ago

@dalanicolai - hope it's good news: The next PyMuPDF version (1.16.12) will allow switching ICC support on or off programmatically. By default, ICC is enabled. This can be changed at any time using the statement

fitz.TOOLS.set_icc(False)  # or True to enable it again.

Accordingly, I will close this issue upon publishing this version.
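If ICC should be off only for selected renderings, the switch above can be wrapped so that the default is restored afterwards. This is just a sketch under assumptions: `icc_setting` and `render_first_page` are hypothetical names, the restore value relies on the default being "enabled" as stated above, and `getPixmap()` / `writeImage()` are the method names of the 1.16 series (later releases use snake_case equivalents).

```python
from contextlib import contextmanager


@contextmanager
def icc_setting(tools, enabled: bool, default: bool = True):
    """Apply an ICC setting inside a `with` block, then restore the default.

    `tools` is any object offering set_icc(bool), e.g. fitz.TOOLS.
    Hypothetical helper; the previous state is not queried, so the
    documented default (enabled) is restored on exit.
    """
    tools.set_icc(enabled)
    try:
        yield tools
    finally:
        tools.set_icc(default)


def render_first_page(pdf_path: str, png_path: str, use_icc: bool) -> None:
    """Render page 0 of a PDF to an image with ICC switched on or off."""
    import fitz  # PyMuPDF 1.16.12 or later; imported lazily on purpose

    with icc_setting(fitz.TOOLS, use_icc):
        doc = fitz.open(pdf_path)
        pix = doc[0].getPixmap()   # 1.16-era name
        pix.writeImage(png_path)   # 1.16-era name
        doc.close()
```

Rendering the same file once with `use_icc=True` and once with `use_icc=False` would reproduce both color variants discussed in this issue for a side-by-side comparison.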

dalanicolai commented 4 years ago

Hi Jorj!

This is great news of course! Thanks for the work and for updating me about this re-implementation. I guess you can indeed close the issue then.

Cheers!

JorjMcKie commented 4 years ago

Official version 1.16.12 just published.