laurent22 / joplin

Joplin - the privacy-focused note taking app with sync capabilities for Windows, macOS, Linux, Android and iOS.
https://joplinapp.org

PDF Exports with Images Very Large #7314

Open DaAwesomeP opened 2 years ago

DaAwesomeP commented 2 years ago

I am trying to export a PDF of a note that has two images in it. The images are JPEG and very high resolution (3024x4032); however, each is less than 2.4 MB in file size. These photos were taken on a Google Pixel 4a 5G.

I have verified that Joplin copies them with this small file size by right-clicking the images and clicking "reveal in file folder." When I export this note as a PDF, the export is 66 MB. If I try to print to a file using the print menu and the file printer in CUPS, Joplin crashes.

Environment

Joplin version: 2.8.8
Platform: Appimage
OS specifics: openSUSE LEAP 15.3 x64

Steps to reproduce

  1. Drag photos into a Joplin note (JPEG, 3024x4032, 2.3MB)
  2. Click export to PDF
  3. Exported PDF is 66 MB and takes some time to export

Describe what you expected to happen

PDF should maintain good JPEG compression of source images.

Logfile

The log shows loading notes and syncing, but nothing about the export except for this line:

2022-11-17 19:03:21: "CommandService::execute:", "exportPdf", "[]"

I am hesitant to share the log since it contains notes and keys, but I can go through it and redact if necessary.

om2137 commented 1 year ago

I have tried to reproduce this issue. The PDF is not oversized with every JPEG of those dimensions or above, but certain JPEGs do trigger the oversizing. With 2 JPEGs (4000x6000) of 6 MB, the exported PDF is 82 MB. With a different JPEG (4400x6216) of 5.88 MB, the exported PDF is 8.89 MB. Can I work on this issue?

roman-r-m commented 1 year ago

can i work on this issue?

sure

om2137 commented 1 year ago

can i work on this issue?

sure

Can you tell me where I can discuss the issue? Discord, the forum, or somewhere else?

roman-r-m commented 1 year ago

Here, on the forum or on discord.

Not sure what there is to discuss though.

DaAwesomeP commented 1 year ago

Here, on the forum or on discord.

Since the issue has been reported here, maybe try to keep the discussion contained here? That way the debugging and solution steps remain well documented and searchable, and eventually all that information will be linkable to a pull request. Personally, I am not in the Discord so you will only be able to get a hold of me here.

I'm happy to hear it is reproducible! A good place to start with debugging this would be to see how the PDF gets exported, which tools/libraries are used to do it, and what options/flags that tool might have available. I have yet to look into it myself.

om2137 commented 1 year ago

@DaAwesomeP can you confirm whether the JPEGs you were using came from a DSLR or professional camera?

om2137 commented 1 year ago

I have created a topic of this issue on Joplin forum: https://discourse.joplinapp.org/t/pdf-with-jpg-selected-exports-oversized-pdf-github-7314/28419

DaAwesomeP commented 1 year ago

@DaAwesomeP can you confirm whether the JPEGs you were using came from a DSLR or professional camera?

The images came from a Google Pixel 4a 5G. I'm not certain which settings were enabled on the phone.

roman-r-m commented 1 year ago

Might be related: https://bugs.chromium.org/p/chromium/issues/detail?id=801430

@DaAwesomeP do you have any custom CSS?

DaAwesomeP commented 1 year ago

@roman-r-m No, my Joplin is unmodified and installed via Appimage. That issue seems to suggest that EXIF vs JFIF JPEGs may cause a different result if that bug still exists.
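For anyone trying to sort test images into "triggers the bug" and "does not", a minimal sketch of how the EXIF vs JFIF distinction could be checked is shown here. It walks the JPEG header segments and reports whether an APP0 "JFIF" or APP1 "Exif" segment is present. The names `inspectJpeg`, `JpegMetadata` and `hasAsciiAt` are hypothetical and not part of Joplin, and the parser assumes a well-formed header (it simply stops at the first thing it does not understand).

```typescript
// Hypothetical helper, not part of Joplin: reports which metadata
// segments appear before the image data in a JPEG file.
type JpegMetadata = { hasJfif: boolean; hasExif: boolean };

// True if `text` (ASCII) appears in `bytes` starting at `start`.
function hasAsciiAt(bytes: Uint8Array, start: number, text: string): boolean {
  if (start + text.length > bytes.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[start + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

function inspectJpeg(bytes: Uint8Array): JpegMetadata {
  // Every JPEG starts with the SOI marker FF D8.
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) {
    throw new Error("Not a JPEG file");
  }
  const result: JpegMetadata = { hasJfif: false, hasExif: false };
  let offset = 2;
  while (offset + 4 <= bytes.length) {
    if (bytes[offset] !== 0xff) break; // not at a marker: stop scanning
    const marker = bytes[offset + 1];
    if (marker === 0xda || marker === 0xd9) break; // SOS or EOI: header is over
    // Segment length is big-endian and includes its own two bytes.
    const length = (bytes[offset + 2] << 8) | bytes[offset + 3];
    if (length < 2) break; // malformed segment
    const payload = offset + 4;
    if (marker === 0xe0 && hasAsciiAt(bytes, payload, "JFIF\0")) result.hasJfif = true;
    if (marker === 0xe1 && hasAsciiAt(bytes, payload, "Exif\0")) result.hasExif = true;
    offset += 2 + length;
  }
  return result;
}
```

In a Node script this could be called on the result of `fs.readFileSync(path)`, since a Node `Buffer` is a `Uint8Array`.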

roman-r-m commented 1 year ago

I haven't been able to replicate it so far, so can only guess. Any chance you could share one of those huge pdfs?

om2137 commented 1 year ago

I haven't been able to replicate it so far, so can only guess. Any chance you could share one of those huge pdfs?

The issue does not appear with every JPEG, only with certain ones.

DaAwesomeP commented 1 year ago

@roman-r-m OK, Gist with photos (in Gist instead of attaching here to avoid compression/modification) and exported PDFs here: https://gist.github.com/DaAwesomeP/1e2359f73334471184d670f59ec21abc

I can confirm this is an EXIF issue. If I run exiftool -EXIF= original.jpg on the image first, then the issue goes away and the PDF is the expected size.

In the Gist, the 11.4MB file export_original.pdf is an export of original.jpg. The 2.7 MB file export_stripped.pdf is an export of stripped.jpg. You can see that the export of the file without EXIF data is effectively the same size as the original image, as expected. Note that in this example the PDF did not balloon to 60+ MB as this is a very simple, mostly white background photo that I took for this issue. More complicated photos definitely get much, much larger.

Please excuse my phone not properly rotating/applying metadata to rotate the image.

roman-r-m commented 1 year ago

I can confirm this is an EXIF issue. If I run exiftool -EXIF= original.jpg on the image first, then the issue goes away and the PDF is the expected size.

In this case I'm not sure what can possibly be done on the Joplin side as it relies on Electron/Chrome for creating PDFs.

There was an idea to replace Chrome's built in PDF converter with a 3rd party library but I doubt it's going to be done anytime soon, if at all.

DaAwesomeP commented 1 year ago

In this case I'm not sure what can possibly be done on the Joplin side as it relies on Electron/Chrome for creating PDFs.

As a temporary workaround, there may be a simple way to remove the EXIF data before exporting. I will test more closely and try to figure out exactly which EXIF fields are causing this issue and propose a lightweight solution (obviously don't want to include something as large as ImageMagick). I think it's fine to remove some EXIF data from exported PDFs, as extracting images from PDFs and expecting the same EXIF data is somewhat niche. Chrome may already remove some of the data in the export process.
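A rough sketch of what such a lightweight step might look like, with no image library involved: copy the JPEG while skipping any APP1 segment that begins with the "Exif" identifier. This only approximates what `exiftool -EXIF=` did in the test above, and whether it is enough to avoid the oversized export has not been verified. `stripExif` and `startsWithAscii` are hypothetical names, not existing Joplin functions.

```typescript
// Hypothetical helper, not part of Joplin: returns a copy of a JPEG
// with its Exif APP1 segment(s) removed and everything else untouched.
function startsWithAscii(bytes: Uint8Array, start: number, text: string): boolean {
  if (start + text.length > bytes.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[start + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

function stripExif(bytes: Uint8Array): Uint8Array {
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) {
    throw new Error("Not a JPEG file");
  }
  const kept: Uint8Array[] = [bytes.subarray(0, 2)]; // keep the SOI marker
  let offset = 2;
  while (offset + 4 <= bytes.length && bytes[offset] === 0xff) {
    const marker = bytes[offset + 1];
    if (marker === 0xda || marker === 0xd9) break; // SOS or EOI: stop parsing
    // Segment length is big-endian and includes its own two bytes.
    const length = (bytes[offset + 2] << 8) | bytes[offset + 3];
    if (length < 2) break; // malformed segment: copy the rest verbatim
    const end = Math.min(offset + 2 + length, bytes.length);
    const isExif = marker === 0xe1 && startsWithAscii(bytes, offset + 4, "Exif\0");
    if (!isExif) kept.push(bytes.subarray(offset, end));
    offset = end;
  }
  kept.push(bytes.subarray(offset)); // compressed image data and trailer

  // Concatenate the kept pieces into one new buffer.
  let total = 0;
  for (const part of kept) total += part.length;
  const out = new Uint8Array(total);
  let pos = 0;
  for (const part of kept) {
    out.set(part, pos);
    pos += part.length;
  }
  return out;
}
```

One caveat with this approach: the EXIF orientation tag is dropped along with everything else, so images that rely on it for rotation would render unrotated in the export.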

There was an idea to replace Chrome's built in PDF converter with a 3rd party library but I doubt it's going to be done anytime soon, if at all.

I can potentially look into this too, but this is obviously a much bigger task.

laurent22 commented 1 year ago

Perhaps something to report to the Electron repo? We use webContents.printToPDF() to export to PDF.
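For a report against Electron, a reproduction outside Joplin would probably be needed. The sketch below shows one way the size blow-up could be measured. To stay self-contained it does not import Electron; `PdfSource` is a hypothetical structural stand-in for the part of Electron's `WebContents` that is used, on the assumption that `printToPDF(options)` resolves to a `Buffer` (a `Uint8Array`). `exportAndMeasure` and `ExportReport` are likewise illustrative names, and this is not how Joplin's own export code is written.

```typescript
// Hypothetical stand-in for the relevant part of Electron's WebContents,
// so the sketch type-checks without the electron package installed.
interface PdfSource {
  printToPDF(options: { printBackground?: boolean; landscape?: boolean }): Promise<Uint8Array>;
}

interface ExportReport {
  pdf: Uint8Array;   // the generated PDF bytes
  pdfBytes: number;  // size of the PDF
  ratio: number;     // PDF size divided by total source image size
}

// Renders the currently loaded page to PDF and compares the result
// with the combined size of the images embedded in that page.
async function exportAndMeasure(source: PdfSource, imageBytes: number): Promise<ExportReport> {
  if (imageBytes <= 0) throw new Error("imageBytes must be positive");
  const pdf = await source.printToPDF({ printBackground: true });
  return { pdf, pdfBytes: pdf.length, ratio: pdf.length / imageBytes };
}
```

In an actual Electron app, the `webContents` of a `BrowserWindow` that has loaded an HTML page containing the image should be usable as the `source` argument. A ratio near 1 would match the stripped-image export described above, while a much larger ratio would match the oversized one.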

DaAwesomeP commented 1 year ago

@laurent22 I began to submit an issue just now, but it seems that Electron v19 is EOL. Maybe updating (if possible) would help to resolve the issue?