laurent22 / joplin

Joplin - the privacy-focused note taking app with sync capabilities for Windows, macOS, Linux, Android and iOS.
https://joplinapp.org

Export to pdf doesn't include pdfs #5943

Open Mr-Kanister opened 2 years ago

Mr-Kanister commented 2 years ago

Environment

Joplin version: 2.6.10 Platform: Windows 10 OS specifics: 21H1

Steps to reproduce

  1. Import a pdf File into a Joplin Note
  2. Export this note as pdf file
  3. Open the pdf file
  4. Click on the linked pdf file

Describe what you expected to happen

The linked pdf file can't be opened. I expected that the linked pdf would open up. If you look at the file size of the exported pdf, the linked pdf is there! I also read that the pdf gets created from the HTML, and when I export to HTML instead, I am able to open the linked pdf.

sidkhuntia commented 2 years ago

@Mr-Kanister @laurent22 Can I work on it?

Mr-Kanister commented 2 years ago

Sorry for the late answer. Yes, you can work on it, I can't commit since I do not know this language yet :)

nk183 commented 2 years ago

Hi @Mr-Kanister, I am not able to reproduce the bug. I am using Okular (a PDF viewer, on Ubuntu), though, and it is working fine. Which PDF viewer are you using? cc @laurent22

kshitij86 commented 2 years ago

I am able to successfully reproduce this on Ubuntu 20.04. Copying the link I can see that the PDF is there, but the viewer is not able to open it (offline/online). I'll try to dig deeper into it.

kshitij86 commented 2 years ago

@Mr-Kanister It'd be really helpful if you could mention the PDF Viewer you are using.

kshitij86 commented 2 years ago

@Mr-Kanister @laurent22 This looks like an issue with how Electron implements printing to PDF, since contents.printToPDF() is called under the hood and everything is handled there, so it depends on the pdfData returned in the InterOpServiceHelper. Assuming the returned pdfData is not modified afterwards, this really seems to be an Electron issue more than a Joplin issue. I'll try to find out more about the issue and report back soon, maybe with a possible fix (tweaking the options may help?).

nk183 commented 2 years ago

Hi @kshitij86, can you specify the PDF viewer you are using? I really don't think there is any issue with contents.printToPDF(). I might be wrong, but it behaves differently in different PDF viewers, e.g. in the Okular PDF viewer it works fine. I guess it depends on whether the PDF viewer is capable of handling an embedded PDF or not.

Mr-Kanister commented 2 years ago

Okay, Hi!

I have now tested it again with three pdf viewers and got partly different results: on Windows 10 with Sumatra PDF v3.3.3 I cannot open the pdf by clicking, but I can right-click and copy a link, enter it in Firefox and so read the file. Exactly the same happens in Okular v20.12.3 under Debian Bullseye. Again on Windows 10, I can click the link directly in the PDF Annotator and am redirected to Firefox. However, when I open the parent pdf directly in Firefox, there is no link to click or copy.

This pdf link to copy is virtually the plain pdf file and in my individual case is about three million characters long. If I want to paste this somewhere, my PC first has to calculate for a very long time and Firefox becomes altogether very jerky. In Kate, I can't even highlight any of this string without my PC having to calculate for ten seconds.
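
As a rough sense of scale for the link length mentioned above: base64 encodes every 3 bytes as 4 characters, so a link of about three million characters would correspond to roughly 2.25 MB of PDF data. The TypeScript sketch below just works through that arithmetic. It assumes the copied link is a base64 `data:` URI and ignores the short `data:...;base64,` prefix; the function names are made up for illustration.

```typescript
// Base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Inverse estimate: the payload size for a given number of base64 characters.
// Padding is ignored, so the result may overshoot by up to 2 bytes.
function approxBytesFromBase64Length(charCount: number): number {
  return Math.floor(charCount / 4) * 3;
}

// "About three million characters" is the figure reported in the comment above.
const linkChars = 3000000;
const approxBytes = approxBytesFromBase64Length(linkChars);
console.log(`${linkChars} base64 chars ~ ${(approxBytes / 1e6).toFixed(2)} MB of file data`);
```

So the link is about a third larger than the attachment itself, which fits the observation that the exported file grows with the size of the linked PDF.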

I think it's very likely that this is not the fault of Joplin but of Electron, but this behaviour is still annoying at the very least... How should we proceed? Is this bug report out of place here?

Greetings!

kshitij86 commented 2 years ago

@Mr-Kanister I think @laurent22 might be able to guide us better on this issue. Since it works in some viewers, it may also come down to how the PDF viewers handle opening the file.

adi-uchiha commented 1 year ago

Any updates on this issue?

MukeshKaswan1 commented 1 year ago

Can i work on this project to find and remove some bugs.

roman-r-m commented 1 year ago

Can i work on this project to find and remove some bugs.

You can of course but try to understand and replicate the issue first

7adidaz commented 7 months ago

I was able to reproduce this issue. IMO the most reasonable solution is to make the links appear as hyperlinks, but not clickable, and of course not to include the linked media (files, PDFs, images, etc.) in the PDF package. What do you think?

laurent22 commented 7 months ago

IMO the most reasonable solution is to make the links appear as hyperlinks, but not clickable

The PDFs are embedded in the document so ideally if you click on the link it would open that embedded PDF. But I don't know if that's even possible. Maybe the task is to investigate first if it can be done at all. If it cannot, then we shouldn't embed the PDFs to begin with, and indeed disable the links.

7adidaz commented 7 months ago

Maybe the task is to investigate first if it can be done at all

I'll be working on investigating this, and when I'm done, I'll do a PR, thanks for your time!

7adidaz commented 7 months ago

Here's what I came up with:

  1. The issue doesn’t originate from Electron’s contents.printToPDF() function. This function simply prints the content of a web page. The real problem arises during the conversion of a note to HTML, which is then used by printToPDF(). This conversion takes a note object and transforms it into HTML, including any attached files as raw data.

  2. Example of Embedded Link in a Note:

    [fileName.txt](:/xxx)

    converted to:

     <a data-from-md="" title="_resources/xxx.txt" href="data:text/plain;base64,/*some data*/" download="xxx.txt">fileName.txt</a>

    The href attribute contains data that, when executed in a new browser tab, opens the original file. This behavior causes issues when exporting to HTML or PDF with large attached files, resulting in very large output files.

  3. One approach to address this is to remove the href attribute if it starts with data:. This won’t affect images, since they are rendered using the <img> tag.
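
The approach in point 3 could be sketched roughly as follows. This is only an illustration in TypeScript, not the code from the PR: `stripDataHrefs` is a hypothetical helper, it works on the HTML string with regular expressions instead of a real HTML parser, and it assumes attribute values never contain a literal `>` (true for base64 data).

```typescript
// Matches the opening tag of an <a> element. Assumes attribute values never
// contain a literal ">", which holds for base64 data URIs.
const ANCHOR_OPEN_TAG = /<a\b[^>]*>/gi;

// Matches an href attribute whose quoted value starts with "data:".
// The leading \s keeps attributes such as data-href from matching.
const DATA_HREF_ATTR = /\shref\s*=\s*(?:"data:[^"]*"|'data:[^']*')/gi;

// Hypothetical helper: drops data: hrefs from anchors and leaves everything
// else (including <img src="data:...">) untouched.
function stripDataHrefs(html: string): string {
  return html.replace(ANCHOR_OPEN_TAG, (tag) => tag.replace(DATA_HREF_ATTR, ''));
}

const sampleHtml =
  '<a data-from-md="" title="_resources/xxx.txt" href="data:text/plain;base64,AAAA" download="xxx.txt">fileName.txt</a>' +
  '<img src="data:image/png;base64,BBBB">' +
  '<a href="https://www.youtube.com/">normalLink</a>';

console.log(stripDataHrefs(sampleHtml));
```

Only the first anchor changes: it keeps its text and other attributes but loses the href, so it is no longer a working link. The image and the normal link pass through as they were.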

Mr-Kanister commented 7 months ago

But this approach then completely removes every attachment (excluding images). I expect PDFs to be included instead! This is supported by PDFs (https://community.adobe.com/t5/acrobat-discussions/embedding-pdf-files-documents-inside-a-adobe-acrobat-pdf/m-p/4674928).

7adidaz commented 7 months ago

It doesn't! Here is a PDF result of my approach. I included a normal link, a link to a file, a PDF, and an image.

https://drive.google.com/file/d/1eZjRWzpFKmWsoACxVM3yP-_RdN7-2PWp/view?usp=sharing

the original note contained this:

[abdalah_elhdad_resume_go.pdf](:/b7f40e4e8ce646ceb2bc1d12fb3d2a88)

[summary of lasttime debugging.txt](:/29bfdbf0640c41f28893e2e9952b9777)

fsdfsdfsdffff

![mermaid-1710357604361.png](:/37d15b6759b34fa8802a63afa6a7cf96)

[normalLink](https://www.youtube.com/) 

large file 
[userguide.pdf](:/613a3e6e370047acbe41875a0da91b24)
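
To make it easier to see which entries in a note like this are attachments and which are ordinary links, here is a small TypeScript sketch that lists the `:/id` resource links. It is an illustration only: `extractResourceLinks` is a hypothetical helper, not a Joplin API, and it assumes resource IDs are 32 hexadecimal characters, as in the note above.

```typescript
interface ResourceLink {
  title: string;
  resourceId: string;
  isImage: boolean; // true for ![...](:/id), false for [...](:/id)
}

// Hypothetical helper: finds Markdown links and images that point at a
// resource via the ":/<id>" form. Ordinary URLs are ignored.
function extractResourceLinks(markdown: string): ResourceLink[] {
  // Assumption: resource IDs are exactly 32 hex characters.
  const pattern = /(!?)\[([^\]]*)\]\(:\/([0-9a-f]{32})\)/gi;
  const links: ResourceLink[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    links.push({ title: match[2], resourceId: match[3], isImage: match[1] === '!' });
  }
  return links;
}

const exampleNote = [
  '[abdalah_elhdad_resume_go.pdf](:/b7f40e4e8ce646ceb2bc1d12fb3d2a88)',
  '[summary of lasttime debugging.txt](:/29bfdbf0640c41f28893e2e9952b9777)',
  'fsdfsdfsdffff',
  '![mermaid-1710357604361.png](:/37d15b6759b34fa8802a63afa6a7cf96)',
  '[normalLink](https://www.youtube.com/)',
  'large file',
  '[userguide.pdf](:/613a3e6e370047acbe41875a0da91b24)',
].join('\n');

for (const link of extractResourceLinks(exampleNote)) {
  console.log(`${link.isImage ? 'image' : 'attachment'}: ${link.title}`);
}
```

For this note the sketch reports three non-image attachments and one image, and skips the YouTube link.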

Mr-Kanister commented 7 months ago

...yes and everything but the image and the "normal link" got removed. As a user, I'd want to have the rest included, too.

There must be a way to do so, as it is supported by the pdf format.

laurent22 commented 7 months ago

It doesn't

But we already know that it doesn't - that's the point of this issue. Now, what can we do about it? What did you try to make embedded PDFs work?

If Adobe Acrobat can do it, maybe there's a way to format the HTML or setup Electron to make it work. Or maybe not, but from your comments it sounds like you tried the existing feature, saw that it doesn't work and didn't try much else.

Mr-Kanister commented 7 months ago

from your comments it sounds like you tried the existing feature, saw that it doesn't work and didn't try much else.

If you mean me, then yes, I haven't tried anything else. If there really is no alternative, then of course it's also a bug fix to remove the feature.

laurent22 commented 7 months ago

If you mean me, then yes, I haven't tried anything else

I was actually answering 7adidaz since he's interested in working on this issue.

7adidaz commented 7 months ago

If Adobe Acrobat can do it, maybe there's a way to format the HTML or setup Electron to make it work. Or maybe not, but from your comments it sounds like you tried the existing feature, saw that it doesn't work and didn't try much else.

I have researched whether it can be done, i.e. a single PDF file with a hyperlink that, when clicked, opens another PDF file. But IMO, and based on the research I did, it's not possible to do it outside an environment like Adobe Acrobat.

I have seen the linked Adobe guide on this. When the region or the link is clicked inside Adobe Acrobat, it opens the attachment. Outside it, it doesn't, as I show in the demo.

2024-03-1518-47-38-ezgif com-crop

What do you think? Should we go with the safe route and just disable attachment links as I did in the PR?

Mr-Kanister commented 7 months ago

Just curious: Does it work in Firefox?

With the PDF Annotator (that's paid software, I'm happy to test things and report them, so you don't have to buy it...) I can add attachments to pdfs. Those aren't clickable links, but attachments like E-Mail attachments. In Firefox those get displayed in the sidebar: image

In Dolphin a pop up appears: image

But in SumatraPDF they aren't viewable and Edge isn't displaying them either: https://answers.microsoft.com/en-us/microsoftedge/forum/all/edge-and-pdfs-with-attachements/0d9f4536-6dd7-400c-83f8-1d2066648930

This is the file: Test.pdf

7adidaz commented 7 months ago

The files currently output by Joplin don't show the attached media as processable entities, whether in Adobe, Firefox, or Evince. Here is an example file: export_w_media.pdf. Try to extract the attached data!

But the files output by Adobe do show their attachments in both Firefox and Evince. Here is a test file: output_from_adobe.pdf

The attached Test.pdf shows the attachment in Firefox, Adobe, and Evince. image
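
A quick way to compare files like these without opening a viewer is to look for the PDF keywords that real attachments normally involve. The TypeScript sketch below is a heuristic under stated assumptions, not a PDF parser: `findAttachmentMarkers` is a hypothetical helper that searches the raw bytes for `/EmbeddedFiles` (document-level attachments) and `/Subtype /FileAttachment` (a positioned attachment annotation). Both names come from the PDF specification, but a file that stores its objects in compressed object streams can contain them without this search finding them, so a negative result proves nothing.

```typescript
interface AttachmentMarkers {
  embeddedFilesTree: boolean;        // document-level attachments (/EmbeddedFiles)
  fileAttachmentAnnotation: boolean; // positioned attachment (/Subtype /FileAttachment)
}

// Hypothetical helper: keyword search over the raw bytes of a PDF file.
function findAttachmentMarkers(pdfBytes: Uint8Array): AttachmentMarkers {
  // PDF syntax is ASCII-based, so mapping each byte to one character is
  // enough for a keyword search (binary stream data just becomes noise).
  let text = '';
  for (let i = 0; i < pdfBytes.length; i++) {
    text += String.fromCharCode(pdfBytes[i]);
  }
  return {
    embeddedFilesTree: /\/EmbeddedFiles\b/.test(text),
    fileAttachmentAnnotation: /\/Subtype\s*\/FileAttachment\b/.test(text),
  };
}

// Tiny synthetic fragments standing in for real files.
const asBytes = (s: string): Uint8Array => Uint8Array.from(s.split(''), (c) => c.charCodeAt(0));

console.log(findAttachmentMarkers(asBytes('<< /Names << /EmbeddedFiles 5 0 R >> >>')));
console.log(findAttachmentMarkers(asBytes('<< /Type /Annot /Subtype /FileAttachment /FS 7 0 R >>')));
```

In Node, the bytes of a real file could be passed in directly, since the Buffer returned by `fs.readFileSync` is a Uint8Array.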

7adidaz commented 7 months ago

@Mr-Kanister I hate mentions, but I updated my comment.. sorry I misunderstood you! :)

Mr-Kanister commented 7 months ago

So quick summary of the current situation:

The first allows positioning an area which, when clicked, may (depending on the viewer) lead to the attachment, while the second one only displays attachments in an "attachment window" without a positional reference. Neither is supported by all viewers (I expected this, as not all viewers display comments/annotations either).

7adidaz commented 6 months ago

I was researching this and found a lib that can be used to attach files to a PDF. What is the policy here on using another package?

roman-r-m commented 6 months ago

I think ideally we should not simply attach files to pdf but make links to those files from within the document work.

Doing so may require rewriting the whole pdf export logic.

7adidaz commented 6 months ago

Can you explain what you mean by links to those files? Like the case of Adobe, where the linked PDF opens when a region is clicked?

roman-r-m commented 6 months ago

In Joplin you create a link in your note, and clicking on the link opens the document. From a quick glance at the lib that you linked above, it seems to attach a file to the PDF without creating a link (most likely there is a way to do that as well - that lib seems pretty good).

7adidaz commented 6 months ago

I get you... yep, this will require more work on extracting the PDFs for sure :)

7adidaz commented 6 months ago

I will be applying to GSoC this year. I am interested in the "PDF annotations" idea, so I'll include this in my research, and if I get accepted and have time at the end of the season, I'll work on it!