metanorma / metanorma-iso

Metanorma processor for ISO standards
BSD 2-Clause "Simplified" License

(URGENT) An attachment doesn’t work in a browser. #1203

Closed TRThurman closed 2 months ago

TRThurman commented 2 months ago

I am trying to build the express annotation report and have a corrupt PDF file.

In bash:

cd /Users/tom/test_git/metanorma/annotated-express-report/sources/report-pdf-latex-validation
bash-5.2$ bundle exec metanorma sources/report-pdf-latex-validation/document.adoc

Open document.html, then try to open the attachment in part 50:

[Screenshot: PastedGraphic-1]

Running qpdf --check 10303-50/READY-20230316-no-toc-iso-10303-50.pdf produces lots of warnings.

Sometimes a link to a file doesn't work either. See repo: annotated-express-report branch pdf-latex-validation-report.

This is time-critical for a deliverable.

opoudjis commented 2 months ago

See also https://github.com/metanorma/annotated-express-report/issues/18

Given this is going to involve coordinating between @Intelligent2013 and myself, I can't guarantee a solution tonight, but we will see what we can do.

I need precise details of where this document is: is this https://github.com/metanorma/iso-10303/tree/main/documents/iso-10303-50 ?

... n/m: annotated-express-report/sources/report-pdf-latex-validation, same document as before

TRThurman commented 2 months ago

For that case, I was able to work around the issue by using a link rather than the attachment. When I opened the file in _attachments, Adobe claimed the file was corrupt. Opening the file in the source subdirectory was OK.
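One way to confirm that the exported copy really differs from the source file (rather than the viewer misbehaving) is to compare the two byte for byte. The following is a minimal sketch, not part of Metanorma: the helper names `sha256_of` and `same_bytes` are hypothetical, and the caller supplies the two paths.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(source: Path, exported: Path) -> bool:
    """True when the exported attachment is byte-identical to the source file."""
    # Hypothetical helper: a cheap size check first, then a digest comparison.
    return (
        source.stat().st_size == exported.stat().st_size
        and sha256_of(source) == sha256_of(exported)
    )
```

Calling `same_bytes` with the PDF in the source subdirectory and its counterpart under _attachments would return False if the export step altered the file, which would point at the build rather than at the PDF reader.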

There is another issue I submitted about the fact that links don't always work. [metanorma/annotated-express-report] (URGENT) link failure in iso 10303-110 clause (Issue #18)

TRThurman commented 2 months ago

Quite often the source encoding of the link omits the directory name. I am working my way through that. That is a separate source quality issue, but I wanted to mention that I am aware of it.

Intelligent2013 commented 2 months ago

@opoudjis FYI, we already had a similar issue in https://github.com/metanorma/metanorma-standoc/issues/900. Maybe the branch https://github.com/metanorma/metanorma-standoc/tree/fix/attachment-localdir wasn't merged into main?

opoudjis commented 2 months ago

Damn, that would indeed be it. Sorry.

opoudjis commented 2 months ago

Yeah, I see what has happened: this has clashed with another fix to carriage returns in XML.

opoudjis commented 2 months ago

I have to go out in a bit, but once I compile the document successfully, @TRThurman, I will upload it to Mega.nz for you to look at.

Just confirmed that READY-20230316-no-toc-iso-10303-50.pdf is no longer corrupt.

opoudjis commented 2 months ago

I forwarded you the link by email, @TRThurman; please confirm it is working.

TRThurman commented 2 months ago

@opoudjis I didn't (yet) get an email with a link.

opoudjis commented 2 months ago

I just sent it 5 mins ago, I had hit send but my mail server hadn't responded before I went on my visit.

TRThurman commented 2 months ago

I have not received an email. Tom

On Aug 15, 2024, at 7:58 AM, Nick Nicholas @.***> wrote:

I just sent it 5 mins ago, I had hit send but my mail server hadn't responded before I went on my visit.


TRThurman commented 2 months ago

Got it. Tom

On Aug 15, 2024, at 9:59 AM, Nick Nicholas @.***> wrote:

https://mega.nz/file/Um1RFAgT#V_VRjjQMpS1PmzO6w8kP_Xa897b3A-HLMpCR93FiW4A . I'll delete when you confirm receipt.


opoudjis commented 2 months ago

@TRThurman Is it working to your satisfaction?

TRThurman commented 2 months ago

It seems to. I am debugging links. Is there some documentation that describes the difference in behavior in metanorma between attachments and links? With recommendations when to use which? Cost trade in time/space when the linked file is local in the directory tree? Our document has both.

TRThurman commented 2 months ago

It seems to be working. I will zip up my local build and make it available to the team for checks. ISO has a limit on upload size, which is why I asked the question about attachments vs links. Anyway, I will upload it in however many pieces it takes.

opoudjis commented 2 months ago

> It seems to. I am debugging links. Is there some documentation that describes the difference in behavior in metanorma between attachments and links? With recommendations when to use which? Cost trade in time/space when the linked file is local in the directory tree? Our document has both.

The current full extent of documentation of attachments is https://www.metanorma.org/author/topics/sections/attachments/

The only benefits of attachments are that they are embedded in the XML document, which means they can be distributed together as a bundle with no external dependency; and they can be embedded in PDFs.

Those are their only advantages. If you can reliably host the linked documents externally, you should: as you are seeing, attachments also blow up document size, and therefore processing time and space. While the current document is getting away with embedding 100 MB of PDFs inside a master PDF (after a lot of debugging on our side to prevent running out of memory), I have trouble seeing that as advisable.
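To put a rough number on the size cost: Base64 emits 4 characters for every 3 bytes of input, so embedded attachments grow by about a third before any other overhead. The sketch below only does that arithmetic. The function name `base64_size` is hypothetical, and the 76-character line width is an assumption borrowed from the MIME convention, not a statement about what Metanorma uses.

```python
def base64_size(raw_bytes: int, line_width: int = 0) -> int:
    """Bytes needed to store raw_bytes of data as Base64 text.

    line_width=0 means one unbroken line. Otherwise one newline is counted
    after every line_width characters, including the final partial line.
    """
    encoded = 4 * ((raw_bytes + 2) // 3)      # 4 output chars per 3 input bytes
    if line_width <= 0 or encoded == 0:
        return encoded
    newlines = (encoded + line_width - 1) // line_width
    return encoded + newlines


MB = 1_000_000
print(base64_size(100 * MB) / MB)        # unbroken Base64 text
print(base64_size(100 * MB, 76) / MB)    # wrapped at an assumed 76 characters
```

Under these assumptions, 100 MB of PDFs becomes roughly 133 MB of Base64 text, or about 135 MB once wrapped into lines.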

I can't give you a target file size to avoid, because that seems to be a matter of PDF processing constraints. But I will say that I've had to change code to get the XML to deal with 10 MB attachments (introducing line breaks in the Base64 encoding), and the PDF is generated by XSLT, with all that implies. (HTML generation merely exports the attachments from the Metanorma XML to a folder at the first opportunity.)
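As an illustration of what breaking the encoding into lines means, here is a small sketch using Python's standard library as a stand-in; Metanorma itself is written in Ruby, and its actual line width and implementation are not shown in this thread. The payload is synthetic, not a real PDF.

```python
import base64

# Synthetic stand-in for an attachment (256,000 bytes).
payload = bytes(range(256)) * 1000

unbroken = base64.b64encode(payload)      # a single very long line
wrapped = base64.encodebytes(payload)     # newline after every 76 characters

longest_line = max(len(line) for line in wrapped.splitlines())

# Both forms decode to the same bytes: by default b64decode discards
# characters outside the Base64 alphabet, newlines included.
assert base64.b64decode(unbroken) == payload
assert base64.b64decode(wrapped) == payload
```

The wrapped form carries the same data in many short lines instead of one very long one, at the cost of one extra newline per line.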

My guess is that the sample document you have, with 100 MB of attachments, is at the outer limit of what is doable, and that PDF processing is more constrained than XML or HTML processing.