metanorma / metanorma-standoc

Metanorma for Standoc documents
BSD 2-Clause "Simplified" License

Implement per-document attachments #709

Closed: ronaldtse closed this issue 5 months ago

ronaldtse commented 1 year ago

Organizations are increasingly issuing PDFs that contain attachments in order to incorporate machine-readable content.

Besides the ISO 10303 series, many of the BIPM's NMLs (national metrology laboratories) have begun issuing digital calibration certificates as PDFs with machine-readable attachments, such as XML data.

At the recent APMP DXFG webinar (https://apmp-dxfg.org/dxfg-eoy-webinar-2022.html), Japan's NMIJ previewed a digital calibration certificate that contains multiple attachments:

[Screenshot: NMIJ digital calibration certificate preview with attachments, 2022-12-05]

Metanorma today supports per-collection attachments, but creating a collection is a complex ordeal and requires the user to understand many things (e.g. manifest file, cover page, collection metadata).

We need to support per-document attachments so that users who work with only a single document file can easily incorporate attachments and link to them as native document elements.

ronaldtse commented 1 year ago

This issue is needed for:

opoudjis commented 1 year ago

I really really really want to hand this work over to someone else, so @abunashir, this is my thinking on attachments in general.

In Metanorma Collections, a single attachment can be referenced by multiple files in the collection, so I think the approach I took there is the correct one:

How to do all this is more or less documented in https://www.metanorma.org/author/topics/document-format/collections/#collection-manifest

There is a long-outstanding ticket for you, https://github.com/metanorma/annotated-express/issues/97 , on automating the generation of ADOC for attachments from a single Liquid template, because Ronald was not comfortable incorporating shell scripts into Metanorma generation. I think it is pointless to try to prevent shell scripts, because attachments can come from all sorts of sources and have all sorts of content, automatically or manually generated.

But that's not the concern here.

The concern here is to extend attachment functionality to single documents.

I think much of how I've already implemented attachments makes sense, and should be carried over:

The one thing that doesn't apply to single documents is that there is no collection manifest to indicate which documents are attachments and which are not. There are two possible ways around this:

  1. Convert the single document being compiled into a collection, complete with its own manifest. In fact, I already do this when we want to compile a single document into HTML with one page per section, using the sectionsplit option: https://github.com/metanorma/metanorma/blob/main/lib/metanorma/sectionsplit.rb
  2. Do not set in motion an entire collection, but invoke the referencing and compilation of attachments without a manifest.
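To make option 1 more concrete, here is a minimal Python sketch (not the metanorma gem's Ruby code) that synthesizes a collection manifest for a single document, in the same shape as the collection manifest shown later in this thread. It assumes attachments are flagged in the source with the repo:(attachment/...) syntax proposed further down; `synthesize_manifest`, its parameters, and the default `level` value are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Hypothetical: attachments are cited as repo:(attachment/PATH) in the source.
ATTACHMENT_REF = re.compile(r"repo:\(attachment/([^)]+)\)")


def synthesize_manifest(source_file, adoc_text, title, level="document"):
    """Build a manifest dict for one document plus the attachments it cites."""
    # The document being compiled is the first (non-attachment) docref.
    docrefs = [{"fileref": source_file, "identifier": PurePosixPath(source_file).stem}]
    seen = set()
    for path in ATTACHMENT_REF.findall(adoc_text):
        if path in seen:  # an attachment may be cited more than once
            continue
        seen.add(path)
        # Each cited file becomes a docref flagged as an attachment.
        docrefs.append({"fileref": path, "identifier": path, "attachment": True})
    return {"manifest": {"level": level, "title": title, "docref": docrefs}}
```

The resulting dict could then be serialized to YAML and handed to the existing collection machinery, which is essentially what sectionsplit already does for its generated collections.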

Right now, attachments use the following machinery:

Manifest:

manifest:
  level: brochure
  title: Brochure/Brochure
  docref:
    - fileref: si-brochure-fr.xml
      identifier: si-brochure-fr
    - fileref: attachment.txt
      identifier: ABC
      attachment: true

Bibliography:

[bibliography]
== Bibliography

* [[[theattachment,repo:(current-metanorma-collection/ABC)]]]

Here, repo:(current-metanorma-collection/...) instructs the metanorma gem to look up the identifier in the manifest.
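The lookup can be illustrated with a minimal Python sketch (the real implementation lives in the metanorma gem, in Ruby). The manifest is mirrored as an in-memory dict rather than loaded from YAML, only a flat docref list is handled, and `resolve_collection_ref` is a hypothetical name.

```python
import re

# In-memory mirror of the example manifest above; a real tool would load the YAML file.
MANIFEST = {
    "level": "brochure",
    "title": "Brochure/Brochure",
    "docref": [
        {"fileref": "si-brochure-fr.xml", "identifier": "si-brochure-fr"},
        {"fileref": "attachment.txt", "identifier": "ABC", "attachment": True},
    ],
}

# Matches repo:(current-metanorma-collection/IDENTIFIER) in a bibliography anchor.
COLLECTION_REF = re.compile(r"repo:\(current-metanorma-collection/([^)]+)\)")


def resolve_collection_ref(anchor_text, manifest):
    """Return (fileref, is_attachment) for a collection reference, or None."""
    match = COLLECTION_REF.search(anchor_text)
    if match is None:
        return None  # not a collection reference
    identifier = match.group(1)
    # Find the docref whose identifier matches, and report its attachment flag.
    for docref in manifest["docref"]:
        if docref["identifier"] == identifier:
            return docref["fileref"], docref.get("attachment", False)
    raise KeyError(f"identifier {identifier!r} not found in manifest")
```

For the bibliography entry above, `resolve_collection_ref("* [[[theattachment,repo:(current-metanorma-collection/ABC)]]]", MANIFEST)` yields `("attachment.txt", True)`.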

Reference:

<<theattachment,Attachment 1>>

I think we can work around this by doing:

No Manifest

Same Reference

Bibliography:

[bibliography]
== Bibliography

* [[[theattachment,repo:(attachment/{{file path of attachment relative to current file}})]]]

When Metanorma sees repo:(attachment/...) links, it inserts a hyperlink to the attachment, just as the metanorma gem does right now, but it presupposes that the file name stays the same and only the file suffix changes. In addition, encountering an attachment in the bibliography while processing Presentation XML triggers compilation of the attachment, if the file suffix warrants it.
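A minimal Python sketch of that rule, for illustration only: it extracts the path from a repo:(attachment/...) reference, keeps the file name, and swaps the suffix only when the file is compilable. Which suffixes count as compilable, and what they compile to, are assumptions here (the hypothetical `COMPILED_SUFFIX` table); `attachment_link` is likewise a hypothetical name.

```python
import re
from pathlib import PurePosixPath

# Matches repo:(attachment/PATH), PATH being relative to the citing document.
ATTACHMENT_REF = re.compile(r"repo:\(attachment/([^)]+)\)")

# Hypothetical: source suffixes that warrant compilation, mapped to output suffixes.
COMPILED_SUFFIX = {".adoc": ".html"}


def attachment_link(anchor_text):
    """Return (link_target, needs_compilation) for an attachment reference, or None."""
    match = ATTACHMENT_REF.search(anchor_text)
    if match is None:
        return None  # not an attachment reference
    source = PurePosixPath(match.group(1))
    output_suffix = COMPILED_SUFFIX.get(source.suffix)
    if output_suffix is None:
        # Not compilable (e.g. XML data): link to the file unchanged.
        return str(source), False
    # Compilable: same file name, only the suffix changes.
    return str(source.with_suffix(output_suffix)), True
```

Under these assumptions, `repo:(attachment/data/schema.adoc)` links to `data/schema.html` and is flagged for compilation, while `repo:(attachment/data/readings.xml)` links to the XML file as-is.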

The code to do all of this already exists; it's just mostly in the metanorma gem, with a little bit in metanorma-standoc. This task would involve migrating some of that code to isodoc. Where code is invoked in both isodoc and metanorma, I've tended to move it to metanorma-utils.

abunashir commented 1 year ago

Thanks a lot @opoudjis, this is really useful information. I'm trying to wrap my head around collections as a whole, as there are quite a few tasks around them, and to see if we can come up with something that addresses most of those common issues.

Both approaches work great, but just to explore the no-manifest attachment option: let's say we want to reference attachments inline (in the document) and indicate that they need some preprocessing. How would you do that in the AsciiDoc? Is this enough?

* [[[theattachment,repo:(attachment/{{file path of attachment relative to current file}})]]]

@ronaldtse: Do you have any preference on how to include attachments in documents?

opoudjis commented 1 year ago
* [[[theattachment,repo:(attachment/{{file path of attachment relative to current file}})]]]

I believe that would be enough. Obviously I would need to do some stuff once I see the "attachment" prefix, but because I've already implemented repo:(current-metanorma-collection/...), at least retrieving that indication is going to be straightforward. If the file is cited with an attachment prefix, I will know that it is an attachment, and I will do the required preprocessing...

... using the handles you will provide :))

abunashir commented 1 year ago

That's great, I feel like this could be a good approach to investigate, to see if we can make it work easily :)

ronaldtse commented 1 year ago

This issue is URGENT. @abunashir what's going on here?

abunashir commented 1 year ago

Hey @ronaldtse, I actually haven't worked on it for a while now; I was prioritising the work on Corado/Oscal, but if it's still urgent I can start looking into it :)

abunashir commented 1 year ago

I'm looping back to this issue again, @ronaldtse. So, if I understand correctly, we want to support attachments in any document, not only in collections: something similar to adding an image in any AsciiDoc document, but possibly needing some preprocessing in this case?

I'm also trying to understand what actually needs to happen internally when there is an attachment macro in the AsciiDoc. Let me know if you have any ideas; it might save me some time :)

opoudjis commented 1 year ago

Btw, @abunashir, I have given you the code, but I haven't given you one final bit, which you will need to do for ISO-10303.

I have asked you to insert the attachment-generating code in metanorma before the sectionsplit code in document generation (sectionsplit breaks a document into one document per section).

But some documents will need to run both sectionsplit and attachment generation. In fact, ISO-10303 parts are such documents.

Which means you will need to ensure that both attachment generation and sectionsplit can be run on the same document. (You would run attachment generation first.)

opoudjis commented 8 months ago

There are two tasks being described here, and that has confused matters:

That second step is a distraction, and I am making it out of scope of this ticket.

The solution is going to remain as I sketched above:

I will do the standoc and isodoc bits of this, and I would want to hand over the metanorma gem part of this to someone else. @alexeymorozov ?

I will likely make the zip + manifest task a new ticket.

alexeymorozov commented 8 months ago

We discussed this with Nick, and I'll start as soon as the base work is done.

opoudjis commented 8 months ago

If all bibitems in a references section are hidden, the references section needs to be hidden as well. Debug the existing code (it didn't realise that a title could be supplied).
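The rule can be sketched in Python with `xml.etree.ElementTree`, for illustration only. The sketch assumes un-namespaced `references`/`bibitem` elements and a `hidden="true"` attribute (real Metanorma XML is namespaced), and `hide_empty_references` is a hypothetical name. Only `bibitem` children are counted, so a supplied `title` does not stop the section from being hidden, which is the case the comment says the earlier code missed.

```python
import xml.etree.ElementTree as ET


def hide_empty_references(root):
    """Mark each <references> section hidden when all of its <bibitem>s are hidden."""
    for refs in root.iter("references"):
        # Count only bibitem children; a <title> child is ignored.
        bibitems = refs.findall("bibitem")
        if bibitems and all(b.get("hidden") == "true" for b in bibitems):
            refs.set("hidden", "true")
    return root
```

A section with a title and only hidden bibitems gets `hidden="true"`; a section with at least one visible bibitem is left alone.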

opoudjis commented 8 months ago

This task is done. I am going to create a new task for @alexeymorozov and for @Intelligent2013.

opoudjis commented 5 months ago

Will document attachments and sidestep the zipping work, since it may have been overtaken by capsium.