substance / dar

Reproducible Document Archive

DAR Manifest DTD feedback #12

Open Melissa37 opened 6 years ago

Melissa37 commented 6 years ago

From Laura Randall (NCBI/PMC):

Poking around the GitHub site, it looks like the Dar Article is a strict subset of JATS, which of course is always fine. I think the Dar Manifest has some of the same issues (well, they’re only issues if this is intended to be part of the “JATS family”) as the MECA DTDs, in that they’re not following the guidelines we set out in the JATS Compatibility Meta Model (https://groups.niso.org/apps/org/workgroup/jats-sc/download.php/16764/JATS-Compatibility-Model-v0-7.pdf).

A few things jump out:

- @name is defined by JATS on : Dar defines it differently (on various elements, meaning different things).
- @version is defined by JATS on : I don’t know if the Dar Runtime comes as an XML format, but it’s using version differently (this is where I’m especially out of my depth).
- @type seems to mean a lot of different things within the Manifest model, depending on which element it’s tied to. This isn’t ideal, even if you’re not trying to keep it within the JATS family. For example, on it’s describing the document type, but on it’s carrying the MIME type in some of the examples on GitHub (https://github.com/substance/dar/blob/master/examples/classic-manuscript/manifest.xml).
- It doesn’t look like the @id on is actually an XML attribute of type ID, because you have an example of one starting with a number (https://github.com/substance/dar): <asset id="234o23489237498234798" mime-type="image/png" name="Picture 1"...

While not strictly an error, it’s not ideal. Generally when people see an @id, they try to use it as an XML:ID and if it’s not defined that way, there are problems (the NLM DTDs had this as a lingering issue in all versions up to 3.0).
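To make the @id point concrete, here is a minimal sketch of a manifest whose DTD declares the asset's @id with type ID. The internal subset is hypothetical (it is not the published DarManifest DTD), and the `img_` prefix is only one assumed way to turn the value into a legal XML name; any leading letter or underscore would do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dar [
  <!ELEMENT dar (assets)>
  <!ELEMENT assets (asset*)>
  <!ELEMENT asset EMPTY>
  <!-- Declaring @id with type ID makes a validating parser enforce the XML Name
       production (no leading digit) and uniqueness across the whole document. -->
  <!ATTLIST asset
    id        ID    #REQUIRED
    mime-type CDATA #IMPLIED
    name      CDATA #IMPLIED
    path      CDATA #REQUIRED>
]>
<dar>
  <assets>
    <!-- id="234o23489237498234798" would fail validation here because it starts with a digit;
         the hypothetical img_ prefix makes it a valid ID value. -->
    <asset id="img_234o23489237498234798" mime-type="image/png" name="Picture 1" path="234o23489237498234798.png"/>
  </assets>
</dar>
```

The file name in @path can keep the bare number, since @path is plain CDATA and is not subject to the Name rules.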

michael commented 6 years ago

Thank you for sharing this. We will consider the suggestions. We didn't originally intend the manifest to conform to JATS, but there's no strong reason why it couldn't.

gertvv commented 5 years ago

Are there any plans to address this feedback? In particular the "type" attribute is important. It would also be useful to be able to specify a MIME-type for documents as well as assets (perhaps content-type as attribute name for both assets and documents?).

michael commented 5 years ago

We are open to that. Could you turn this example into how you'd prefer it to be tagged?

<!DOCTYPE manifest PUBLIC "DarManifest 0.1.0" "http://darformat.org/DarManifest-0.1.0.dtd">
<dar>
  <documents>
    <document id="manuscript" name="Reproducible Document Stack" type="article" path="manuscript.xml" />
    <document id="sheet" name="Sheet 1" type="sheet" path="sheet1.csv" />
  </documents>
  <assets>
    <asset id="234o23489237498234798" mime-type="image/png" name="Picture 1" path="234o23489237498234798.png"/>
  </assets>
</dar>

gertvv commented 5 years ago

I think something along these lines would be an improvement; I also tried to address some of the feedback in #9:

<!DOCTYPE manifest PUBLIC "DarManifest 0.1.0" "http://darformat.org/DarManifest-0.1.0.dtd">
<dar>
  <documents>
    <document xml:id="manuscript" type="article" media-type="text/xml" path="manuscript.xml" name="Reproducible Document Stack" />
    <document xml:id="sheet" type="sheet" media-type="text/csv" path="sheet1.csv" name="Sheet 1" />
  </documents>
  <assets>
    <asset xml:id="img_234o23489237498234798" type="image" media-type="image/png" name="Picture 1" path="234o23489237498234798.png"/>
  </assets>
</dar>

See also https://www.w3.org/TR/xml-id/ - the XSD/DTD would need updating to define these attributes as type "xs:ID"/"ID".
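On the XSD side, a minimal sketch could import the W3C schema for the `xml:` namespace, which already declares xml:id as xs:ID, and reference it from the manifest elements. This schema is hypothetical, not a published DAR schema; it assumes no target namespace and the attribute set from the example above (type, media-type, path, name), with the attribute-group and type names invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The W3C xml.xsd declares xml:lang, xml:space, xml:base and xml:id (as xs:ID). -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!-- Hypothetical attribute set shared by <document> and <asset>. -->
  <xs:attributeGroup name="entryAttributes">
    <xs:attribute ref="xml:id" use="required"/>
    <xs:attribute name="type" type="xs:NMTOKEN" use="required"/>
    <xs:attribute name="media-type" type="xs:string" use="required"/>
    <xs:attribute name="path" type="xs:anyURI" use="required"/>
    <xs:attribute name="name" type="xs:string"/>
  </xs:attributeGroup>

  <!-- Empty element carrying only the shared attributes. -->
  <xs:complexType name="entry">
    <xs:attributeGroup ref="entryAttributes"/>
  </xs:complexType>

  <xs:element name="dar">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="documents">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="document" type="entry" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="assets" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="asset" type="entry" minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because xml:id values are xs:ID, the schema validator rejects duplicates and values with a leading digit, which is why the asset id needs a prefix such as `img_`.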

It appears JATS splits this into two attributes, @mimetype and @mime-subtype, instead of a single @media-type - see https://jats.nlm.nih.gov/archiving/tag-library/1.1/attribute/mimetype.html - I personally see little value in that, or in conforming the manifest to JATS, but would be fine with that as well.
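For comparison, JATS carries the top-level media type and the subtype in separate @mimetype and @mime-subtype attributes. The asset from the proposal could be tagged either way; the second form below is a hypothetical JATS-style variant, not something either the DAR DTD or the proposal currently defines. The two lines are alternatives, not one document (they share an xml:id).

```xml
<!-- Proposed above: one attribute carrying the full media type -->
<asset xml:id="img_234o23489237498234798" type="image" media-type="image/png" name="Picture 1" path="234o23489237498234798.png"/>

<!-- Hypothetical JATS-style alternative: type and subtype split into two attributes -->
<asset xml:id="img_234o23489237498234798" type="image" mimetype="image" mime-subtype="png" name="Picture 1" path="234o23489237498234798.png"/>
```

In the split form @type and @mimetype end up repeating the same value, which is one practical argument for the single @media-type.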