libero / publisher

The starting point for raising issues for Libero Publisher
MIT License

Investigate file extension workflow requirements #202

Status: Open. Melissa37 opened this issue 5 years ago.

Melissa37 commented 5 years ago

Problem / Motivation

Libero Publisher requires the links to figure files (xlink:href) in the XML coming from a publisher to include a .jpeg file extension before it can load the content onto the site.
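A minimal sketch of how to find the figure references that would fail this requirement. It assumes JATS-style `<graphic>` elements carrying an `xlink:href` attribute, and that only a literal `.jpeg` suffix is accepted; `hrefs_missing_jpeg` is a hypothetical helper, not part of Libero.

```python
import xml.etree.ElementTree as ET

# xlink:href in the {namespace}local form that ElementTree uses for attributes
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Assumption: only ".jpeg" is accepted, as described in this issue
ACCEPTED_SUFFIXES = (".jpeg",)


def hrefs_missing_jpeg(xml_text: str) -> list:
    """Return the xlink:href values of <graphic> elements lacking an accepted suffix."""
    root = ET.fromstring(xml_text)
    missing = []
    for graphic in root.iter("graphic"):
        href = graphic.get(XLINK_HREF, "")
        if not href.lower().endswith(ACCEPTED_SUFFIXES):
            missing.append(href)
    return missing


if __name__ == "__main__":
    sample = (
        '<article xmlns:xlink="http://www.w3.org/1999/xlink">'
        '<graphic id="gra1" xlink:href="fig1.jpeg"/>'
        '<graphic id="gra2" xlink:href="pnas.1207965110fig01"/>'
        "</article>"
    )
    print(hrefs_missing_jpeg(sample))
```

Run against the PNAS-style reference quoted later in this thread, `pnas.1207965110fig01` would be flagged, because its only "extension" is the `.1207965110fig01` part of the name.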

Proposed solution

Cannot be decided until more investigation has been done.

Tasks

Production to investigate

Clarification needed and assumptions

Technical notes

User interface / Wireframes

@BlueReZZ @thewilkybarkid @GiancarloFusiello @FAtherden-eLife

Melissa37 commented 5 years ago

@Maelplaine I cannot assign this ticket to anyone; it should be assigned to Fred and me.

Linked to #200

Thanks!

Melissa37 commented 5 years ago

PMC requires TIFF (or EPS) files: "Uncompressed high-resolution TIFF or EPS files are required for all images."

Examples of crosslink tagging in the system do not include a file type suffix:

e.g. <graphic id="gra2" xlink:href="pnas.1207965110fig01"/>

HighWire (HWX Specification, TIFF format): color images 300 dpi; grayscale images 600 dpi; line art 1200 dpi.
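To make these resolution figures concrete: the pixel dimensions an image needs are its intended print size in inches multiplied by the required dpi. A minimal sketch; the print dimensions are assumed inputs (the specification quoted above gives only the dpi), and `min_pixels` and `meets_spec` are hypothetical names.

```python
import math

# Minimum resolutions from the HighWire HWX specification (TIFF format)
MIN_DPI = {"color": 300, "grayscale": 600, "line_art": 1200}


def min_pixels(image_type: str, width_in: float, height_in: float) -> tuple:
    """Smallest (width, height) in pixels for an image printed at the given size in inches."""
    dpi = MIN_DPI[image_type]
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)


def meets_spec(image_type: str, width_px: int, height_px: int,
               width_in: float, height_in: float) -> bool:
    """True if the image has enough pixels for its intended print size."""
    min_w, min_h = min_pixels(image_type, width_in, height_in)
    return width_px >= min_w and height_px >= min_h


if __name__ == "__main__":
    # Hypothetical figure printed 3.5 in wide and 2 in tall
    for kind in MIN_DPI:
        print(kind, min_pixels(kind, 3.5, 2.0))
```

For example, a color figure printed 3.5 inches wide needs at least 3.5 × 300 = 1050 pixels across, while line art at the same width needs 4200.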

thewilkybarkid commented 5 years ago

Examples of crosslink tagging in the system do not include a file type suffix:

e.g. <graphic id="gra2" xlink:href="pnas.1207965110fig01"/>

Could it just be that the filename doesn't have an extension? The PMC tagging guidelines for <graphic> (https://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/tags.html#el-graphic) only say "The name of the file", whereas for other elements they say "Include the full filename, including file extension, in the @xlink:href value".

Except:

<related-article> [...] When specifying a DOI, tag the DOI value in @xlink:href and specify @ext-link-type="doi".

So that's where that came from. Which seems crazy: isn't it violating the XLink spec?

Melissa37 commented 5 years ago

So that's where that came from. Which seems crazy: isn't it violating the XLink spec?

I wonder whether it's so that a publisher can keep a single XML source of truth: if figure references carry no file extension, the same XML works whichever image format is delivered. We have to send TIFFs (or EPS files) to everyone, but internal systems then convert them to JPEGs for display on the web.

I wonder why we cannot allow figure file references without extensions in Libero, when no other system we've worked with has had a problem with them. Also, publishers' production workflows will probably output TIFFs, so the XML Libero receives won't contain .jpeg extensions.

You cannot download XML directly from the PMC site, but if someone has access to the corpus via the API, it would be interesting to see whether PMC adds file extensions to the XML it "processes".

If you do add .jpeg to the XML, I would suggest that this be throwaway XML used only for the site, never delivered anywhere else or used as the archive version: the "source of truth" needs to be exactly what the publisher sent. This is because Libero's display and internal needs do not match all the requirements of a full workflow.
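One way to keep that separation, sketched under assumptions: the publisher's XML is left untouched and a separate, web-only copy is generated with `.jpeg` appended to each `<graphic>` reference that lacks it. This assumes the converted web image for a figure is named after the original reference plus `.jpeg`; `make_display_xml` is a hypothetical helper, not an existing Libero function.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
# Keep the conventional "xlink" prefix when the copy is serialised
ET.register_namespace("xlink", XLINK_NS)
HREF = f"{{{XLINK_NS}}}href"


def make_display_xml(source_xml: str, ext: str = ".jpeg") -> str:
    """Build a throwaway, web-only copy of the article XML.

    source_xml (the publisher's "source of truth") is never modified.
    Assumption: the web image for a figure is named <original href> + ext.
    """
    display_root = ET.fromstring(source_xml)  # a fresh tree, independent of the source
    for graphic in display_root.iter("graphic"):
        href = graphic.get(HREF)
        if href and not href.lower().endswith(ext):
            graphic.set(HREF, href + ext)
    return ET.tostring(display_root, encoding="unicode")


if __name__ == "__main__":
    source = (
        '<article xmlns:xlink="http://www.w3.org/1999/xlink">'
        '<graphic id="gra2" xlink:href="pnas.1207965110fig01"/>'
        "</article>"
    )
    print(make_display_xml(source))
    print(source)  # the archive version is unchanged
```

ElementTree re-serialises the whole document (dropping any XML declaration and DOCTYPE), which is tolerable here only because the output is disposable and never replaces what the publisher sent.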

fred-atherden commented 5 years ago

Hindawi XML might clarify this a little: JCNC_6826984_Final_1

It looks like they include several formats of the same image (eps, jpg, svg) in the folder and refer to it without a filename extension (e.g. xlink:href="6826984.fig.002"), so the actual file being used is deliberately ambiguous, and the extension is probably chosen depending on how the XML is transformed. Presumably, in the Hindawi case, the eps is used for the PDF, the jpg for online display, and the svg is included for archival purposes.
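The Hindawi approach can be sketched as a lookup that turns the extensionless reference into a concrete file depending on the output being produced. The preference lists below simply encode the guess above (eps for PDF, jpg for web, svg for archive), with a TIFF fallback for print taken from the PMC/HighWire requirements mentioned earlier; they are assumptions, not a documented Hindawi rule, and `resolve_asset` is a hypothetical name.

```python
from typing import Optional

# Hypothetical per-output preference order, encoding the guess above:
# EPS (or TIFF) for the PDF, JPEG for online display, SVG for archiving.
PREFERENCES = {
    "pdf": [".eps", ".tif"],
    "web": [".jpg", ".jpeg"],
    "archive": [".svg"],
}


def resolve_asset(href: str, available_files: set, output: str) -> Optional[str]:
    """Return the first file named href + a preferred extension, or None if none exists."""
    for ext in PREFERENCES[output]:
        candidate = href + ext
        if candidate in available_files:
            return candidate
    return None


if __name__ == "__main__":
    files = {"6826984.fig.002.eps", "6826984.fig.002.jpg", "6826984.fig.002.svg"}
    for output in PREFERENCES:
        print(output, resolve_asset("6826984.fig.002", files, output))
```

With the three Hindawi files present, the "web" output resolves to the .jpg and the "pdf" output to the .eps, while the XML itself stays extension-free.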