plazi / arcadia-project

Check validity of deposition files #72

Open tcatapano opened 5 years ago

tcatapano commented 5 years ago

Whatever standard published schema is used for deposition upload xml, it should be:

gsautter commented 5 years ago

You mean the XML "alternative format", right? The primary file is an XHTML. Anyway ...

Pre-upload validation is not a problem, but we need to define the schema first. GG XML intentionally has no schema, so that we can easily add new features or mark up new details. Keeping it valid against a closed-world schema would require filtering against a positive list of elements, and would thus withhold from clients details that we have actually annotated. I'd like to propose that we have one stripped-down, schema-bound XML and one that has all the details without having to await schema evolution. The former could just as well be TaxPub, once we have that finished. Just a thought ... what do you think?
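A minimal sketch of what "filtering against a positive list of elements" would withhold, assuming Python and its standard-library XML parser. The element names and the `ALLOWED` set are hypothetical placeholders, not an agreed schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical positive list; the real one would come from the agreed schema.
ALLOWED = {"document", "treatment", "paragraph", "taxonomicName"}


def elements_outside_allowlist(xml_text, allowed=ALLOWED):
    """Return the sorted names of elements a closed-world filter would drop."""
    root = ET.fromstring(xml_text)
    return sorted({el.tag for el in root.iter() if el.tag not in allowed})


# Made-up sample: one annotated detail is not on the positive list.
sample = (
    "<document><treatment><paragraph>"
    "<taxonomicName>Aus bus</taxonomicName> "
    "<collectingCountry>Peru</collectingCountry>"
    "</paragraph></treatment></document>"
)
print(elements_outside_allowlist(sample))
```

Running such a report over real GG XML output would show how much detail the schema-bound variant loses compared to the open one.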

tcatapano commented 5 years ago

@gsautter: I've validated the HTML samples provided at https://github.com/plazi/stable-treatment-html against the XHTML 1.0 (Strict) DTD, https://www.w3.org/TR/xhtml1/ . It's a small sample, but it does suggest that the current transformation will output compliant XHTML. The only thing that needs to be done is to add the DOCTYPE declaration at the top of the file. Use:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

I still aim to spec the JATS/TaxPub to be used as the deposition file format, but adding the DOCTYPE allows us to minimally meet the requirement of depositing a standards-compliant file.
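A minimal sketch of adding the declaration as a post-processing step, assuming the transformation output is available as a Python string. The function name `ensure_doctype` is hypothetical, and the actual fix may be better placed in the transformation itself:

```python
XHTML_STRICT_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def ensure_doctype(xhtml_text, doctype=XHTML_STRICT_DOCTYPE):
    """Insert the DOCTYPE before the root element unless one is already present."""
    if "<!DOCTYPE" in xhtml_text:
        return xhtml_text
    # Keep an XML declaration, if any, as the very first thing in the file.
    if xhtml_text.startswith("<?xml"):
        end = xhtml_text.index("?>") + 2
        return xhtml_text[:end] + "\n" + doctype + xhtml_text[end:]
    return doctype + "\n" + xhtml_text
```

Calling it twice on the same text leaves the text unchanged after the first call, so it is safe to run on files that already carry a DOCTYPE.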

gsautter commented 5 years ago

Started adding <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> now ... do we require strict?

tcatapano commented 5 years ago

No, we don't require Strict, but since the samples validated against it, I figured we'd go with it, as it's slightly preferable to Transitional. I'd like to see how a larger sample does against Strict, however.

gsautter commented 5 years ago

Here's the last example, with DOCTYPE declaration and the reference across the top: https://sandbox.zenodo.org/record/361030

tcatapano commented 5 years ago

Using the online validator at https://validator.w3.org/ reveals one minor and easily fixed error:

Missing xmlns attribute for element html. The value should be: http://www.w3.org/1999/xhtml

gsautter commented 5 years ago

Thanks, check out https://sandbox.zenodo.org/record/362646

tcatapano commented 5 years ago

That gives an error because a namespace prefix has been declared but not used. Just need to either remove the prefix in the namespace declaration or use the prefix for all the HTML elements in the file. I vote for just removing the prefix from the declaration.
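A minimal local check for both namespace problems, assuming Python's standard-library parser. It is a sketch, not a replacement for the W3C validator: it only tests whether every element ends up in the XHTML namespace, which fails both when the declaration is missing and when a prefix is declared but the elements are left unprefixed. The function name and sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def all_elements_in_xhtml_namespace(xhtml_text):
    """True if the root is html and every element is in the XHTML namespace."""
    root = ET.fromstring(xhtml_text)
    if root.tag != "{%s}html" % XHTML_NS:
        return False
    prefix = "{%s}" % XHTML_NS
    return all(el.tag.startswith(prefix) for el in root.iter())


body = "<head><title>t</title></head><body><p>x</p></body>"
# Default namespace declared: all unprefixed elements inherit it.
good = '<html xmlns="%s">%s</html>' % (XHTML_NS, body)
# Prefix declared but never used: elements stay in no namespace.
unused_prefix = '<html xmlns:h="%s">%s</html>' % (XHTML_NS, body)
# No namespace declaration at all.
missing = "<html>%s</html>" % body

for name, text in [("good", good), ("unused_prefix", unused_prefix), ("missing", missing)]:
    print(name, all_elements_in_xhtml_namespace(text))
```

Removing the prefix from the declaration corresponds to the `good` case; the alternative of prefixing every element would also pass this check.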

gsautter commented 5 years ago

OK, got it ... this one validates now: https://sandbox.zenodo.org/record/362648