ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogenous data
https://docs.ropensci.org/EML

additional namespace(s) #289

Open atn38 opened 4 years ago

atn38 commented 4 years ago

Hello EML team,

I'm trying to declare an additional namespace from "http://ns.dataone.org/service/types/v1", to ultimately have this snippet inserted into additionalMetadata:

<additionalMetadata>
  <metadata>
    <d1v1:replicationPolicy xmlns:d1v1="http://ns.dataone.org/service/types/v1" numberReplicas="1"
      replicationAllowed="true">
      <preferredMemberNode>urn:node:ADC</preferredMemberNode>
    </d1v1:replicationPolicy>
  </metadata>
</additionalMetadata>

These are the approaches I could think of:

I've tried combinations of the above; all fail with the error message ns_lookup(parent$doc, parent$node, parts[[1]]) : No namespace with prefix `d1v1` found. The error happens when write_eml or eml_validate is called; I can construct an emld list object with no issues. Insight would be appreciated! I can reproduce and attach the emld object if needed.

cboettig commented 4 years ago

Apologies for the slow reply and thanks for the bug report. I'm still on the road so haven't had a chance to go spelunking into this, but in general here's the debug strategy I recommend:

Obviously things might fail at any of the above steps, but knowing where it fails and where it passes will help us debug. Again, apologies that I can't be of more help just yet!

atn38 commented 4 years ago

@cboettig sorry for the late reply. Below is a minimal EML example with the additionalMetadata snippet inserted manually in a text editor. This is valid according to the online EML validator. EML::read_eml reads this in just fine, but EML::eml_validate fails on both the file and the parsed list structure from read_eml, returning this error:

Error in eml_locate_schema(doc) : 
  No schema found for namespace:  http://ns.dataone.org/service/types/v1
Error in UseMethod("read_xml") : 
  no applicable method for 'read_xml' applied to an object of class "NULL"

Which I think is essentially the same problem as before (EML not recognizing the extra namespace).
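
One way to narrow this down is to check the declaration at the XML level, independent of EML. The sketch below is only an illustration, assuming the xml2 package is installed: it parses a standalone copy of the additionalMetadata snippet from this thread and confirms that the d1v1 prefix resolves. The variable names are made up for the example.

```r
library(xml2)

# Standalone copy of the snippet from this thread (not a full EML document).
snippet <- '<additionalMetadata>
  <metadata>
    <d1v1:replicationPolicy xmlns:d1v1="http://ns.dataone.org/service/types/v1"
        numberReplicas="1" replicationAllowed="true">
      <preferredMemberNode>urn:node:ADC</preferredMemberNode>
    </d1v1:replicationPolicy>
  </metadata>
</additionalMetadata>'

doc <- read_xml(snippet)

# xml_ns() gathers the namespaces declared anywhere in the document
# as a named character vector (prefix -> URI).
ns <- xml_ns(doc)

# Look the element up through its prefix. If this lookup failed, the
# declaration itself would be the problem rather than EML's handling of it.
policy <- xml_find_first(doc, "//d1v1:replicationPolicy", ns)
replicas <- xml_attr(policy, "numberReplicas")
```

If the lookup succeeds here, the prefix is declared correctly in the XML, which would point at how the namespace is carried through the emld list or the validator rather than at the snippet.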

Minimal EML (idk how to attach an XML file in a GitHub comment)

<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.2" packageId="id" system="system" xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0/ eml.xsd">
  <dataset>
    <title>A Minimal Valid EML Dataset</title>
    <creator>
      <individualName>
        <givenName>Blizzard</givenName>
        <surName>Frosty</surName>
      </individualName>
    </creator>
    <contact>
      <individualName>
        <givenName>Blizzard</givenName>
        <surName>Frosty</surName>
      </individualName>
    </contact>
  </dataset>
  <additionalMetadata>
    <metadata>
      <d1v1:replicationPolicy xmlns:d1v1="http://ns.dataone.org/service/types/v1" numberReplicas="1"
        replicationAllowed="true">
        <preferredMemberNode>urn:node:ADC</preferredMemberNode>
      </d1v1:replicationPolicy>
    </metadata>
  </additionalMetadata>
</eml:eml>

cboettig commented 4 years ago

@atn38 Thanks for the follow-up! Great, yes, I suspect this problem is due to the schemaLocation, as we've been discussing over in #292. I need to double-check, but I believe the validator in the R package is assuming that schemaLocation points to its local copy, whereas obviously it ought to be able to use a URL as well. We should have a fix for this soon!

atn38 commented 4 years ago

Thanks for the pointer @cboettig, good to understand how EML works under the hood more. Great timing on these issues coming up together!

mbjones commented 4 years ago

@cboettig I think the validator should completely ignore schemaLocation and use only a cached copy of the officially released schema. Otherwise, errors that get reported can be due to improperly maintained or edited schema files rather than document errors, and this leads to confusion for end users. On the Java side, we do this with an XML catalog entry, which is supported by pretty much all mature XML parsers.
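
As a rough illustration of that idea in R (a sketch only, not how eml_validate is implemented): xml2::xml_validate takes the schema as an explicit argument, so the caller decides which schema is used. The toy schema and the urn:example:demo namespace below are invented stand-ins for a cached copy of the released eml.xsd, and the schemaLocation URL is deliberately unreachable.

```r
library(xml2)

# Toy stand-in for a cached, officially released schema (hypothetical namespace).
schema <- read_xml('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example:demo" xmlns="urn:example:demo"
    elementFormDefault="qualified">
  <xs:element name="note" type="xs:string"/>
</xs:schema>')

# The schemaLocation hint points at a location that cannot be fetched.
good <- read_xml('<note xmlns="urn:example:demo"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:example:demo http://example.invalid/missing.xsd">hi</note>')

# Same unusable hint, but the root element is not declared in the schema.
bad <- read_xml('<memo xmlns="urn:example:demo"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:example:demo http://example.invalid/missing.xsd">hi</memo>')

# Validation runs against the supplied schema object only.
good_ok <- xml_validate(good, schema)
bad_ok <- xml_validate(bad, schema)

# Any validation messages are attached as an attribute of the result.
bad_errors <- attr(bad_ok, "errors")
```

Because the schema is passed in explicitly, a document with a stale or edited schemaLocation is still checked against the same released schema, so any reported errors reflect the document rather than whatever the hint points to.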

atn38 commented 1 year ago

hey EML team, this issue has come up for me again. Any updates?

jeanetteclark commented 1 year ago

Hi An,

I'm not actually able to replicate this with your example - could you send your session info along?

Thanks!