usgin / modelmanager

USGIN Content Model Management App
BSD 3-Clause "New" or "Revised" License

URIs on XSDs don't match contentmodels.json #6

rclark closed this issue 10 years ago

rclark commented 10 years ago

@smrAzGS @ccaudill @jalisdairi @asonnenschein

We have content model URI discrepancies that we need to resolve. Here's the story:

We started out by defining what URIs should look like. Importantly, we thought that URIs should be "host-agnostic", meaning that the important part of the identifier was what comes after the http://my-server-name.something/. This would allow more than one server to exist on the internet that could be capable of "resolving" URIs.
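To make the "host-agnostic" idea concrete, here is a minimal sketch that compares two URIs by everything after the host. The function name `host_agnostic_id` and the `resources.usgin.org` example URI are assumptions made for illustration, not something either site implements.

```python
from urllib.parse import urlsplit


def host_agnostic_id(uri: str) -> str:
    """Return the part of a URI that follows the scheme and host."""
    return urlsplit(uri).path.rstrip("/")


# Hypothetical pair: the same identifier path served from two different hosts.
a = "http://resources.usgin.org/uri-gin/ngds/dataschema/activefault/1.1"
b = "http://schemas.usgin.org/uri-gin/ngds/dataschema/activefault/1.1"

# Under the host-agnostic reading, these two would identify the same thing.
same_model = host_agnostic_id(a) == host_agnostic_id(b)
```

Under this reading, only the path would carry identity, and any server that knows the path could answer for it.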

We set up http://resources.usgin.org to resolve the URIs that we were making up. We were pretty happy. We made a bunch of content models, made XSDs for them, and gave them URIs like this one:

http://stategeothermaldata.org/uri-gin/aasg/xmlschema/activefault/1.1

(wait a minute... that won't even resolve anywhere!)

Then we realized that we needed a dedicated system for managing our content models as we spun out new versions. We also wanted a place that people could come to and understand what models we had to offer, so they could find what's appropriate for their data. So, http://schemas.usgin.org/ happened.

One of the things that site intended to do was automatically generate and maintain redirection rules for any schemas that you set up in the system there. So that site includes its own URI redirection engine. When you create a new model, it makes the URI redirection rules for you.

Now remember that host-agnostic part of things? Well... practically speaking that's a bust. schemas.usgin.org can only resolve URIs that start with schemas.usgin.org, and resources.usgin.org can only resolve URIs that start with resources.usgin.org. That's because if you request a URI with another host name, your request never even gets to the server. That's how the internet works.
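A rough sketch of why this breaks down: the host name in the URI decides which server receives the request. The helper names `receiving_host` and `can_resolve` are hypothetical, and the sketch assumes each host name's DNS points only at its own server.

```python
from urllib.parse import urlsplit


def receiving_host(uri: str) -> str:
    """Return the host name a client would contact for this URI."""
    return urlsplit(uri).hostname or ""


def can_resolve(server_host: str, uri: str) -> bool:
    """A server only ever sees requests addressed to its own host name."""
    return receiving_host(uri) == server_host.lower()


uri = "http://schemas.usgin.org/uri-gin/ngds/dataschema/activefault/1.1"
seen_by_schemas = can_resolve("schemas.usgin.org", uri)
seen_by_resources = can_resolve("resources.usgin.org", uri)
```

Here `seen_by_resources` is false even though resources.usgin.org might know the path, because the request is never sent to it.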

So, schemas.usgin.org has to make URIs that start with schemas.usgin.org. Also, in conversations with Steve, we determined that

So, http://schemas.usgin.org makes URIs like this:

http://schemas.usgin.org/uri-gin/ngds/dataschema/activefault/1.1

The problem is now apparent if you look at the JSON objects that http://schemas.usgin.org spits out. They list the schemas.usgin.org URI as the URI for the model, while the XSDs all say something else.

What this means is that a system that reads the XSD will think one URI is correct, and a system that reads the JSON will think another is correct. Currently, ckanext-ngds operates against the JSON object, and so it creates metadata records for content-model-aware file uploads that reference schemas.usgin.org.
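The mismatch could be checked mechanically. The sketch below assumes the XSD carries its identifier in `targetNamespace` and that the JSON object exposes a `uri` field; both the field name and the inline sample documents are assumptions for illustration, not the actual contentmodels.json layout.

```python
import json
import xml.etree.ElementTree as ET

# Sample documents built from the two URIs quoted in this thread.
XSD_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://stategeothermaldata.org/uri-gin/aasg/xmlschema/activefault/1.1"/>
"""
JSON_TEXT = '{"uri": "http://schemas.usgin.org/uri-gin/ngds/dataschema/activefault/1.1"}'


def xsd_target_namespace(xsd_text: str) -> str:
    """Read the targetNamespace attribute from the schema's root element."""
    return ET.fromstring(xsd_text).get("targetNamespace", "")


def uris_agree(xsd_text: str, model_json: str) -> bool:
    """True when the XSD and the JSON object name the same URI."""
    return xsd_target_namespace(xsd_text) == json.loads(model_json)["uri"]


consistent = uris_agree(XSD_TEXT, JSON_TEXT)
```

With the two sample documents, `consistent` is false, which is the situation described above: an XSD reader and a JSON reader disagree about the model's URI.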

I think that the root of the problem is kind of philosophical: There must be one machine-actionable, canonical representation of each content model. Any other representations of the model must be derivatives of that canonical model. We should make sure that we're clear what the real representation is.

Also, we just need to decide: should schemas.usgin.org manage URIs for us, or should we ditch that and force ourselves to manage them at resources.usgin.org?

ccaudill commented 10 years ago

So I refer to these just as "namespaces", but I suppose that they are URIs for the schemas. We have no choice but to keep those that are currently indicated for the 30+ schemas, as none of the 400+ services would validate if those were changed now.

I agree that the namespace and prefix that we were using is too specific to a segment of the project's data. It would be nice if schemas.usgin.org could manage the namespace/prefix creation for us, but that would really mean that all the namespaces in the current services should be changed; hence versioned and redeployed, so that's not practical.

rclark commented 10 years ago

I know what you mean, but having multiple identifiers is nothing if not super confusing.

From that perspective you could say we have one URI (generated by schemas.usgin.org) and one Namespace (just something we put in the XSD). That's valid, but it's confusing.

ccaudill commented 10 years ago

Oh, and have them different? I understand, that would work. It'd be real confusing.

rclark commented 10 years ago

That's what's happening right now.

smrgeoinfo commented 10 years ago

We probably need to discuss f2f, but I think of it this way. The Namespace is an abstract concept for the names defined in the content model. I think the Namespace could be viewed as a representation of the content model, and thus have the same URI. The XML schema implements the content model. Elements in the XML schema are scoped to a namespace. The namespace URI in an XML instance can be considered to identify the content model. The xsi:schemaLocation gives a URL that locates an XML schema that implements the content model using that namespace.
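As an illustration of that split, the sketch below pulls the namespace URI (the identifier) and the matching xsi:schemaLocation URL (the locator) out of an XML instance. The element name `ActiveFault`, the `aasg` prefix, and the example.org schema URL are made up for the example.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document: namespace URI identifies, schemaLocation locates.
INSTANCE = """<aasg:ActiveFault
    xmlns:aasg="http://stategeothermaldata.org/uri-gin/aasg/xmlschema/activefault/1.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://stategeothermaldata.org/uri-gin/aasg/xmlschema/activefault/1.1 http://example.org/schemas/activefault-1.1.xsd"/>
"""


def namespace_and_location(xml_text: str):
    """Return (namespace URI of the root element, schema URL paired with it)."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    # xsi:schemaLocation is a whitespace-separated list of namespace/URL pairs.
    parts = root.get(f"{{{XSI}}}schemaLocation", "").split()
    locations = dict(zip(parts[0::2], parts[1::2]))
    return namespace, locations.get(namespace)


namespace, location = namespace_and_location(INSTANCE)
```

The two values are allowed to live on different hosts: the first never has to resolve, while the second has to be fetchable.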

steve

rclark commented 10 years ago

After a conversation with @smrAzGS, I think we decided that when new versions of the content models are created, we should build XSD docs that use the schemas.usgin.org URIs as the namespace URI.

Still up in the air whether or not we should provide redirection rules that would "resolve" the existing stategeothermaldata.org namespace URIs.
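If we did go that way, the rules could be little more than a lookup table. This is only a sketch: the names `LEGACY_TO_CURRENT` and `redirect_target` are hypothetical, pairing these two activefault URIs as the same model is an assumption, and it would only work if requests for stategeothermaldata.org were actually routed to our server.

```python
from typing import Optional

LEGACY_HOST = "stategeothermaldata.org"

# Assumed pairing of a legacy namespace path with its schemas.usgin.org URI.
LEGACY_TO_CURRENT = {
    "/uri-gin/aasg/xmlschema/activefault/1.1":
        "http://schemas.usgin.org/uri-gin/ngds/dataschema/activefault/1.1",
}


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URI to redirect to, or None when no rule applies."""
    if host.lower() != LEGACY_HOST:
        return None
    return LEGACY_TO_CURRENT.get(path.rstrip("/"))


target = redirect_target(
    "stategeothermaldata.org", "/uri-gin/aasg/xmlschema/activefault/1.1"
)
```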