marklogic / entity-services

Data modeling and code scaffolding for data integration in MarkLogic
https://docs.marklogic.com/guide/entity-services
Apache License 2.0

Error generating TDE #292

Status: Open. matt-turner opened this issue 7 years ago.

matt-turner commented 7 years ago

Hi - I'm creating a demo using ES on standards documents with a schema called STS. I've got the description up and working and can generate envelope documents, but when I go to validate and install the TDE I get this error:

[javascript] TDE-INVALIDTEMPLATE: (err:FOER0000) tde.templateInsert( -- Invalid TDE template: XDMP-NOTSIMPLE: Node does not have simple content: fn:doc("/source/models/sts-templ.xml")/tde:template/tde:templates/tde:template[2]/tde:templates/tde:template/tde:rows/tde:row/tde:columns/tde:column[2]/tde:scalar-type

I'm attaching my model and the generated template (as a .zip)

There is one array in the definition, which defines the normative references attached to each standard.

The TDE seems to be stopping on line 132 where there is an empty scalar type. Not sure what should be there to accomplish the definition of the normative refs.

I created all this with EA4 (9.0-20170113) but just ran it all again with 9.0-20170418. I'm running Mac OS.

Thanks in advance,

Matt

sts-demo.zip

kcoleman-marklogic commented 7 years ago

Charles is out of town, so I'll take a stab. This is not a valid array item type definition.

"norm-refs": {
  "datatype": "array", 
  "items": {"std-ref": {"datatype": "string"}}
}

It should be like this:

"items": {"datatype": "string"}

If you want it to be a reference to some external entity type, then you could set the datatype value to "iri". If std-ref was an entity type defined within this model, then you could reference it like this:

"items": { "$ref": "#/definitions/std-ref" }

Basically, the value of "items" has to have one of the forms you see here: http://docs-ea.marklogic.com/guide/entity-services/models#id_13962.
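As an illustration of that rule, here is a minimal Python sketch that checks an `items` value against the shapes shown in this comment: a scalar `datatype` (including `iri`) or a `$ref`. The function `check_items` is hypothetical and not part of Entity Services; the linked documentation remains the authoritative list of valid forms.

```python
def check_items(items):
    """Return None if `items` looks like a valid item type, else a short error message.

    Assumed shapes, taken from the examples in this comment:
      {"datatype": "string"}               scalar item type
      {"datatype": "iri"}                  reference to an external entity type
      {"$ref": "#/definitions/std-ref"}    entity type defined in this model
    Any extra keys alongside "datatype" are ignored by this sketch.
    """
    if not isinstance(items, dict):
        return "items must be an object"
    has_datatype = "datatype" in items
    has_ref = "$ref" in items
    if has_datatype and has_ref:
        return "items cannot have both 'datatype' and '$ref'"
    if not (has_datatype or has_ref):
        return "items needs a 'datatype' or a '$ref' key, found: " + ", ".join(sorted(items))
    return None


# The original definition nests a property name inside "items", so it is rejected.
original = {"std-ref": {"datatype": "string"}}
print(check_items(original))
# → items needs a 'datatype' or a '$ref' key, found: std-ref
print(check_items({"datatype": "string"}))             # → None
print(check_items({"$ref": "#/definitions/std-ref"}))  # → None
```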

I am curious:

Did you not also run into problems with dashes in your property names? With my build, which admittedly is from 4/15, tde:validate rejected all the prop names containing dashes. I had to change them all to underscores to get at your scalar type problem.

jmakeig commented 7 years ago

Did you not also run into problems with dashes in your property names?

Yes, he did. Where are the rules for column names documented? Entity Services should generate templates that aren’t invalid by default.

matt-turner commented 7 years ago

Oops - I forgot to mention that part as well. Yes I had to change all the names to conform to SQL column heading standards.

Not sure the best way to resolve that ... but that is also an issue.

I don't think I understand the proposed solution, so let me just describe what I'm trying to do:

Each standard has a set of references called normative references.

I want to create, in the envelope document, the following pattern:

<norm-ref>
  <std-ref>XXX</std-ref>
  <std-ref>YYY</std-ref>
</norm-ref>

I also want these to define a separate 'table' in the SQL view.

That's why I tried to use the array construct, but obviously my version isn't correct. With the current definition I was able to create envelope documents, but now that I look at them, the values aren't correctly in an array:

<Standard>
  <urn>iso:std:iso:1161:ed-5:v1:en</urn>
  <doc-number>1161</doc-number>
  <title>Series 1 freight containers — Corner and intermediate fittings — Specifications</title>
  <doc-type>IS</doc-type>
  <originator>ISO</originator>
  <secretariat>AFNOR</secretariat>
  <pub-date>2016-07-15</pub-date>
  <release-date>2016-07-15</release-date>
  <norm-refs datatype="array">ISO 1496-1</norm-refs>
  <norm-refs datatype="array">ISO 148-1</norm-refs>
</Standard>

So how should I properly use the array type?

Thanks,

Matt

kcoleman-marklogic commented 7 years ago

From what you've said, norm-refs represents another entity type. You should model it accordingly. Entity properties can only have 3 possible types: Scalar, array, and reference to another entity type.

kcoleman-marklogic commented 7 years ago

Where are the rules for column names documented? Entity Services should generate templates that aren’t invalid by default.

Since the ES spec doesn't mention this restriction, I was unaware of it until Matt reported his problem. Ergo, it's not documented anywhere, atm. :) However, rest assured that I am going to add it.

@bsrikan recently made me aware of a similar restriction on the model title, imposed by TDE.

I believe the TDE documentation covers this in that it states blah blah must be a valid SQL view name, but that's much too far a distance for a user to bridge. The ES documentation needs to cover it.

jmakeig commented 7 years ago

Entity properties can only have 3 possible types: Scalar, array, and reference to another entity type.

…today. We’re working on how to model “weak” entities, i.e. that aren’t free-standing, but represent some hierarchy under the main entity.

jmakeig commented 7 years ago

The ES documentation needs to cover it.

Shouldn’t we just transform the names in the TDE generation so that they’re valid, such as converting - to _?

kcoleman-marklogic commented 7 years ago

Recall that Charles is out of town. For all practical purposes, that makes this WAI, and it should be documented. Perhaps, an RFE for the next release.

matt-turner commented 7 years ago

So can you guys give me pointers on how to create the proper model? Maybe give me the right way to do this? Thanks, Matt

kcoleman-marklogic commented 7 years ago

I don't think you'll get exactly the structure you want using ES (unless, I suppose, you do something custom in your instance converter code), but a combination of arrays and perhaps a second entity type should get close.

Justin can probably speak more wisely to this than I can, but isn't starting with the entity structure and trying to force the model to conform a bit of the tail wagging the dog?

In any case, I don't at all mean to blow you off, but I'm under a serious deadline crunch right now, so I don't have time to devote to this. Esp. since my expertise is not much beyond your own (if that). Perhaps Justin has time to chime in with better suggestions.

bsrikan commented 7 years ago

Shouldn’t we just transform the names in the TDE generation so that they’re valid, such as converting - to _?

Rather than renaming during TDE generation, it would be a good idea to add this rule to model-validate() so folks don't hit this issue in later stages.
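A minimal sketch of such a check, assuming the safe form is a plain unquoted SQL identifier (a letter or underscore followed by letters, digits or underscores). As noted above, the TDE rule was not documented at the time, so this regex is an assumption rather than TDE's actual rule. `suggest_sql_name` applies the dash-to-underscore transform suggested earlier; both function names are hypothetical.

```python
import re

# Assumed rule: a conservative, unquoted SQL identifier.
SQL_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def suggest_sql_name(name):
    """Replace dashes (and any other unsafe character) with underscores."""
    fixed = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Prefix an underscore if the name would otherwise start with a digit.
    return "_" + fixed if fixed[:1].isdigit() else fixed


def check_names(names):
    """Map each unsafe name to a suggested replacement; safe names are omitted."""
    return {n: suggest_sql_name(n) for n in names if not SQL_NAME.match(n)}


print(check_names(["urn", "doc-number", "pub-date", "norm-refs", "title"]))
# → {'doc-number': 'doc_number', 'pub-date': 'pub_date', 'norm-refs': 'norm_refs'}
```

One reason to report rather than silently rename: two distinct properties such as `norm-refs` and `norm_refs` would both map to `norm_refs`, which a validation step can flag but a generator would have to resolve somehow.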

@matt-turner You can try editing norm-refs as below:

"norm-refs": {
  "datatype": "array",
  "items": {
    "$ref": "#/definitions/Std_ref"
  }
}

and adding another entity type as so:

"Std_ref": {
  "properties": {
    "std_ref": {
      "datatype": "string"
    }
  }
}
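To see how the two definitions fit together, the sketch below walks a model (a Python dict mirroring the JSON fragments above) and checks that every local `#/definitions/...` reference names a defined entity type. The `Standard` wrapper and its `urn` property are included only so the reference has an owner; `unresolved_refs` is a hypothetical helper, not Entity Services' own validation.

```python
# Mirrors the fragments above; "Standard" and "urn" are added for context only.
model = {
    "definitions": {
        "Standard": {
            "properties": {
                "urn": {"datatype": "string"},
                "norm-refs": {
                    "datatype": "array",
                    "items": {"$ref": "#/definitions/Std_ref"},
                },
            }
        },
        "Std_ref": {
            "properties": {
                "std_ref": {"datatype": "string"},
            }
        },
    }
}

LOCAL_PREFIX = "#/definitions/"


def iter_refs(prop):
    """Yield every $ref used by a property, directly or as an array item type."""
    if "$ref" in prop:
        yield prop["$ref"]
    items = prop.get("items", {})
    if "$ref" in items:
        yield items["$ref"]


def unresolved_refs(model):
    """Return (entity, property, ref) for local refs that name no defined entity type."""
    defined = set(model["definitions"])
    problems = []
    for entity, body in model["definitions"].items():
        for name, prop in body.get("properties", {}).items():
            for ref in iter_refs(prop):
                # Only local references are checked; external IRIs are skipped.
                if ref.startswith(LOCAL_PREFIX) and ref[len(LOCAL_PREFIX):] not in defined:
                    problems.append((entity, name, ref))
    return problems


print(unresolved_refs(model))  # → []
```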

With this you have to edit the generated conversion module to add a "/", as shown below.

OOTB:

=>es:optional('norm-refs', es:extract-array($source-node/norm-refs, standard:extract-instance-Std_ref#1))

Post edit:

=>es:optional('norm-refs', es:extract-array($source-node/norm-refs/, standard:extract-instance-Std_ref#1))

matt-turner commented 7 years ago

Thanks - I'll give all that a try and report back (in a day or so).

matt-turner commented 7 years ago

OK - here is an update:

  1. I am able to generate the templates for TDE with this new configuration
  2. But I don't have the conversion working properly - my std_ref data elements in the envelope (and everywhere else) are not pulling the data values

I'm attaching a full set of files for this:

  1. Current model
  2. Current converter
  3. Sample content that includes the right norm_ref -> std_ref structures

If someone can take a look and help me with the right way to put the path information into the converter that would be a huge help!

Thanks,

Matt

sts_samples2.zip

jmakeig commented 7 years ago

@bsrikan, do you have some bandwidth to help @matt-turner out? Much appreciated.

bsrikan commented 7 years ago

Sure yes. Looking into it.

bsrikan commented 7 years ago

So I made a couple of changes to sts-0.0.2-conv.xqy to get them working:

  1. Replaced $source-node/ref/std/std-ref with $source-node//std-ref
  2. extract-instance-Standard() was missing $instance before => map:with('urn', .. etc
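The first change matters because a child path only matches at exactly that depth, while `//` matches at any depth. The sketch below imitates this with Python's ElementTree on a hypothetical, simplified source document; the `front`/`back`/`ref-list` nesting is an assumption, and what `$source-node` is bound to in the attached converter isn't shown in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified source: the references sit deeper than a
# child path from the root expects, and the front matter has its own std-ref.
SOURCE = """
<standard>
  <front><std-meta><std-ref>ISO 1161:2016</std-ref></std-meta></front>
  <back>
    <ref-list>
      <ref><std><std-ref>ISO 1496-1</std-ref></std></ref>
      <ref><std><std-ref>ISO 148-1</std-ref></std></ref>
    </ref-list>
  </back>
</standard>
"""

root = ET.fromstring(SOURCE)

# Child path from the root, like $source-node/ref/std/std-ref: no matches here.
child_path = [e.text for e in root.findall("ref/std/std-ref")]

# Descendant path, like $source-node//std-ref: matches at any depth,
# including the std-ref in the front matter.
descendant_path = [e.text for e in root.findall(".//std-ref")]

# A descendant search anchored on the reference structure skips that extra match.
anchored_path = [e.text for e in root.findall(".//ref/std/std-ref")]

print(child_path)       # → []
print(descendant_path)  # → ['ISO 1161:2016', 'ISO 1496-1', 'ISO 148-1']
print(anchored_path)    # → ['ISO 1496-1', 'ISO 148-1']
```

The trade-off: `//std-ref` is robust to nesting but may pick up `std-ref` elements that are not normative references, so it is worth checking the real STS structure before relying on it.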

sts-0.0.2-conv.xqy.zip

matt-turner commented 7 years ago

OK - that got me the values correctly extracted for the envelope pattern ... and the TDE validated and installed.

However, I don't think that is the right data pattern. What I am ultimately looking to do is query the norm_ref data in relation to the standards. Each standard has multiple norm-refs, and we want queries that show all of the norm-refs in a standard and also, for example, all of the standards a given norm_ref is associated with. I want to do this using SQL.

This is the current data it is generating (and yes, I know that the envelope isn't the same as the triples used for SQL, but I have to assume they share the same pattern):

<norm_refs datatype="array">
<Std_ref>
<std_ref>ISO 830:1981
</std_ref>
</Std_ref>
</norm_refs>
<norm_refs datatype="array">
<Std_ref>
<std_ref>ISO 1161
</std_ref>
</Std_ref>
</norm_refs>

It has an array, but only one ref under each norm_ref.

I was looking to see this:

<norm_ref>
  <std_ref>ISO 830:1981</std_ref>
  <std_ref>ISO 1161</std_ref>
</norm_ref>

And then to also see when I use SQL that I can address a norm_ref table as well as a standard table and do joins between them.
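The difference between the generated shape and the requested one can be sketched with Python's ElementTree. This only imitates the sibling pattern visible in the envelopes above (one `norm_refs` element per array item); it is not Entity Services' envelope code, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

refs = ["ISO 830:1981", "ISO 1161"]


def sibling_pattern(parent, refs):
    """One <norm_refs datatype="array"> element per array item, as in the envelopes above."""
    for ref in refs:
        arr = ET.SubElement(parent, "norm_refs", {"datatype": "array"})
        wrapper = ET.SubElement(arr, "Std_ref")
        ET.SubElement(wrapper, "std_ref").text = ref
    return parent


def wrapper_pattern(parent, refs):
    """A single <norm_ref> element holding all items, the shape being asked for."""
    wrapper = ET.SubElement(parent, "norm_ref")
    for ref in refs:
        ET.SubElement(wrapper, "std_ref").text = ref
    return parent


siblings = sibling_pattern(ET.Element("Standard"), refs)
wrapped = wrapper_pattern(ET.Element("Standard"), refs)

print(ET.tostring(wrapped, encoding="unicode"))
# → <Standard><norm_ref><std_ref>ISO 830:1981</std_ref><std_ref>ISO 1161</std_ref></norm_ref></Standard>
```

Both shapes carry the same items in the same order; what differs is whether a path steps through one wrapper (`norm_ref/std_ref`) or through one element per item (`norm_refs/Std_ref/std_ref`), which is what a TDE context has to match.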

This doesn't have to all get solved today - in fact I am signing off now but I wanted to put all this out there.

I will have some more time on Monday to look at it and then will really need to get all this solved Thursday and Friday of next week.

Thanks in advance,

Matt

bsrikan commented 7 years ago

Okay. The TDE would have passed validation, but it doesn't surface the norm_refs table. This is because of the "-" in the view name. Please edit the following:

Now you can query for Standard and norm_refs table, see your data and hopefully be able to do joins:

select * from Standard;
select * from Standard_norm_refs;
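As a rough stand-in for those two views (SQLite in memory, not MarkLogic's SQL engine), this sketch creates `Standard` and `Standard_norm_refs` tables joined on `urn` and runs the two queries Matt described earlier: the norm-refs of one standard, and the standards citing a given reference. The URN, doc number and references come from the thread's sample envelope; the table shapes are assumptions about what the generated views contain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed shapes: one row per standard, one row per (standard, reference) pair.
cur.execute("CREATE TABLE Standard (urn TEXT PRIMARY KEY, doc_number TEXT)")
cur.execute("CREATE TABLE Standard_norm_refs (urn TEXT, std_ref TEXT)")

urn = "iso:std:iso:1161:ed-5:v1:en"
cur.execute("INSERT INTO Standard VALUES (?, ?)", (urn, "1161"))
cur.executemany(
    "INSERT INTO Standard_norm_refs VALUES (?, ?)",
    [(urn, "ISO 1496-1"), (urn, "ISO 148-1")],
)

# All normative references of one standard.
refs_of_standard = [row[0] for row in cur.execute(
    "SELECT r.std_ref FROM Standard s JOIN Standard_norm_refs r ON r.urn = s.urn "
    "WHERE s.urn = ? ORDER BY r.std_ref",
    (urn,),
)]

# All standards that cite a given reference.
standards_citing = [row[0] for row in cur.execute(
    "SELECT s.urn FROM Standard s JOIN Standard_norm_refs r ON r.urn = s.urn "
    "WHERE r.std_ref = ?",
    ("ISO 148-1",),
)]

print(refs_of_standard)   # → ['ISO 148-1', 'ISO 1496-1']
print(standards_citing)   # → ['iso:std:iso:1161:ed-5:v1:en']
conn.close()
```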

grechaw commented 7 years ago

Nice thread! Thank you for persistent experimentation @matt-turner and for your timely responses @bsrikan . I'm digging out of an email hole and will loop back to this issue later.

matt-turner commented 7 years ago

Another update: by defining a separate Std_ref entity and then referencing that with norm_refs, I am able to extract data from the norm_refs to create envelope data. I also added a title element to those norm_refs. However that data remains in line, not nested under a single norm_ref element in the envelope.

<norm_refs datatype="array">
<Std_ref>
<std_ref>ISO 830:1999
</std_ref>
<ref_title>Freight containers — Vocabulary
</ref_title>
</Std_ref>
</norm_refs>
<norm_refs datatype="array">
<Std_ref>
<std_ref>ISO 1161:1984
</std_ref>
<ref_title>Series 1 freight containers — Corner fittings — Specification
</ref_title>
</Std_ref>
</norm_refs>

I can also now query a standard_norm_ref table in SQL, but all the values are null.

select * from standard_norm_refs =>

iso:std:iso:668:ed-6:v1:en  null    null
iso:std:iso:18185:-3:ed-2:v1:en null    null
iso:std:iso:1161:ed-5:v1:en null    null
iso:std:iso:1496:-1:ed-6:v1:en  null    null

@grechaw and I will be going through this shortly and will update this with new information after.

Thanks,

Matt

matt-turner commented 7 years ago

OK we have a fix (sort of):

I have edited the generated TDE template to understand the data location of the norm_ref data.

These lines in the generated TDE (attached) were changed:

<context>./norm_refs/Std_ref</context>
<!-- changed from /norm_refs -->
<rows>
  <row>
    <schema-name>Standard</schema-name>
    <view-name>Standard_norm_refs</view-name>
    <view-layout>sparse</view-layout>
    <columns>
      <column>
        <!--This column joins to property urn of Standard-->
        <name>urn</name>
        <scalar-type>string</scalar-type>
        <val>../../urn</val>
        <!-- changed from ../urn because of the new deeper context -->
      </column>

This does now generate the proper standards_norm_ref data ... and I can query it as expected.
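Why the deeper context turns the earlier nulls into values can be imitated in Python. ElementTree cannot step to a parent with `..` from an element, so this sketch builds a parent map by hand. It assumes the generated `std_ref` column read `std_ref` directly under the context (the generated column definitions aren't shown in the thread); the envelope is trimmed from the sample above, and `value_at` is a hypothetical helper, not a TDE API.

```python
import xml.etree.ElementTree as ET

ENVELOPE = """
<Standard>
  <urn>iso:std:iso:1161:ed-5:v1:en</urn>
  <norm_refs datatype="array">
    <Std_ref><std_ref>ISO 830:1999</std_ref></Std_ref>
  </norm_refs>
</Standard>
"""

root = ET.fromstring(ENVELOPE)
# ElementTree elements do not know their parents, so record them once.
parent = {child: p for p in root.iter() for child in p}


def value_at(context, path):
    """Resolve a relative path of '..' steps and child names; None if nothing matches."""
    node = context
    for step in path.split("/"):
        node = parent.get(node) if step == ".." else node.find(step)
        if node is None:
            return None  # would surface as null in the SQL view
    return node.text


old_ctx = root.find("norm_refs")          # generated context: ./norm_refs
new_ctx = root.find("norm_refs/Std_ref")  # edited context: ./norm_refs/Std_ref

print(value_at(old_ctx, "std_ref"), value_at(old_ctx, "../urn"))
# → None iso:std:iso:1161:ed-5:v1:en
print(value_at(new_ctx, "std_ref"), value_at(new_ctx, "../../urn"))
# → ISO 830:1999 iso:std:iso:1161:ed-5:v1:en
```

Every relative `val` is resolved from the context node, so moving the context one level deeper is what forces `../urn` to become `../../urn`.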

However, I'm not sure this is how it should work: either the model is right or the TDE template is right, but they don't match. I don't think I should have to shift the context from how I defined the model to what TDE expects in order to get it to work correctly.

I will leave it up to this list to determine if this is a bug and also to close this out once you make that call.

Finally, the approach to have each element in the array represented as siblings (and not nested) doesn't make sense to me in the XML world - but once everything is working it really doesn't matter ... as long as I only use the envelope data for TDE. I'm not familiar with the way it would work with JSON so this is just an opinion.

Thanks,

Matt

grechaw commented 7 years ago

We'll take some action, or not, on this in the next WG meeting. @matt-turner I'll put you on the optional invite in case you have an interest.