marklogic / entity-services

Data modeling and code scaffolding for data integration in MarkLogic
https://docs.marklogic.com/guide/entity-services
Apache License 2.0

TDE template derived from Person example fails validation #214

Closed kcoleman-marklogic closed 7 years ago

kcoleman-marklogic commented 7 years ago

If I generate a TDE template from the Person example model and try to insert it using tde:template-insert, the template fails validation. It seems related to having a local ref. My server build is from yesterday (20161115).

Is this a user-head gap or a problem with the generated template? Is there a customization lesson in here that I need to communicate to users?

The model looks like the following:

{
  "info": {
    "title": "Person",
    "version": "0.0.1",
    "baseUri": "http://example.org/example-person/",
    "description": "A model of a person, to demonstrate several extractions"
  },
  "definitions": {
    "Person": {
      "properties": {
        "id": {
          "datatype": "string"
        },
        "firstName": {
          "datatype": "string"
        },
        "lastName": {
          "datatype": "string"
        },
        "fullName": {
          "datatype": "string"
        },
        "friends": {
          "datatype": "array",
          "items": {
            "$ref": "#/definitions/Person"
          }
        }
      },
      "primaryKey": "id",
      "required": [
        "firstName",
        "lastName",
        "fullName"
      ]
    }
  }
}

The following code generates the template and inserts it, in one fell swoop. It assumes the above model is installed as /es-ex/models/person-0.0.1.json. tde:template-insert performs validation as part of the insertion.

xquery version "1.0-ml";
import module namespace es =
    "http://marklogic.com/entity-services"
    at "/MarkLogic/entity-services/entity-services.xqy";
import module namespace tde = "http://marklogic.com/xdmp/tde" 
  at "/MarkLogic/tde.xqy";

tde:template-insert(
  '/es-ex/templates/person-0.0.1.xml',
  es:extraction-template-generate(
    fn:doc('/es-ex/models/person-0.0.1.json')
  )
)

The result of running the query is the following rudeness:

TDE-INVALIDTEMPLATE: (err:FOER0000) Invalid TDE template: TDE-REPEATEDCOLUMN: A column is declared more than once in the same template row: column "id" under view "Person_friends" and schema "Person"

Here is the template that gets generated:

<?xml version="1.0" encoding="UTF-8"?>
<template xmlns="http://marklogic.com/xdmp/tde">
  <description>
Extraction Template Generated from Entity Type Document
graph uri: http://example.org/example-person/Person-0.0.1</description>
  <context>//es:instance</context>
  <vars>
    <var>
      <name>RDF</name>
      <val>"http://www.w3.org/1999/02/22-rdf-syntax-ns#"</val>
    </var>
    <var>
      <name>RDF_TYPE</name>
      <val>sem:iri(concat($RDF, "type"))</val>
    </var>
  </vars>
  <path-namespaces>
    <path-namespace>
      <prefix>es</prefix>
      <namespace-uri>http://marklogic.com/entity-services</namespace-uri>
    </path-namespace>
  </path-namespaces>
  <templates>
    <template xmlns:tde="http://marklogic.com/xdmp/tde">
      <context>./Person</context>
      <vars>
        <var>
          <name>subject-iri</name>
          <val>sem:iri(concat("http://example.org/example-person/Person-0.0.1/Person/", fn:encode-for-uri(./id)))</val>
        </var>
      </vars>
      <triples>
        <triple>
          <subject>
            <val>$subject-iri</val>
          </subject>
          <predicate>
            <val>$RDF_TYPE</val>
          </predicate>
          <object>
            <val>sem:iri("http://example.org/example-person/Person-0.0.1/Person")</val>
          </object>
        </triple>
        <triple>
          <subject>
            <val>$subject-iri</val>
          </subject>
          <predicate>
            <val>sem:iri("http://www.w3.org/2000/01/rdf-schema#isDefinedBy")</val>
          </predicate>
          <object>
            <val>fn:base-uri(.)</val>
          </object>
        </triple>
      </triples>
    </template>
    <template xmlns:tde="http://marklogic.com/xdmp/tde">
      <context>./Person</context>
      <rows>
        <row>
          <schema-name>Person</schema-name>
          <view-name>Person</view-name>
          <columns>
            <column>
              <name>id</name>
              <scalar-type>string</scalar-type>
              <val>id</val>
            </column>
            <column>
              <name>firstName</name>
              <scalar-type>string</scalar-type>
              <val>firstName</val>
            </column>
            <column>
              <name>lastName</name>
              <scalar-type>string</scalar-type>
              <val>lastName</val>
            </column>
            <column>
              <name>fullName</name>
              <scalar-type>string</scalar-type>
              <val>fullName</val>
            </column>
          </columns>
        </row>
      </rows>
      <templates>
        <template>
          <context>./friends</context>
          <rows>
            <row>
              <schema-name>Person</schema-name>
              <view-name>Person_friends</view-name>
              <columns>
                <column>
                  <!--This column joins to property id of Person-->
                  <name>id</name>
                  <scalar-type>string</scalar-type>
                  <val>../id</val>
                </column>
                <column>
                  <!--This column joins to primary key of Person-->
                  <name>id</name>
                  <scalar-type>string</scalar-type>
                  <val>Person</val>
                </column>
              </columns>
            </row>
          </rows>
        </template>
      </templates>
    </template>
  </templates>
</template>

grechaw commented 7 years ago

This will be a regression from some work I did a couple weeks ago. Choice of column names clearly doesn't work the same when it's a self-join!

Agreed that this is an issue -- I don't think the solution is very obvious, but we can hash it out.

grechaw commented 7 years ago

You are getting good with entity services code machinations @kcoleman-marklogic .

I need a suggestion for column naming in this circumstance. The first occurrence of 'id' is the ID of the 'friend' on the left-hand side of a relationship. The second is the ID of that person's friend.

grechaw commented 7 years ago

The mitigation for this situation (if we don't change entity services codegen) is to rename one or both of those columns in the derived artifact. Since we can predict when this will happen, however, it seems we should provide a template that doesn't break OOTB.
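As a rough illustration of how predictable the failure is, here is a minimal Python sketch that scans a generated template for column names repeated within a single row, which is the condition TDE-REPEATEDCOLUMN reports. It is not part of entity services; the function name `repeated_columns` and the trimmed `SAMPLE` template are made up for this example, and it only approximates the server's own validation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by TDE templates (taken from the generated template above).
TDE_URI = "http://marklogic.com/xdmp/tde"
TDE_NS = {"tde": TDE_URI}


def repeated_columns(template_xml):
    """Return (schema, view, column) triples for names used more than once in a row.

    Hypothetical helper for illustration; the server's validator is authoritative.
    """
    root = ET.fromstring(template_xml)
    problems = []
    # Rows can sit at any nesting depth, so visit every <row> element.
    for row in root.iter("{%s}row" % TDE_URI):
        schema = row.findtext("tde:schema-name", default="", namespaces=TDE_NS)
        view = row.findtext("tde:view-name", default="", namespaces=TDE_NS)
        names = [
            col.findtext("tde:name", default="", namespaces=TDE_NS)
            for col in row.findall("tde:columns/tde:column", TDE_NS)
        ]
        for name, count in Counter(names).items():
            if count > 1:
                problems.append((schema, view, name))
    return problems


# Trimmed-down copy of the failing friends sub-template (assumed sample data).
SAMPLE = """<template xmlns="http://marklogic.com/xdmp/tde">
  <context>./friends</context>
  <rows>
    <row>
      <schema-name>Person</schema-name>
      <view-name>Person_friends</view-name>
      <columns>
        <column><name>id</name><scalar-type>string</scalar-type><val>../id</val></column>
        <column><name>id</name><scalar-type>string</scalar-type><val>Person</val></column>
      </columns>
    </row>
  </rows>
</template>"""

print(repeated_columns(SAMPLE))  # → [('Person', 'Person_friends', 'id')]
```

Renaming either of the two columns in the derived template makes the list come back empty, which matches the manual mitigation described here.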

kcoleman-marklogic commented 7 years ago

I'm not at all sure I am the best person to suggest a column name, given my deep ignorance of all things relational. Would something like "origin_id", "friend_of", or "friends_with" work?

grechaw commented 7 years ago

A proposal, as I'm reviewing and editing the spec: in the circumstance where the columns have the same name, append _l to the left one and _r to the right one.

<column>
  <!--This column joins to property id of Person-->
  <name>id_l</name>
  <scalar-type>string</scalar-type>
  <val>../id</val>
</column>
<column>
  <!--This column joins to primary key of Person-->
  <name>id_r</name>
  <scalar-type>string</scalar-type>
  <val>Person</val>
</column>

Usage in SQL would be

SELECT l.fullName, r.fullName FROM Person l, Person_friends m, Person r WHERE l.id = m.id_l AND m.id_r = r.id
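To make the left/right naming concrete, here is a small sketch of the same self-join run against an in-memory SQLite database. SQLite is only a stand-in for MarkLogic's SQL views, and the table contents are invented; the point is that Person appears twice in the query, once on the left of the link table (joined through id_l) and once on the right (joined through id_r). It selects fullName because the Person view has no name column.

```python
import sqlite3

# In-memory SQLite stands in for the Person and Person_friends views (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id TEXT PRIMARY KEY, fullName TEXT);
    CREATE TABLE Person_friends (id_l TEXT, id_r TEXT);
    INSERT INTO Person VALUES ('1', 'Ann Lee'), ('2', 'Bo Diaz'), ('3', 'Cy Park');
    -- Person 1 lists persons 2 and 3 as friends.
    INSERT INTO Person_friends VALUES ('1', '2'), ('1', '3');
""")

# l is the person who owns the friends array; r is each friend it points to.
rows = conn.execute("""
    SELECT l.fullName, r.fullName
    FROM Person l, Person_friends m, Person r
    WHERE l.id = m.id_l AND m.id_r = r.id
    ORDER BY r.fullName
""").fetchall()

print(rows)  # → [('Ann Lee', 'Bo Diaz'), ('Ann Lee', 'Cy Park')]
```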

kcoleman-marklogic commented 7 years ago

No objections but just wondering: These are left and right of what? Each other in the mythical table? (Sorry, I'm sure my relational ignorance is showing.)

kcoleman-marklogic commented 7 years ago

Has anything changed recently related to this? I'm working off last night's trunk server build. I'm getting an entirely different validation failure now. tde.validate reports:

{
  "valid": false,
  "error": "XDMP-NOTSIMPLE",
  "message": "XDMP-NOTSIMPLE: Node does not have simple content: fn:doc('/space/es/codegen/person-templ-0.0.1.xml')/tde:template/tde:templates/tde:template[2]/tde:templates/tde:template/tde:rows/tde:row/tde:columns/tde:column[2]/tde:scalar-type"
}

That's because the template for friends has an empty scalar type. That piece of the template looks like the following now. I marked the bad spot with a smart ass comment.

    <template>
      <context>./friends</context>
      <rows>
        <row>
          <schema-name>Person</schema-name>
          <view-name>Person_friends</view-name>
          <columns>
            <column>
              <!--This column joins to property id of Person-->
              <name>id</name>
              <scalar-type>string</scalar-type>
              <val>../id</val>
            </column>
            <column>
              <!--This column holds array values from property id of Person-->
              <name>friends</name>
              <scalar-type/>                        <!-- OMG, NO BUENO -->
              <val>.</val>
              <nullable>true</nullable>
            </column>
          </columns>
        </row>
      </rows>
    </template>

grechaw commented 7 years ago

Interesting. I'll be taking these bugs up when EA-4 goes out. EA-4 is still tied to our develop branch, so I didn't want to touch it much. But I do intend to fix!

grechaw commented 7 years ago

Added a proper validity test to unit tests in addition to tde:get-view. This will catch the invalid templates, and now I can fix this as a bug.

grechaw commented 7 years ago

Looks like you're already getting this ready @bsrikan

bsrikan commented 7 years ago

Added a test; it is available in the latest QA PR, #251.

bsrikan commented 7 years ago

QA has verified this bug, and the test is running OK in regression. Passing over to @kcoleman-marklogic to verify and ship.

kcoleman-marklogic commented 7 years ago

I'm still seeing exactly the same behavior as before. I get a template that won't validate because <scalar-type/> is empty in the friends sub-template.

Is there something about this fix that would require a complete reinstall of the server before it kicks in?

kcoleman-marklogic commented 7 years ago

Broke down and did a completely clean install. Makes no difference. I'm returning this to you for contemplation, @grechaw.

kcoleman-marklogic commented 7 years ago

I'm reassigning this back to myself for now. I walked through the Getting Started example from scratch this afternoon, and the template was OK, so I'm investigating how that was different than what I did yesterday. It's likely to be user-head gap, but I want to understand it in case there's some other case in which the scalar type comes out empty.

kcoleman-marklogic commented 7 years ago

Well, that was exciting to figure out.

It turns out that in the case where the template was (still) invalid, I had left the $ off of my ref ("ref" instead of "$ref"). Since I was using a JSON descriptor rather than XML, validation was a no-op and didn't pick up on this. It was just coincidence that this mistake manifested as exactly the same bad template behavior as the original bug.

All is well and this bug is fixed, so I will tag it as ship. Sorry for the runaround.
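For anyone who hits the same thing, here is a hedged Python sketch of a pre-flight check for a JSON model descriptor: it walks the document and reports any "ref" key with a string value, which is most likely a mistyped "$ref". The helper name `bare_ref_paths` is made up for this example, and the check is only a heuristic; it is not what entity services validation does.

```python
import json


def bare_ref_paths(node, path="$"):
    """Yield the paths of "ref" keys that look like a mistyped "$ref".

    Heuristic only: a key named "ref" whose value is a string is flagged.
    A property that is legitimately named "ref" maps to an object, so it
    is not reported.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "ref" and isinstance(value, str):
                yield child
            yield from bare_ref_paths(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from bare_ref_paths(item, f"{path}[{index}]")


# A cut-down Person model with the "$" left off (assumed sample data).
model = json.loads("""
{
  "definitions": {
    "Person": {
      "properties": {
        "friends": {
          "datatype": "array",
          "items": { "ref": "#/definitions/Person" }
        }
      }
    }
  }
}
""")

print(list(bare_ref_paths(model)))
# → ['$.definitions.Person.properties.friends.items.ref']
```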

grechaw commented 7 years ago

Victory!

ErikHoeven commented 7 years ago

I have a similar problem, but with a different error message:

[1.0-ml] SVC-FILOPN: File open error: open '/MarkLogic/tde.xqy': No such file or directory

Stack Trace

At line 2 column 0: In xdmp:eval("xquery version "1.0-ml"; import module namespace ...", (), 18182129315596444747...)

  1. xquery version "1.0-ml";
  2. import module namespace tde = "http://marklogic.com/xdmp/tde"
  3. at "/MarkLogic/tde.xqy";

I think I'm missing something, but I can't see what.

grechaw commented 7 years ago

Hi @ErikHoeven, I can't reproduce your issue here; TDE imports fine for me on Windows, 9.0-1. It would be very surprising were this to happen.

Check for the existence of the file -- maybe your install failed. An OS-level permissions error might also cause something like this. It seems like something in the OS/install level rather than the package though, and TDE was not exposed to the same packaging bug that entity services was.

grechaw commented 7 years ago

I was going to say, you can verify whether or not the tde.xqy file is present. On Linux or Mac, it is at /opt/MarkLogic/Modules/MarkLogic/tde.xqy. On Windows, it is at C:\Program Files\MarkLogic\Modules\MarkLogic\tde.xqy
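If you would rather script that check, a minimal Python sketch follows. The two paths are the ones given in this comment; the helper name `find_module` is made up for the example, and your install location may differ.

```python
from pathlib import Path

# Locations quoted in this thread; adjust if MarkLogic is installed elsewhere.
CANDIDATES = [
    Path("/opt/MarkLogic/Modules/MarkLogic/tde.xqy"),
    Path(r"C:\Program Files\MarkLogic\Modules\MarkLogic\tde.xqy"),
]


def find_module(candidates):
    """Return the first candidate that exists as a regular file, else None."""
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


found = find_module(CANDIDATES)
if found is None:
    print("tde.xqy not found in the expected locations")
else:
    print(f"tde.xqy found at {found}")
```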

ErikHoeven commented 7 years ago

Thanks for the fast reply, grechaw. I use Linux and get the following files:

erik@marklogic-vm /opt/MarkLogic/Modules/MarkLogic $ ls -l
total 1756
drwxr-xr-x  3 root root   4096 Apr 14 19:40 Admin
-rw-r--r--  1 root root 661231 Apr 11 09:08 admin.xqy
drwxr-xr-x  3 root root   4096 Apr 14 19:40 alert
-rw-r--r--  1 root root  98216 Apr 11 09:08 alert.xqy
drwxr-xr-x 10 root root   4096 Apr 14 19:40 appservices
-rw-r--r--  1 root root   2488 Apr 11 09:08 aws.xqy
drwxr-xr-x  2 root root   4096 Apr 14 19:40 cdict
drwxr-xr-x  3 root root   4096 Apr 14 19:40 conversion
-rw-r--r--  1 root root   6675 Apr 11 09:08 cookies.xqy
drwxr-xr-x  4 root root   4096 Apr 14 19:40 cpf
-rw-r--r--  1 root root   5908 Apr 11 09:08 custom-dictionary.xqy
-rwxr-xr-x  1 root root  12786 Apr 11 09:08 dls-upgrade.xqy
-rw-r--r--  1 root root  63041 Apr 11 09:08 dls.xqy
-rw-r--r--  1 root root  37583 Apr 11 09:08 ec2-2009-11-30.xqy
drwxr-xr-x  3 root root   4096 Apr 14 19:40 entity
-rw-r--r--  1 root root   4308 Apr 11 09:08 entity.xqy
drwxr-xr-x  3 root root   4096 Apr 14 19:40 filter
drwxr-xr-x  5 root root   4096 Apr 14 19:40 flexrep
-rw-r--r--  1 root root 175472 Apr 11 09:08 flexrep.xqy
drwxr-xr-x  2 root root   4096 Apr 14 19:40 functx
drwxr-xr-x  2 root root   4096 Apr 14 19:40 geospatial
-rw-r--r--  1 root root   3475 Apr 11 09:08 hadoop.sjs
-rw-r--r--  1 root root  12541 Apr 11 09:08 hadoop.xqy
drwxr-xr-x  2 root root   4096 Apr 14 19:40 jsearch
-rw-r--r--  1 root root   4332 Apr 11 09:08 jsearch.sjs
drwxr-xr-x  2 root root   4096 Apr 14 19:40 json
drwxr-xr-x  8 root root   4096 Apr 14 19:40 manage
drwxr-xr-x  3 root root   4096 Apr 14 19:40 mustache
drwxr-xr-x  2 root root   4096 Apr 14 19:40 openxml
-rw-r--r--  1 root root  48725 Apr 11 09:08 pki.xqy
drwxr-xr-x  2 root root   4096 Apr 14 19:40 plugin
drwxr-xr-x  5 root root   4096 Apr 14 19:40 rest-api
drwxr-xr-x  8 root root   4096 Apr 14 19:40 samples
-rw-r--r--  1 root root 202299 Apr 11 09:08 security.xqy
drwxr-xr-x  2 root root   4096 Apr 14 19:40 semantics
-rw-r--r--  1 root root  17400 Apr 11 09:08 semantics.xqy
-rw-r--r--  1 root root   7491 Apr 11 09:08 spell.xqy
-rw-r--r--  1 root root  16592 Apr 11 09:08 temporal.xqy
-rw-r--r--  1 root root  40112 Apr 11 09:08 thesaurus.xqy
-rw-r--r--  1 root root 159672 Apr 11 09:08 tieredstorage.xqy
-rw-r--r--  1 root root  28401 Apr 11 09:08 triggers.xqy
-rw-r--r--  1 root root   2536 Apr 11 09:08 utilities.xqy
-rw-r--r--  1 root root  37737 Apr 11 09:08 views.xqy
drwxr-xr-x  2 root root   4096 Apr 14 19:40 welcome
-rw-r--r--  1 root root   4541 Apr 11 09:08 xa.xqy
drwxr-xr-x  3 root root   4096 Apr 14 19:40 xinclude
drwxr-xr-x  3 root root   4096 Apr 14 19:40 xslt

But I don't find it in there. Can I find it on GitHub? Then I could add it to the directory.

grechaw commented 7 years ago

Hi Erik, this does not look like MarkLogic 9. TDE is a MarkLogic 9 feature, just released last week. You'll want to go and get a new download at developer.marklogic.com.