Adding metadata features to views

marcverhagen commented 7 years ago

We have an open issue in the LIF pages where we suggest we might add a dependsOn feature to a view's metadata:

"id": "v2",
"metadata": {
    "contains": {
        "http://vocab.lappsgrid.org/Token#pos": {
            "producer": "edu.brandeis.cs.lappsgrid.opennlp.POSTagger:2.0.2",
            "type": "tagger:opennlp" } },
    "dependsOn": ["v1"] }

This should probably be an optional, list-valued feature.
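To make "optional and list-valued" concrete, here is a minimal Python sketch of how a consumer might read such a feature from a view represented as a plain dict. The function name view_dependencies is hypothetical, and the tolerance for a bare string value is an assumption for illustration, not part of any LIF specification.

```python
from typing import Any, Dict, List


def view_dependencies(view: Dict[str, Any]) -> List[str]:
    """Return the view ids listed under metadata.dependsOn, or [] if absent."""
    value = view.get("metadata", {}).get("dependsOn", [])
    # Assumption: tolerate a bare string such as "v1" by wrapping it in a list.
    return [value] if isinstance(value, str) else list(value)
```

A view without the feature simply yields an empty list, so callers do not need a separate existence check.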

We also mention adding a timestamp feature.

keighrim commented 7 years ago

My two cents: I totally agree on the need for a dependsOn field. My one concern is whether we have to provide a guided (or enforced) syntactic notation for back-referencing a previous view from a later view. Currently, in the examples on the lapps wiki, we are using colons.

{ "@type": "Dependency",
          "label": "ROOT",
          "id": "dep0",
          "features":    {
            "governor": null,
            "dependent": "v1:tok1" }},

But I don't think we have explicitly made colons the exclusive (or recommended) way to write such relations, and I'm not sure we can enforce such a notation in the JSON(-LD) world.
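For illustration only, this is roughly what a consumer has to do if the colon convention is adopted. The sketch below is Python; the function name split_reference is hypothetical, and the rule that an id without a colon belongs to the current view is an assumption rather than something the wiki states.

```python
from typing import Optional, Tuple


def split_reference(ref: str, current_view: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Split a reference like "v1:tok1" into (view id, annotation id)."""
    view_id, sep, annotation_id = ref.partition(":")
    if not sep:
        # Assumption: no colon means the annotation lives in the current view.
        return current_view, ref
    return view_id, annotation_id
```

Note that this only works if annotation ids themselves never start with something that looks like a view id followed by a colon, which is exactly the kind of rule that would have to be written down.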

marcverhagen commented 7 years ago

We need to think about this a bit more. I had imagined that dependsOn is a relation between views, so view X could say that the information in it depends on view Y. An individual annotation can itself be completely independent of any other annotation (it only refers to character offsets).

Having this be a relation between views is what I had in mind when putting up the LIF example above. We could go a little bit fancier (as suggested by Keith):

"dependsOn" : [
    { "view": "v1", "type": "Token" },
    { "view": "v2", "type": "NamedEntity"}
]

I think we also need to come up with a clear story on why dependsOn would be a good thing to have.

On the other hand, we could also view dependsOn as a relation between annotations, as in annotations of type Sentence depend on annotations of type Token.

Finally, there is the dependency of using things like "v1:tok3", which is really an instance of a dependency of one annotation on another.

Incidentally, I thought this notation was explained in the LIF specifications, but it isn't where I thought it was. It may be buried somewhere else, but it should move to a more prominent spot.

marcverhagen commented 7 years ago

Ah, I did find the thing on view identifiers and annotation identifiers. It is fairly prominently mentioned as general principle 2 in the view section of http://wiki.lappsgrid.org/interchange/overview.html. The colon is not specifically mentioned though.

ksuderman commented 7 years ago

What does it mean for one view to "dependOn" another view? If annotation X depends on annotation T in view 'v1' and annotation X' depends on annotations T' in 'v2', how do we express that in the metadata? I also see problems when copying annotations to a new view. Suppose we have:

{
    dependsOn: "v1",
    contains: {
        T1: {
            producer: ...
            type: ...
        },
        T2: {
            producer: ...
            type: ...
        }
    }
}

If I copy the 'T1' annotations to a new view does the new view depend on "v1"?

I think the "dependsOn" field has to go inside the "contains" section for the particular annotation type, as it is the annotations that have dependencies on other annotations. The question is then: what fields do we need in the "dependsOn" field? Obviously the view ID, but maybe a list of annotation types as well.

view {
    metadata {
        contains {
            T1 {
                producer: ...
                type: ...
                dependsOn {
                    view: 'v1',
                    type: 'http://vocab.lappsgrid.org/T2'
                }
            }
        }
    }
}

ksuderman commented 7 years ago

As I mentioned in lapps/org.lappsgrid.serialization#32, the Contains object allows arbitrary fields (with special helpers for the officially defined fields), so we can start using the above with no code modifications.

keighrim commented 7 years ago

As the dependsOn issue is resolved, can we close this issue, or should we continue using this thread for timestamp metadata?

ksuderman commented 7 years ago

Is dependsOn resolved?

keighrim commented 7 years ago

Oh, I thought we had agreed to put dependsOn in the contains metadata.

marcverhagen commented 7 years ago

Not as far as I know.

marcverhagen commented 7 years ago

Quoting ksuderman from a few comments above:

What does it mean for one view to "dependOn" another view?

I simply saw this as a quick way for other services to find out that some annotations in view X may depend on annotations in view Y. That information, by the way, is already implicit in the annotations in X, since if they refer to targets they would be using references like "Y:tok03".
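To show what "already implicit" would cost a consumer, here is a hedged Python sketch that recovers the referenced views by scanning annotation features. It assumes views are plain dicts with "id" and "annotations" keys, that references use the "view:annotation" colon form, and that the caller supplies the set of known view ids; the function name referenced_views is hypothetical.

```python
from typing import Any, Dict, Set


def referenced_views(view: Dict[str, Any], known_view_ids: Set[str]) -> Set[str]:
    """Collect ids of other views that this view's annotations point into."""
    found: Set[str] = set()
    for annotation in view.get("annotations", []):
        for value in annotation.get("features", {}).values():
            # Features may hold a single reference or a list of references.
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if not isinstance(candidate, str) or ":" not in candidate:
                    continue
                prefix = candidate.split(":", 1)[0]
                # Only count prefixes that are real view ids, so that values
                # such as URLs are not mistaken for references.
                if prefix in known_view_ids and prefix != view.get("id"):
                    found.add(prefix)
    return found
```

This requires walking every annotation, which is the cost an explicit metadata field would avoid.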

Having said that, I think that Keith's suggestion achieves the same thing that the view-view dependencies do, in addition to being a bit more precise and possibly closer to what WebLicht would like. The value of dependsOn would probably need to be a list though:

"views" : [
   { "id": "v1",
     "metadata": {
        "contains": {...} }
   },
   { "id": "v2",
     "metadata": {
        "contains": {
           "T1": {
              "producer": ...,
              "type": ...,
              "dependsOn": [
                 { "view": "v1", "type": "http://vocab.lappsgrid.org/T2" },
                 { "view": "v1", "type": "http://vocab.lappsgrid.org/T3" } ]
           }
        }
     }
   }
]

This would of course need to be duplicated in case we have T1a and T1b and both are dependent on v1:T2 and v1:T3.

marcverhagen commented 7 years ago

And to answer another question:

If I copy the 'T1' annotations to a new view does the new view depend on "v1"?

Yes. And the answer holds in both cases (dependsOn inside and outside of contains).

marcverhagen commented 6 years ago

Continuing...

Rereading this thread it looks like I came around to Keith's suggestion of having dependsOn inside of contains and having it reference both the view and the annotation type.

But I may disagree with myself in the previous message here on what dependsOn actually means. It is weird for it to mean that it was the service that depends on other information. Perhaps a copiedFrom feature would make sense.

This ties in to a separate question on whether a service that requires tokens and copies them to the newly created view also should say it produces tokens.

marcverhagen commented 6 years ago

I changed my mind, again, and I now live in a blissful state of thinking that it does not matter.

1. Say some annotations refer to an annotation in another view: just select the view and find the annotation.
2. Say some POS tagger also creates the Token annotations: it will just say so.
3. Say some POS tagger requires Tokens and copies them into the view it creates. It could still just claim it created them and add Token to the contains map, and if it wants, it could say that the rules attribute is set to "CopiedFromInput".

This blissful balloon could be popped at any time, but one of the advantages of this approach is that little work is required.

ksuderman commented 6 years ago

It is weird for it to mean that it was the service that depends on other information.

It doesn't mean that though. The contains section of the metadata is a map; the annotation type is the key and another map of metadata is the value. So the dependsOn data is metadata about that annotation type.

1. If some annotations refer to another view we likely want to be able to determine that simply by looking at the metadata.
2. Yes.
3. A service shouldn't claim to create an annotation type if it only copies them from another view, as it did not create the annotations. If a service does copy annotations from another view it should also copy the relevant metadata (contains) from the view as well.
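As a rough sketch of that last point, the following Python function copies the annotations of one type together with the matching contains entry, so the original producer stays credited. It assumes views are plain dicts with "annotations" and "metadata" keys and that annotations carry their type under "@type"; the function name copy_annotation_type is hypothetical and is not part of the serialization library.

```python
import copy
from typing import Any, Dict


def copy_annotation_type(source: Dict[str, Any], target: Dict[str, Any], type_uri: str) -> None:
    """Copy annotations of one type, plus their contains entry, into target."""
    # Copy the annotations themselves (deep copies, so the views stay independent).
    copied = [
        copy.deepcopy(annotation)
        for annotation in source.get("annotations", [])
        if annotation.get("@type") == type_uri
    ]
    target.setdefault("annotations", []).extend(copied)
    # Copy the metadata entry unchanged instead of claiming the annotations.
    entry = source.get("metadata", {}).get("contains", {}).get(type_uri)
    if entry is not None:
        contains = target.setdefault("metadata", {}).setdefault("contains", {})
        contains[type_uri] = copy.deepcopy(entry)
```

Because the entry is copied as-is, any dependsOn data stored under that annotation type travels with it.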

marcverhagen commented 6 years ago

The contains section of the metadata is a map; the annotation type is the key and another map of metadata is the value. So the dependsOn data is metadata about that annotation type.

Yes, that makes sense. And I assume we have a convention that says we do not need dependsOn if the annotation depends on another annotation in the same view.

1. I can live with that.
2. Yes.
3. Okay, so we will have the validator test for this. A tagger that wants Token and produces Token#pos would need to have the metadata from the Token view copied.

And do we agree on the value of dependsOn? The last proposal was

"dependsOn": [
     { "view": "v1", "type": "http://vocab.lappsgrid.org/T2" },
     { "view": "v1", "type": "http://vocab.lappsgrid.org/T3" } ]

For the schema this just means it takes an array; the LIF specifications can be more verbose on this.

ksuderman commented 6 years ago

I think that is all we need for dependsOn for now, unless someone else can think of information that might be useful. We might want to make the type field allow an array, but that is an implementation detail.

Once we have settled on the model it shouldn't be too difficult to extend the schema to validate the dependsOn field.

marcverhagen commented 6 years ago

I vote to keep the type a string; the example above already allows dependence on two types in another view.
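For reference, the shape discussed here (a list of objects, each with a string "view" and a string "type") could be checked along these lines. This is a Python sketch under those assumptions only; depends_on_errors is a hypothetical helper and not the actual LIF schema or validator.

```python
from typing import Any, List


def depends_on_errors(value: Any) -> List[str]:
    """Return a list of problems with a dependsOn value (empty if it looks fine)."""
    if not isinstance(value, list):
        return ["dependsOn must be a list"]
    errors: List[str] = []
    for index, item in enumerate(value):
        if not isinstance(item, dict):
            errors.append(f"item {index} must be an object")
            continue
        # Both fields are required and must be plain strings.
        for key in ("view", "type"):
            if not isinstance(item.get(key), str):
                errors.append(f"item {index}: '{key}' must be a string")
    return errors
```

With type kept as a string, depending on two types in the same view is expressed as two list items rather than one item with an array.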

keighrim commented 5 years ago

Being implemented in lapps/org.lappsgrid.serialization#48.

keighrim commented 5 years ago

Implemented via lapps/org.lappsgrid.serialization#48 and lapps/org.lappsgrid.serialization#32.

keighrim commented 5 years ago

Still needs to be added to the LIF schema to support these fields.

keighrim commented 5 years ago

Done.

lapps / vocabulary-pages

Adding metadata features to views #50