Closed marcverhagen closed 5 years ago
My two cents: I totally agree on the need for a dependsOn field. My one concern is whether we have to provide a guided (or forced) syntactic notation for back-referencing a previous view in a later view.
Currently, in the examples on the lapps wiki, we are using colons.
{ "@type": "Dependency",
"label": "ROOT",
"id": "dep0",
"features": {
"governor": null,
"dependent": "v1:tok1" }},
But I don't think we have explicitly made colons the exclusive (or recommended) way to write such references, and I'm not sure we can force such a notation in the JSON(-LD) world.
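To make the colon convention concrete, here is a minimal sketch of how a consumer might split such a reference. The function name split_reference and the fallback to the current view for bare identifiers are assumptions for illustration; nothing in the LIF pages mandates this behavior.

```python
def split_reference(ref, current_view=None):
    """Split a reference such as 'v1:tok1' into (view id, annotation id).

    Assumption: a bare id such as 'tok1' refers to the current view.
    """
    view_id, sep, ann_id = ref.partition(":")
    if not sep:
        # No colon found: treat the whole string as a same-view annotation id.
        return current_view, ref
    return view_id, ann_id


print(split_reference("v1:tok1"))
print(split_reference("tok1", current_view="v2"))
```

Splitting on the first colon only keeps the sketch simple; it would misread identifiers that themselves contain colons, which is one reason the notation may deserve an explicit statement in the specification.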
We need to think about this a bit more. I had imagined that dependsOn is a relation between views, so view X could say that information in it actually depends on view Y. The individual annotation itself can be completely independent of the other annotation (it only refers to character offsets).
Having this be a relation between views is what I had in mind when putting up the LIF example above. We could go a little bit fancier (as suggested by Keith):
"dependsOn" : [
{ "view": "v1", "type": "Token" },
{ "view": "v2", "type": "NamedEntity"}
]
I think we also need to come up with a straight story on why dependsOn would be a good thing to have.
On the other hand, we could also view dependsOn as a relation between annotations, as in annotations of type Sentence depend on annotations of type Token.
Finally, there is the dependency of using things like "v1:tok3", which is really an instance of a dependency of one annotation on another.
Incidentally, I thought this notation was explained in the LIF specifications, but it isn't where I thought it was. It may be buried somewhere else, but it should go in a somewhat more prominent spot.
Ah, I did find the thing on view identifiers and annotation identifiers. It is fairly prominently mentioned as general principle 2 in the view section of http://wiki.lappsgrid.org/interchange/overview.html. The colon is not specifically mentioned though.
What does it mean for one view to "dependOn" another view? If annotation X depends on annotation T in view 'v1' and annotation X' depends on annotations T' in 'v2' how do we express that in the metadata? I also see problems when copying annotations to a new view. Suppose we have:
{
  dependsOn: "v1",
  contains: {
    T1: {
      producer: ...
      type: ...
    },
    T2: {
      producer: ...
      type: ...
    }
  }
}
If I copy the 'T1' annotations to a new view does the new view depend on "v1"?
I think the "dependsOn" field has to go inside the "contains" section for the particular annotation type, as it is the annotations that have dependencies on other annotations. The question is then, what fields do we need included in the "dependsOn" field? Obviously the view ID, but maybe a list of annotation types as well.
view {
  metadata {
    contains {
      T1 {
        producer: ...
        type: ...
        dependsOn {
          view: 'v1',
          type: 'http://vocab.lappsgrid.org/T2'
        }
      }
    }
  }
}
As I mentioned in lapps/org.lappsgrid.serialization#32, the Contains object allows arbitrary fields (with special helpers for the officially defined fields), so we can start using the above with no code modifications.
As the dependsOn issue is resolved, can we close this issue, or should we continue using this thread for timestamp metadata?
Is dependsOn resolved?
Oh, I thought we concluded to put dependsOn in the contains metadata.
Not as far as I know.
Quoting ksuderman from a few comments above:
What does it mean for one view to "dependOn" another view?
I simply saw this as a quick way for other services to find out that some annotations in view X may depend on annotations in view Y. That information by the way is already implicit in the annotations in X since if they refer to targets it would be using references like "Y:tok03".
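The point that view-level dependencies are already implicit in the annotations can be sketched as follows. This is only an illustration: the function name referenced_views is hypothetical, and it assumes each view carries an "annotations" list whose entries have a "features" map, with cross-view references written in the colon style discussed above.

```python
def referenced_views(view, known_view_ids):
    """Return ids of other views that annotations in `view` point to.

    Only prefixes that match a known view id count, so values such as
    URLs (which also contain colons) are not mistaken for references.
    """
    found = set()
    for annotation in view.get("annotations", []):
        for value in annotation.get("features", {}).values():
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if not isinstance(candidate, str) or ":" not in candidate:
                    continue
                prefix = candidate.split(":", 1)[0]
                if prefix in known_view_ids and prefix != view.get("id"):
                    found.add(prefix)
    return found


view_x = {
    "id": "X",
    "annotations": [
        {"id": "dep0", "features": {"governor": None, "dependent": "Y:tok03"}},
        {"id": "dep1", "features": {"source": "http://example.org/doc"}},
    ],
}
print(referenced_views(view_x, {"X", "Y"}))
```

A scan like this has to visit every annotation, which is the cost that an explicit dependsOn entry in the metadata would let other services avoid.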
Having said that, I think that Keith's suggestion achieves the same thing that the view-view dependencies do, in addition to being a bit more precise and possibly closer to what WebLicht would like. The value of dependsOn would probably need to be a list though:
"views" : [
  { "id": "v1",
    "metadata": {
      "contains" : {...} }
  },
  { "id": "v2",
    "metadata": {
      "contains" : {
        "T1": {
          "producer": ...,
          "type": ...,
          "dependsOn": [
            { "view": "v1", "type": "http://vocab.lappsgrid.org/T2" },
            { "view": "v1", "type": "http://vocab.lappsgrid.org/T3" } ]
        }
      }
    }
  }
]
This would of course need to be duplicated in case we have T1a and T1b and both are dependent on v1:T2 and v1:T3.
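One way a consumer could use such entries is to check that every dependency resolves to a type the target view actually declares. The sketch below is an assumption-laden illustration, not part of any LAPPS library: the function name unresolved_dependencies is hypothetical, and it assumes the keys of the contains map are the same type URIs used in the dependsOn entries.

```python
def unresolved_dependencies(views):
    """List (view, type, target view, target type) tuples that do not resolve."""
    contains_by_view = {
        view["id"]: view.get("metadata", {}).get("contains", {}) for view in views
    }
    problems = []
    for view in views:
        for ann_type, meta in contains_by_view[view["id"]].items():
            for dep in meta.get("dependsOn", []):
                target = contains_by_view.get(dep["view"])
                # Unresolved if the view is missing or does not declare the type.
                if target is None or dep["type"] not in target:
                    problems.append((view["id"], ann_type, dep["view"], dep["type"]))
    return problems


VOCAB = "http://vocab.lappsgrid.org/"
views = [
    {"id": "v1", "metadata": {"contains": {VOCAB + "T2": {}}}},
    {"id": "v2", "metadata": {"contains": {
        VOCAB + "T1": {"dependsOn": [
            {"view": "v1", "type": VOCAB + "T2"},
            {"view": "v1", "type": VOCAB + "T3"},
        ]},
    }}},
]
print(unresolved_dependencies(views))
```

In this toy input only the T3 dependency is reported, because v1 declares T2 but not T3.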
And to answer another question:
If I copy the 'T1' annotations to a new view does the new view depend on "v1"?
Yes. And the answer holds in both cases (dependsOn inside and outside of contains).
Continuing...
Rereading this thread, it looks like I came around to Keith's suggestion of having dependsOn inside of contains and having it reference both the view and the annotation type.
But I may disagree with myself on the previous message here on what dependsOn actually means. It is weird for it to mean that it was the service that depends on other information. Perhaps a copiedFrom feature would make sense.
This ties in to a separate question on whether a service that requires tokens and copies them to the newly created view also should say it produces tokens.
I changed my mind, again, and I now live in a blissful state of thinking that it does not matter.
contains map and if it wants it could say that the rules attribute is set to "CopiedFromInput".
This blissful balloon could be popped any time, but one of the advantages of this approach is that little work is required.
It is weird for it to mean that it was the service that depends on other information.
It doesn't mean that though. The contains section of the metadata is a map; the annotation type is the key and another map of metadata is the value. So the dependsOn data is metadata about that annotation type.
The contains section of the metadata is a map; the annotation type is the key and another map of metadata is the value. So the dependsOn data is metadata about that annotation type.
Yes, that makes sense. And I assume we have a convention that says we do not need dependsOn if the annotation depends on another annotation in the same view.
Token and produces Token#pos would need to have the metadata from the Token view copied.
And do we agree on the value of dependsOn? The last proposal was
"dependsOn": [
{ "view": "v1", "type": "http://vocab.lappsgrid.org/T2" },
{ "view": "v1", "type": "http://vocab.lappsgrid.org/T3" } ]
For the schema this just means it takes an array; the LIF specifications can be more verbose on this.
I think that is all we need for dependsOn for now, unless someone else can think of information that might be useful. We might want to make the type field allow an array, but that is an implementation detail.
Once we have settled on the model it shouldn't be too difficult to extend the schema to validate the dependsOn field.
I vote to keep the type a string; the example above already allows dependence on two types in another view.
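The shape agreed on here (an array of objects, each with a string view and a string type) can be checked with a few lines of plain Python. This is a hedged sketch of the constraint only, not the LIF schema itself; the function name check_depends_on and the error messages are hypothetical.

```python
def check_depends_on(value):
    """Return a list of error messages; an empty list means the value is valid."""
    if not isinstance(value, list):
        return ["dependsOn must be a list"]
    errors = []
    for index, entry in enumerate(value):
        if not isinstance(entry, dict):
            errors.append(f"entry {index} must be an object")
            continue
        for key in ("view", "type"):
            # Both fields are required and must be plain strings.
            if not isinstance(entry.get(key), str):
                errors.append(f"entry {index}: '{key}' must be a string")
    return errors


proposal = [
    {"view": "v1", "type": "http://vocab.lappsgrid.org/T2"},
    {"view": "v1", "type": "http://vocab.lappsgrid.org/T3"},
]
print(check_depends_on(proposal))
print(check_depends_on([{"view": "v1", "type": ["T2", "T3"]}]))
```

Keeping type a string means a dependency on two types is written as two entries, as in the proposal, and an array-valued type is rejected.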
Being implemented in lapps/org.lappsgrid.serialization#48.
Implemented via lapps/org.lappsgrid.serialization#48 and lapps/org.lappsgrid.serialization#32.
Still needs to be added to the LIF schema to support these fields.
Done.
We have an open issue in the LIF pages where we suggest we might add a dependsOn feature to a view's metadata. This should probably be an optional, list-valued feature.
We also mention adding a timestamp feature.