Closed keighrim closed 5 years ago
Currently we have "unmarked" JSON schema files that confusingly pretend to be the official or LATEST version of the schemata.
We talked about possibly removing those unmarked files from the root directory.
As seen in 929e5d2, the default schema address was hard-coded to http://vocab.lappsgrid.org/schema/datasource-schema-1.0.0.json . The files with the -1.0.0 suffix are in the root directory, but if we decide to use only version-specific directories (./1.0.0/xxx-schema.json, ./1.1.0/xxx-schema.json, ...), old tools that point to the old default location will break. I'd like to suggest using version suffixes only instead of versioned directories.
.
|-- container-schema-1.0.0.json
|-- container-schema-1.1.0.json
|-- datasource-schema-1.0.0.json
|-- datasource-schema-1.1.0.json
|-- lif-schema-1.0.0.json
|-- lif-schema-1.1.0.json
|-- service-schema-1.0.0.json
|-- service-schema-1.1.0.json
`-- ...
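To make the suffix-only layout concrete, here is a minimal sketch of how a tool might build and parse schema addresses under that scheme. The base URL is the one quoted above; the helper names `schema_url` and `parse_schema_name` and the strict `x.y.z` version pattern are assumptions made for illustration, not part of any existing library.

```python
import re

# Base URL taken from the hard-coded address quoted in this thread.
SCHEMA_BASE = "http://vocab.lappsgrid.org/schema/"

# Matches suffixed names such as "datasource-schema-1.0.0.json".
# Assumption: versions always have the form MAJOR.MINOR.PATCH.
_SUFFIXED = re.compile(
    r"^(?P<kind>[a-z]+)-schema-(?P<version>\d+\.\d+\.\d+)\.json$"
)


def schema_url(kind, version):
    """Build a suffix-style schema URL (hypothetical helper)."""
    return f"{SCHEMA_BASE}{kind}-schema-{version}.json"


def parse_schema_name(filename):
    """Split a suffixed file name into (kind, version).

    Returns None for names without a version suffix, i.e. the
    "unmarked" files discussed above.
    """
    match = _SUFFIXED.match(filename)
    if match is None:
        return None
    return match.group("kind"), match.group("version")
```

With this layout the address that old tools already use stays valid, since `schema_url("datasource", "1.0.0")` reproduces it exactly.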
On adding tagSet to the IOSpecification object, this can be a reasonable way to handle the issue.
For example,
"produces" : {
    "annotations" : [ "TOKEN", "NAMED_ENTITY" ],    [1]
    "language" : "some_language",
    "encoding" : "some_encoding",
    "tagSets" : {
        "TOKEN" : "tags_pos_upenn",                 [2]
        "NAMED_ENTITY" : "tags_ner_stanford"
    }
}
Note that the discriminators on lines [1] and [2] all need to be expanded to full URIs (using Java libraries, preferably).
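The expansion step could look roughly like the sketch below. It is written in Python only to keep it short; the comment above prefers Java libraries for real use. The lookup table and its `example.org` URIs are placeholders invented for illustration and are not the real vocabulary addresses, and the function names `expand` and `expand_produces` are hypothetical.

```python
# Placeholder lookup table: these URIs are invented for illustration
# and are NOT the real vocabulary addresses.
DISCRIMINATOR_URIS = {
    "TOKEN": "http://example.org/vocab/Token",
    "NAMED_ENTITY": "http://example.org/vocab/NamedEntity",
    "tags_pos_upenn": "http://example.org/vocab/tagset/pos-upenn",
    "tags_ner_stanford": "http://example.org/vocab/tagset/ner-stanford",
}


def expand(name):
    """Return the full URI for a short discriminator name."""
    try:
        return DISCRIMINATOR_URIS[name]
    except KeyError:
        raise ValueError(f"unknown discriminator: {name}") from None


def expand_produces(produces):
    """Expand the names on lines [1] and [2]; copy other fields as-is.

    Both the keys and the values of "tagSets" are treated as
    discriminators here, which is an assumption about the intent.
    """
    expanded = dict(produces)
    expanded["annotations"] = [
        expand(a) for a in produces.get("annotations", [])
    ]
    expanded["tagSets"] = {
        expand(kind): expand(tagset)
        for kind, tagset in produces.get("tagSets", {}).items()
    }
    return expanded
```

Raising on an unknown name, rather than passing it through, means a typo in the metadata surfaces early instead of producing a half-expanded specification.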
Starting a PR that only adds the licenseDesc field. Adding tagSet is also mentioned in #9, but I'm not sure how it should be added. For example, when a tool "produces" TOKEN and DEPENDENCY_STR, it would have two tagSet values, one for each annotation type. Should the metadata list them in random order, or do we want some sort of mapping between annotation types and their tagsets?
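To illustrate the mapping option, here is a small sketch of how a consumer could use a per-type mapping, which an unordered list would not allow. The `tagSets` field name follows the example elsewhere in this thread; `tags_dep_example` is a made-up tagset name, and both helper functions are hypothetical.

```python
# Example metadata for a tool that produces TOKEN and DEPENDENCY_STR.
# "tags_dep_example" is a made-up tagset name used only for illustration.
EXAMPLE_PRODUCES = {
    "annotations": ["TOKEN", "DEPENDENCY_STR"],
    "tagSets": {
        "TOKEN": "tags_pos_upenn",
        "DEPENDENCY_STR": "tags_dep_example",
    },
}


def tagset_for(produces, annotation_type):
    """Look up the tagset declared for one annotation type, or None."""
    return produces.get("tagSets", {}).get(annotation_type)


def undeclared_tagset_keys(produces):
    """Return tagSets keys that are not listed under "annotations"."""
    produced = set(produces.get("annotations", []))
    return sorted(
        key for key in produces.get("tagSets", {}) if key not in produced
    )
```

With a mapping, the tagset for a given annotation type is a direct lookup, and a simple consistency check can flag tagsets declared for types the tool does not produce.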