lapps / org.lappsgrid.metadata

Classes used to read and write LAPPS metadata.
Apache License 2.0

#9 update for 1.1.0 schema #10

Closed keighrim closed 5 years ago

keighrim commented 6 years ago

Starting a PR with just the licenseDesc field added. Adding tagSet is also mentioned in #9, but I'm not sure how it should be added. For example, when a tool "produces" TOKEN and DEPENDENCY_STR, it would have two tagSets, one for each annotation type. Should the metadata list them in arbitrary order, or do we want some sort of mapping between annotation types and their tag sets?

keighrim commented 6 years ago

Currently we have "unmarked" JSON schema files that confusingly pretend to be the official or LATEST version of the schemata.

We talked about possible removal of those unmarked files in the root directory.

keighrim commented 6 years ago

As in 929e5d2, the default schema address was hard-coded to http://vocab.lappsgrid.org/schema/datasource-schema-1.0.0.json . The files with the -1.0.0 suffix are there in the root directory, but if we decide to use only version-specific directories (./1.0.0/xxx-schema.json, ./1.1.0/xxx-schema.json, ...), old tools that point to the old default location will break. I'd like to suggest using version suffixes only, instead of versioned directories.

.
|-- container-schema-1.0.0.json
|-- container-schema-1.1.0.json
|-- datasource-schema-1.0.0.json
|-- datasource-schema-1.1.0.json
|-- lif-schema-1.0.0.json
|-- lif-schema-1.1.0.json
|-- service-schema-1.0.0.json
|-- service-schema-1.1.0.json
`-- ... 
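
A minimal Java sketch of how a default URL could be derived under the suffix-only layout above. The class name SchemaUrls and the helper schemaUrl are hypothetical and are not part of org.lappsgrid.metadata; only the base address and the file-name pattern come from the URL and listing in this comment.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class SchemaUrls {
    // Base location taken from the default schema URL quoted in this thread.
    static final String BASE = "http://vocab.lappsgrid.org/schema/";

    // Assumption: versions are always three dot-separated numbers (e.g. 1.1.0).
    private static final Pattern VERSION = Pattern.compile("\\d+\\.\\d+\\.\\d+");

    /** Builds a suffix-versioned URL such as .../datasource-schema-1.1.0.json. */
    static String schemaUrl(String kind, String version) {
        if (!VERSION.matcher(version).matches()) {
            throw new IllegalArgumentException("Expected a version like 1.1.0, got: " + version);
        }
        return BASE + kind + "-schema-" + version + ".json";
    }

    public static void main(String[] args) {
        // The 1.0.0 URL is identical to the address old tools already point to.
        System.out.println(schemaUrl("datasource", "1.0.0"));
        System.out.println(schemaUrl("datasource", "1.1.0"));
    }
}
```

With this scheme the existing 1.0.0 address stays valid, and a new schema version only changes the suffix.
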
keighrim commented 6 years ago

Also, note that these changes to DEFAULT_SCHEMA_URL will affect the org.lappsgrid.annotations API as well. See lapps/org.lappsgrid.annotations#7, and how it's used in the annotation API: 1, 2, 3, 4

keighrim commented 5 years ago

As for adding tagSet to the IOSpecification object, the following could be a reasonable way to handle the issue. For example,

"produces" : {
  "annotations" : [ "TOKEN" , "NAMED_ENTITY" ] ,          [1]
  "language" : "some_language" , 
  "encoding" : "some_encoding" , 
  "tagSets" : {
    "TOKEN" : "tags_pos_upenn",                           [2]
    "NAMED_ENTITY" : "tags_ner_stanford"
  }
}

Note that the discriminators on lines [1] and [2] all need to be expanded to full URIs (preferably using the Java libraries).
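
A rough Java sketch of the mapping proposed above, keyed by annotation type. ProducesSketch, its produces method, and placeholderExpand are hypothetical names, not the actual IOSpecification API. The example.org URIs are stand-ins for whatever the real discriminator lookup would return.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the "produces" part of the metadata.
final class ProducesSketch {
    final List<String> annotations = new ArrayList<>();
    // Annotation type URI -> tag set URI; insertion order is preserved.
    final Map<String, String> tagSets = new LinkedHashMap<>();

    /** Registers an annotation type with its tag set, expanding both short names. */
    void produces(String type, String tagSet, UnaryOperator<String> expand) {
        String typeUri = expand.apply(type);
        annotations.add(typeUri);
        tagSets.put(typeUri, expand.apply(tagSet));
    }

    // Placeholder resolver: NOT the real LAPPS vocabulary lookup.
    static String placeholderExpand(String shortName) {
        return "http://example.org/vocab/" + shortName;
    }

    public static void main(String[] args) {
        ProducesSketch p = new ProducesSketch();
        p.produces("TOKEN", "tags_pos_upenn", ProducesSketch::placeholderExpand);
        p.produces("NAMED_ENTITY", "tags_ner_stanford", ProducesSketch::placeholderExpand);
        System.out.println(p.annotations);
        System.out.println(p.tagSets);
    }
}
```

Keying the map by annotation type avoids relying on the order of two parallel lists, which was the concern raised earlier in the thread.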