openminted / omtd-share-annotations

Java annotations and a Maven plugin to automatically generate OMTD-SHARE metadata.
https://openminted.github.io/releases/omtd-share-annotations/

DataFormatInfo no longer has mime type - where did it go? #32

Open reckart opened 6 years ago

reckart commented 6 years ago

DataFormatInfo no longer has mime type information (or a file extension) in OMTD-SHARE 3.0.2.

@pennyl67 where did they go?

pennyl67 commented 6 years ago

Data formats in the ontology have a property "hasMimetype", which links a format to the corresponding mimetype. Obviously this applies only to formats that have a mimetype; broad concepts such as "corpus format" do not have one. The same goes for file extension and documentation URL. Again, if it helps, I can send you the equivalence relations between data format and mimetype. In fact, they are already in a Google Sheet (intended for checking purposes): https://docs.google.com/spreadsheets/d/1Xs3-RlwyJdrCvMIJsOkOuOOh7EXkLdUHknj1I2w04Wo/edit?usp=sharing. But the sheet has the IRI and not the label. We can add the label of the mimetype, which would help you. If you want this info in another format, let me know.
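Since "hasMimetype" points from a data format to its mimetype, a tool that starts from a mimetype needs the inverse relation, and several formats may share one mimetype. Below is a minimal sketch of building that inverse index. The class and method names are hypothetical, the second format identifier is a made-up placeholder, and pairing Conll2000 with "text/tab-separated-values" is an assumption for illustration, not taken from the ontology.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the OMTD Maven Plugin.
final class HasMimetypeIndexSketch {

    /** Inverts (data format IRI -> mimetype) pairs into mimetype -> data format IRIs. */
    static Map<String, List<String>> invert(Map<String, String> formatToMimetype) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : formatToMimetype.entrySet()) {
            // One mimetype can be shared by several data formats, so collect a list.
            index.computeIfAbsent(entry.getValue(), key -> new ArrayList<>())
                 .add(entry.getKey());
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> hasMimetype = new LinkedHashMap<>();
        // Assumed pairing, for illustration only.
        hasMimetype.put("http://w3id.org/meta-share/omtd-share/Conll2000",
                "text/tab-separated-values");
        // Placeholder identifier, not a real ontology entry.
        hasMimetype.put("urn:example:format/AnotherTabSeparatedFormat",
                "text/tab-separated-values");
        System.out.println(invert(hasMimetype));
    }
}
```

If a mimetype maps to more than one format, the mimetype alone cannot pick the data format, and something else (a file extension or an explicit annotation) has to decide.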

reckart commented 6 years ago

Ok, then I'll remove the code regarding mimetypes from the OMTD Maven Plugin.

pennyl67 commented 6 years ago

Does this mean that data format will have to be entered manually?

reckart commented 6 years ago

Rather... well... what about data formats which are not in the ontology (i.e. otherFormat)? Shouldn't it be possible to specify information such as mimetype and file extension at least for these?

greenwoodma commented 6 years ago

Hmm, I'm confused: I thought that when I migrated the Maven plugin code to the latest model version I updated some things around mimetypes. I guess I don't know whether the info is used in any of the examples I've tested so far, and hence whether it ends up in the right place.

reckart commented 6 years ago

@greenwoodma as far as I can see, you just commented out the stuff and in some cases added a "todo" comment.

greenwoodma commented 6 years ago

@reckart was just looking at the code and certainly UimaDescriptorAnalyzer adds mimetype info, see line 263 onwards

pennyl67 commented 6 years ago

I find it a pity not to have some mapping from mimetype to data format already when it's known. So, if the Google Sheet can be used for the mappings, please let's do that; just tell me how I can help. For other data formats (as for all the ontology-driven elements), the idea is that you use the dataFormat to specify a broader concept and then add the new suggested value in the dataFormatOther (free text), which should be monitored by the ontology curators.
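As a rough illustration of the dataFormat / dataFormatOther convention described here: when a mimetype is in a known mapping, the specific data format is used; otherwise the broader concept goes into dataFormat and the suggested new value into dataFormatOther. This is only a sketch under assumptions. The class names, the mapping entry, and the "urn:example:..." broader-concept identifier are hypothetical and not taken from the OMTD-SHARE model or the plugin code.

```java
import java.util.Map;

// Hypothetical sketch, not the plugin's actual data model.
final class DataFormatFallbackSketch {

    /** Minimal stand-in for the two fields discussed in the thread. */
    static final class DataFormat {
        final String dataFormat;      // identifier from the controlled vocabulary
        final String dataFormatOther; // free text, null when no suggestion is needed

        DataFormat(String dataFormat, String dataFormatOther) {
            this.dataFormat = dataFormat;
            this.dataFormatOther = dataFormatOther;
        }
    }

    // Assumed mimetype -> data format mapping; the pairing is illustrative only.
    static final Map<String, String> KNOWN = Map.of(
            "text/tab-separated-values",
            "http://w3id.org/meta-share/omtd-share/Conll2000");

    static DataFormat describe(String mimeType, String broaderConcept, String suggestedValue) {
        String specific = KNOWN.get(mimeType);
        if (specific != null) {
            // Known format: no free-text suggestion required.
            return new DataFormat(specific, null);
        }
        // Unknown format: broader concept plus a suggestion for the curators.
        return new DataFormat(broaderConcept, suggestedValue);
    }

    public static void main(String[] args) {
        DataFormat d = describe("application/x-my-format",
                "urn:example:broader-format-concept", "My new format");
        System.out.println(d.dataFormat + " / " + d.dataFormatOther);
    }
}
```

Note that this shape carries no mimetype or file extension for the suggested value, which is the gap raised earlier in the thread for formats outside the ontology.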

reckart commented 6 years ago

@greenwoodma no, it doesn't - that code is ineffective. It tries to look up the data format in the controlled vocabulary using the mime type.

Data format identifier example: "http://w3id.org/meta-share/omtd-share/Conll2000"

Mime type example: "text/tab-separated-values"

It will obviously never match.
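To make the mismatch concrete, here is a small sketch with hypothetical names (this is not the UimaDescriptorAnalyzer code). Looking up a mimetype directly in a vocabulary keyed by data format identifiers finds nothing, whereas an explicit mimetype-to-format table can resolve it. Pairing the two example values with each other is an assumption made only for the demonstration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the two lookup strategies.
final class MimeTypeLookupSketch {

    // Controlled vocabulary: data format identifiers (example from the thread).
    static final Set<String> DATA_FORMATS = Set.of(
            "http://w3id.org/meta-share/omtd-share/Conll2000");

    // Assumed mapping table; the pairing is illustrative, not from the ontology.
    static final Map<String, String> MIME_TO_DATA_FORMAT = Map.of(
            "text/tab-separated-values",
            "http://w3id.org/meta-share/omtd-share/Conll2000");

    /** Ineffective: treats the mimetype itself as a vocabulary entry. */
    static Optional<String> lookupDirect(String mimeType) {
        return DATA_FORMATS.contains(mimeType) ? Optional.of(mimeType) : Optional.empty();
    }

    /** Resolves the mimetype through the mapping, then checks the vocabulary. */
    static Optional<String> lookupViaMapping(String mimeType) {
        return Optional.ofNullable(MIME_TO_DATA_FORMAT.get(mimeType))
                .filter(DATA_FORMATS::contains);
    }

    public static void main(String[] args) {
        String mimeType = "text/tab-separated-values";
        System.out.println("direct:  " + lookupDirect(mimeType).isPresent());
        System.out.println("mapping: " + lookupViaMapping(mimeType).isPresent());
    }
}
```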

reckart commented 6 years ago

@pennyl67 the new format could, however, have a different mime type and file extension...

reckart commented 6 years ago

@pennyl67 @greenwoodma I am presently working on the code, seeing how I can add a UIMA-type -> OMTD-SHARE type mapping. Once I have worked that out, I might also add something like this for mime types.

greenwoodma commented 6 years ago

@reckart ah, sorry. I clearly misunderstood how things had changed between the two model versions

pennyl67 commented 6 years ago

@reckart yes, indeed: for new data formats we need more info than just a name (a documentation URL, at least! To me that's more important than a mimetype, if it's not a standard mimetype). But finish with types first and then we can discuss this. I'm also putting a note for the discussion on the ontology curation.

reckart commented 6 years ago

> I find it a pity not to have some mapping from mimetype to data format already when it's known. So, if the Google Sheet can be used for the mappings, please let's do that; just tell me how I can help. For other data formats (as for all the ontology-driven elements), the idea is that you use the dataFormat to specify a broader concept and then add the new suggested value in the dataFormatOther (free text), which should be monitored by the ontology curators.

MIME type mapping has been added: https://github.com/openminted/omtd-share-annotations/issues/34