openminted / Open-Call-Discussions

A central place for participants in the open calls to ask questions

My system returns a Gate document #4

Closed by abravo84 6 years ago

abravo84 commented 6 years ago

Hello! We have developed a system to process scientific articles in XML format, generating enriched GATE documents (as mentioned in our proposal).

We have uploaded 5 sample files in a corpus on the Test Platform (https://test.openminted.eu/landingPage/corpus/cb6b89a1-1e4f-4ba8-b790-9ce74e5d8a08) as examples of input files. Each input file is processed by our system and then a GATE document with multiple annotations is generated (an example is attached in this "Issue").

What's the best way to integrate our system? Should we transform our GATE document into a UIMA XMI file?

Thank you so much!

àlex.

Attachment: journal.pone.0194749.xml.zip

greenwoodma commented 6 years ago

The answer is both yes and no.

The platform fully supports running GATE components, and it is quite acceptable to produce GATE XML documents as output. So in theory, if your components meet the OpenMinTeD specifications (mostly ensuring correct metadata etc.), it should be possible to register them with the platform and use them within a workflow, either by registering each GATE component individually and building a workflow within OpenMinTeD, or by registering a Docker image containing your complete GATE application.

The issue of XMI only arises if you want your components to be usable within workflows containing other, non-GATE-based components, or you want the annotation viewer in OpenMinTeD to be able to display the output files. In those cases you would need to convert the GATE XML documents to XMI. This means that initially you should be able to register and test your components without worrying about producing XMI, and consider XMI only once that works.

I've actually been working on a component that should make converting from GATE XML to XMI easier. It will be a GATE component made available through the OpenMinTeD platform that you would add to a workflow at the point you want to convert from GATE XML to XMI, and you would configure it with a mapping file that describes how to convert each GATE annotation type to a type in the XMI. The component is based around the existing UIMA support in GATE, and I've run a few successful tests but it needs cleaning up and documenting before general release. Hopefully that shouldn't take too long and so should be available once you have managed to register and run your components on the platform.
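To make the idea of a type mapping concrete, here is a minimal Python sketch. It is not the converter component described above, whose mapping file format is not given in this thread. The dict-based mapping, the `example.types.*` names, and the `map_annotations` helper are all hypothetical; the sketch only shows the general principle of renaming each GATE annotation type to a target type and reporting types that have no mapping.

```python
# Hypothetical mapping from GATE annotation types to XMI (UIMA) type names.
# Both the type names and the dict-based format are illustrative assumptions,
# not the format used by the actual converter component.
TYPE_MAPPING = {
    "Sentence": "example.types.Sentence",
    "Token": "example.types.Token",
}


def map_annotations(annotations, mapping):
    """Return (converted, skipped) for a list of GATE-style annotations.

    Each annotation is a dict with "type", "start" and "end" keys.
    Annotations whose type has no entry in the mapping are not converted;
    their type names are collected so the gap is visible.
    """
    converted = []
    skipped = set()
    for ann in annotations:
        target = mapping.get(ann["type"])
        if target is None:
            skipped.add(ann["type"])
            continue
        # UIMA annotations conventionally use "begin"/"end" offsets.
        converted.append({"type": target, "begin": ann["start"], "end": ann["end"]})
    return converted, sorted(skipped)


if __name__ == "__main__":
    sample = [
        {"type": "Sentence", "start": 0, "end": 12},
        {"type": "Citation", "start": 5, "end": 9},
    ]
    print(map_annotations(sample, TYPE_MAPPING))
```

Reporting unmapped types rather than silently dropping them is a design choice of this sketch: it makes it easy to see which annotation types still need an entry before the output is usable by non-GATE components.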

abravo84 commented 6 years ago

Thank you so much! Currently, our system is a JAR file (~200 MB with dependencies). Could we register the JAR file on the Test Platform?

greenwoodma commented 6 years ago

Is the component available as a Maven artifact on Maven Central? If so, you should be able to register it with the platform by providing the Maven coordinates. Publishing via Maven is our recommended approach to distribution. There might also be a manual option on the registry, but you'd have to provide the OMTD-SHARE XML file(s) and the JAR, and I'm not sure how you'd upload a 200 MB file through a web form.

abravo84 commented 6 years ago

Ok! Thank you Mark! I will try to register my artifact! :)

abravo84 commented 6 years ago

Hi @greenwoodma! Regarding my approach: I have databases and config files in my JAR, and I am thinking about creating a Docker image that includes my resources and a JAR with all dependencies. In this case the input and output of my Docker image will be folders: the input folder contains XML files, and the output folder contains the generated GATE documents. Will this approach be compatible with the OpenMinTeD platform?

greenwoodma commented 6 years ago

I believe it should be, yes, although obviously this approach involves more work for you, as you'll need to build the Docker image to match the OpenMinTeD guidelines rather than having it generated automatically from your GATE plugin. It does allow you a greater degree of flexibility though, so given the resources etc. you want to include, this is probably the best way to go.

You can find the docker guidelines here: https://guidelines.openminted.eu/sharing-components-as-dockerised-images.html

You will of course need some way of running your GATE application from the command line. I'm not sure if you already have that, but if not, let me know and we'll try to help you out. The standard executor we use when we auto-generate Docker images for GATE components might just work even in your situation.
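As a rough illustration of the folder-in, folder-out setup discussed above, a command-line driver could look something like the Python sketch below. This is not the OpenMinTeD executor and does not follow any particular guideline; the script name, the `.gate.xml` output suffix, and the `java -jar <jar> <input> <output>` argument order are all assumptions made for the example. The same logic could equally live in the JAR's own `main` method.

```python
import subprocess
import sys
from pathlib import Path


def output_path_for(input_file, output_dir):
    """Map e.g. in/article.xml to out/article.gate.xml (naming is an assumption)."""
    return Path(output_dir) / (Path(input_file).stem + ".gate.xml")


def plan_jobs(input_dir, output_dir):
    """Pair every *.xml file in input_dir with its output path, in sorted order."""
    return [
        (src, output_path_for(src, output_dir))
        for src in sorted(Path(input_dir).glob("*.xml"))
    ]


def build_command(jar, input_file, output_file):
    """Build the java invocation; the argument order is hypothetical."""
    return ["java", "-jar", str(jar), str(input_file), str(output_file)]


def main(argv):
    if len(argv) != 4:
        print("usage: run_batch.py <app.jar> <input_dir> <output_dir>", file=sys.stderr)
        return 2
    jar, input_dir, output_dir = argv[1:]
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failures = 0
    for src, dst in plan_jobs(input_dir, output_dir):
        # One JVM per document keeps the sketch simple; a real application
        # would more likely load its resources once and loop inside the JVM.
        result = subprocess.run(build_command(jar, src, dst))
        if result.returncode != 0:
            print(f"failed: {src}", file=sys.stderr)
            failures += 1
    return 1 if failures else 0


# Only act as a script when arguments are supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Returning a non-zero exit code when any document fails lets whatever runs the container detect a partially failed batch instead of treating an incomplete output folder as success.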