Closed Aequivinius closed 6 years ago
Is the non-XML preface in the XMI file a Unicode BOM (Byte Order Mark)? In theory the files should be UTF-8, which I don't believe requires a BOM, but I know we've had a problem in GATE before (outside of OpenMinTeD) where XML files from odd sources had a BOM prefix.
If it helps then the code we use in GATE to ensure we always discard the BOM can be found at https://github.com/GateNLP/gate-core/blob/master/src/main/java/gate/util/BomStrippingInputStreamReader.java
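For a Python-based service, discarding a BOM before parsing could be sketched as follows. This is a minimal illustration, not a port of the GATE class linked above, and it only handles the UTF-8 BOM. It also assumes that the preface really is a BOM, which the thread has not confirmed. The function names `strip_bom` and `decode_xml_payload` are hypothetical.

```python
import codecs


def strip_bom(payload: bytes) -> bytes:
    """Return the payload without a leading UTF-8 BOM, if one is present."""
    if payload.startswith(codecs.BOM_UTF8):
        return payload[len(codecs.BOM_UTF8):]
    return payload


def decode_xml_payload(payload: bytes) -> str:
    """Decode a UTF-8 payload to text, dropping a leading BOM if present."""
    # The 'utf-8-sig' codec decodes UTF-8 and skips one leading BOM.
    return payload.decode("utf-8-sig")
```

If the bytes before the first `<` turn out to be something other than a BOM, this sketch leaves them in place, so inspecting the first few bytes of a real request is still the first step.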
To make sure we are as well prepared as possible to help during the hackathon sessions could you please add/attach to this issue:
Dear @greenwoodma:
For some reason the code that is generating Galaxy XML wrappers didn't work as expected. The typesystem you provided was not copied. I do not know why... @nguyennth and I have registered Manchester's web service many times without problems.
So, I deleted your record and re-registered it. Here is the new landing page: https://test.openminted.eu/landingPage/application/OGERWS The wrapper was generated correctly.
Then I used the registered app to process the thalamus corpus.
Finished .... :-) :-) :-)
Output is here https://test.openminted.eu/landingPage/corpus/7691bf1a-283d-43bc-9653-26f482476264 and here 6ef31b96-675d-4078-88fa-ddecd7ad1a77.zip
Please check it. I do not see any NER annotations. What should we expect? It probably has to do with the typesystem you provided: mvn:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.0
Maybe we need some help from the University of Manchester, which developed the web service spec for OMTD (@nguyennth), or from @reckart, who knows everything about DKPro.
The typesystem is required by the web service client to serialize the results. If it is not there, the respective annotations will not be included in the output.
Yeah, this is the issue we're currently investigating, and which we were hoping to discuss during the Hackathon.
OGER sends NER annotations, but OMTD doesn't seem to care for them when it re-parses our results. I'm actually a bit at a loss as to what sort of typesystem we should provide, and how. We have this file ready on our server (typesystem.xml.zip), which I would have expected to provide the necessary information. However, OMTD never sends a request for this file.
If you have any more information on what sort of typesystem file precisely we need to add where, that would be greatly appreciated.
Please see this one as an example: https://mvnrepository.com/artifact/uk.ac.nactem.uima/NeuroscienceTypeSystem/0.2 You can download the jar and look at its structure and contents. I think @nguyennth can provide some more info.
@Aequivinius There is a minor semantic error in your metadata. Your component takes as input a whole corpus of documents, not a single document, and generates annotations for the corpus, thus producing an annotated corpus. Correct? If that's the case, please change the processingResourceType from document to corpus in both inputContentResourceInfo and outputResourceInfo, in the final version of your metadata.
@gkirtzou Done
@galanisd | @nguyennth I have a few questions:
The NeuroScience maven artifact was registered as follows:
<ns0:resourceIdentifiers> <ns0:resourceIdentifier resourceIdentifierSchemeName="maven">mvn:uk.ac.nactem.uima:NeuroscienceTypeSystem:0.2</ns0:resourceIdentifier> </ns0:resourceIdentifiers>
It seems identical to yours. The web service executor that I created downloads this artifact and adds it to its classpath. For contents and structure you should ask @nguyennth.
Does anyone know why this https://test.openminted.eu/landingPage/application/OGERWS has disappeared?
Was it deleted by someone? Is there a new landing page?
I noticed it, too. I am currently using this one (https://test.openminted.eu/landingPage/application/b8fb9bbd-603c-4b53-b86d-15c6c753302d). It is set to private so I can easily play around with different typesystems, but I can set it to public if you need me to.
I am sure that I didn't delete it. @antleb, any ideas?
I've now tried these Maven coordinates in the omtd-share.xml, which seem correct:
mvn:de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.1
This should point to this repository, if I'm not mistaken, which includes the necessary info.
However, our namedEntity annotations are still missing from OMTD.
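A quick shape check for such identifiers can be sketched in Python. It assumes the platform expects the form `mvn:groupId:artifactId:version`; that format is inferred only from the two three-part identifiers quoted in this thread and is not confirmed here. The names `MavenCoordinates` and `parse_mvn_identifier` are hypothetical.

```python
from typing import NamedTuple


class MavenCoordinates(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_mvn_identifier(identifier: str) -> MavenCoordinates:
    """Split an 'mvn:groupId:artifactId:version' string into its parts.

    Raises ValueError if the prefix is missing or if there are not
    exactly three non-empty parts (assumed format, see lead-in).
    """
    prefix = "mvn:"
    if not identifier.startswith(prefix):
        raise ValueError(f"missing '{prefix}' prefix: {identifier!r}")
    parts = identifier[len(prefix):].split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected groupId:artifactId:version, got {identifier!r}"
        )
    return MavenCoordinates(*parts)
```

Under this assumed format, the identifier quoted earlier in the thread, `mvn:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.0`, has only two parts and would be rejected, while the 1.9.1 identifier above passes. Whether that difference played any role in the missing annotations is not established in this discussion.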
Are you expecting things like this?
<type2:NamedEntity xmi:id="22347" sofa="56913" begin="219" end="225" identifier="A4FV52"/>
<type2:NamedEntity xmi:id="22353" sofa="56913" begin="230" end="236" identifier="A6QLI1"/>
<type2:NamedEntity xmi:id="22359" sofa="56913" begin="263" end="272" identifier="CHEBI:14321"/>
<type2:NamedEntity xmi:id="22365" sofa="56913" begin="273" end="279" identifier="GO:0098657"/>
<type2:NamedEntity xmi:id="22371" sofa="56913" begin="488" end="497" identifier="CHEBI:14321"/>
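To check whether an output XMI file contains such elements, a small script along the following lines might help. It is only a sketch: it matches elements by their local name `NamedEntity` and ignores the namespace, because the URI bound to the `type2` prefix is not shown in this thread. The namespace URI in `SAMPLE` is a placeholder, and the function name `named_entities` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Tiny stand-in document; the 'type2' namespace URI is a placeholder.
SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:type2="http://example.org/hypothetical/ner.ecore" xmi:version="2.0">
  <type2:NamedEntity xmi:id="22347" sofa="56913" begin="219" end="225" identifier="A4FV52"/>
  <type2:NamedEntity xmi:id="22353" sofa="56913" begin="230" end="236" identifier="A6QLI1"/>
</xmi:XMI>"""


def named_entities(xmi_text: str) -> List[Dict[str, Union[int, str, None]]]:
    """Collect begin, end and identifier of every NamedEntity element."""
    root = ET.fromstring(xmi_text)
    found = []
    for element in root.iter():
        # ElementTree writes qualified tags as '{namespace-uri}LocalName'.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "NamedEntity":
            found.append({
                # Raises KeyError if an offset attribute is missing.
                "begin": int(element.attrib["begin"]),
                "end": int(element.attrib["end"]),
                "identifier": element.get("identifier"),
            })
    return found


if __name__ == "__main__":
    for entity in named_entities(SAMPLE):
        print(entity)
```

An empty result for a real output file would suggest that the annotations were dropped before serialization, which matches the symptom described above.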
@courado @antleb ?
I re-registered your app.
https://test.openminted.eu/landingPage/application/b8fb9bbd-603c-4b53-b86d-15c6c753302d
and processed the thalamus corpus.
Output here: https://test.openminted.eu/landingPage/corpus/ba172d04-96dc-4007-b9ae-020460691e19 and here: 12d3dce1-996b-4c2a-8324-74a951f2f7c4.zip
I hope that is not an illusion...
@Aequivinius welcome to OpenMinTeD.
Hi,
Sorry for my late reply. As far as I understand, you're using an existing type system that was already uploaded to Maven Central, i.e., the NER type system by DKPro. This means that you don't need to create a new type system. You only need to include the type system as a dependency in the pom of the web service project. As @galanisd showed above, I believe it works now.
In the case that you need to create a new type system, please let me know, we can discuss details later.
@galanisd Fascinating, this is precisely what we were after. Wonder if the re-registering did the trick? Anyway, this is what we wanted, so it seems all is well! Thanks for your help!
Should we now proceed to register the service on services.openminted.eu?
Not yet. services.openminted.eu has not been updated for quite some time. You will be notified.
Thanks!
Dimitris
@Aequivinius I was taking a final look at your metadata (the one registered here) and I noticed that you had declared in your input that the annotation type is Named Entity (i.e. http://w3id.org/meta-share/omtd-share/NamedEntity). Semantically, that means that your input needs to be annotated at that level before using your application. Is that the case? If not, and your input is just a raw corpus, then I would suggest removing the annotation type in the inputContentResourceInfo section.
Also, for statistical reasons, I would like to ask whether you performed the registration via the registration form or via XML.
This is a mistake, I'll remove it from the XML and upload it correctly next time (the registration form doesn't let me delete the value for this specific field once set). I mostly used the web registration form, only occasionally tinkering with the XML.
@Aequivinius I didn't know that the registration form didn't allow you to delete specific fields once set. I will report this bug to the responsible technical person. Thanks for sharing!
Also, when you make the last changes to the OMTD-SHARE descriptor, could you please upload it here as well for a final check? In case I missed anything :)
@gkirtzou Here you go! 18-4-removed_input.xml.zip
The metadata seems fine. I would only suggest two things
Otherwise, the metadata are correct and your application is also tested. All that remains is the final registration on the platform, once @greenwoodma informs you.
@gkirtzou Thank you for your help! Find attached the most recent version of our share descriptor. 20-4.xml.zip
@Aequivinius Perfect! I have no further comments/recommendations.
@Aequivinius You can now proceed to the final uploading of your application at services.openminted.eu. If you encounter any problems, please let us know. Thanks!
@Aequivinius My mistake, please refrain from uploading at services.openminted.eu until further notice.
@Aequivinius I have taken the liberty to upload your application at services.openminted.eu and tested it. It seems to work ok. The application is available at: https://services.openminted.eu/landingPage/application/71345d18-297f-4ac5-b4de-38ef3cacbe75 You can also test it yourself. If everything is ok, let me know so that we close the issue.
Perfect, thanks!
@Aequivinius I have a question; in your proposal and the description of the application, you mention the Bio Term Hub, and I'm trying to understand the relation between the two. When you say that the OGER is built on top of the BTH, you mean that you use the terminologies from the reference databases? And this aggregation of terminologies is already in the docker image you have provided? Or should we expect another component/application?
@pennyl67 No, there will be no further components or applications.
BTH is an aggregator of terminologies and produces a unified terminology. The terminology created in this way can be used by OGER. However, the two components can also be used independently. The term list provided by BTH could be used for other purposes; and OGER can be provided with a term list obtained from other sources.
We submitted OGER as a web service as an application to OMTD. This web service uses BTH to obtain up to date terminologies in the background.
Furthermore, we also wanted to make BTH available to the public, so we created a Docker image that allows researchers to run it locally. Alternatively, they may use our own web service at https://pub.cl.uzh.ch/projects/ontogene/biotermhub/. However, BTH uses a web interface in which the desired resources are selected manually. Because of that, it was not suited to being integrated into the OMTD platform, which is why we provide a separate link where the research community can download a Dockerized version of BTH (https://github.com/OntoGene/BioTermHub_dockerized).
Kind regards,
Thanks for the explanations. It's clear now!
Given that your application is already uploaded and public in the platform, if you agree, I will close this issue.
Dear organisers
We're preparing our submission of OGER, a dictionary-based entity recogniser, as a web service for OpenMinTeD. We're currently in the process of fixing a few remaining issues that relate to how we parse the XMI that we receive from OpenMinTeD. As it currently stands, it looks like the payload of the requests includes some non-XML preface, which we need to cut in order to parse the document to be annotated. Would you have a sample of how OMTD constructs the request payload?
As for the hackathon, would it be possible to find a time on Tuesday afternoon? Most people from our group can make it then. Apart from that, Thursday or Friday would suit us, too.
Thanks for your help & kind regards,